{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "#### PRE-RELEASE REVIEW: O'Reilly Intermediate Python Video Series\n", "#### Topic: Consuming JSON Data from a Web Service\n", "Source located in `nbsource/consuming-json-data-from-a-web-service.ipynb`. Automatically rendered from `UNKNOWN` on 2014-03-21 at 10:21:36.572700.\n", "#### AUTOMATICALLY GENERATED TEST NOTEBOOK - CHANGES WILL BE LOST\n", "#### To TEST this Notebook
1. select Cell | Run All, or step through with Shift/Enter
2. Notebook turns pink when testing starts
3. It turns white when the it runs to completion.
4. Please check the executed notebook for clarity and correctness.
Please [raise an issue](https://github.com/holdenweb/intermediate-python/issues) about anything you don't understand - by all means attach a saved copy of the notebook to add explanations or questions. Also please let us know about stuff you want to see covered under any heading in the outline.\n", "#### Your comments on the content are also welcome, particularly when [reported as issues](https://github.com/holdenweb/intermediate-python/issues)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This document was created on 2014-03-09 at 15:04:58.964462." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# ||| WORKING COPY ||| Consuming JSON Data from a Web Service" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The [JSON data format](http://www.json.org/) has become a popular way to interchange data. It is more concise than XML, easier for humans to read should they have to, and supported by the ECMA-405 standard. Python comes with a `json` module that operates very like the `pickle` module does. You can convert between JSON strings and Python data structures using the `json.dumps()` method that takes a Python structure argument and returns a JSON string. The converse process is performed by the `json.loads()` method,which takes a JSON string as its argument and returns the corresponding Python data structure.\n", "\n", "JSON is a useful standard for data interchange of the following (Python) types of data:\n", "\n", " * Lists (JSON arrays)\n", " * Dicts (JSON objects)\n", " * Strings (JSON strings)\n", " * Integers (JSON numbers)\n", " * Floating-point values (JSON numbers)\n", " * `True` and `False` (JSON `true` and `false`)\n", " * `None` (JSON `null`)\n", "\n", "Attempts to encode other Python objects will lead to failure unless the programmer extends the JSON encoding.\n", "\n", "Let's start by writing a function that round-trips Python structures through the JSON notation." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import json\n", "def json_round_trip(structure):\n", " return json.loads(json.dumps(structure))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looking at the JSON output from `dumps()` shows you that the structure is pretty like a Python data structure made up of lists, dicts and simpler values like lists and strings. Many (but not all!) JSON data structures can be handed to the Python `eval()` function to return the same structure returned by `loads()`. It doesn't do to count on it, though—some values are presented differently, notably `true`, `false` and `null` are the JSON equivalents of `True`, `False` and `None` respectively." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1, \"string\", [\"l\", \"i\", \"s\", \"t\"], {\"various\": \"string keys\", \"dict\": \"with\"}]\n", "[1, 'string', ['l', 'i', 's', 't'], {'dict': 'with', 'various': 'string keys'}]\n", "Round trip succeeded\n" ] } ], "source": [ "struc1 = [\n", " 1,\n", " \"string\",\n", " [\"l\", \"i\", \"s\", \"t\"],\n", " {\n", " \"dict\": \"with\",\n", " \"various\": \"string keys\"\n", " }\n", "]\n", "\n", "json_string = json.dumps(struc1)\n", "print(json_string)\n", "print(eval(json_string))\n", "if struc1 == json_round_trip(struc1):\n", " print(\"Round trip succeeded\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Downloading a json file from the web and parsing it." ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "('{\\n \"metadata\": {\\n \"name\": \"\"\\n },\\n \"nbfo',\n", " [u'nbformat', u'nbformat_minor', u'worksheets', u'metadata'],\n", " 4,\n", " 4265)" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# See https://docs.python.org/2/howto/urllib2.html\n", "import urllib2\n", "url = \"https://raw.githubusercontent.com/DevTeam-TheOpenBastion/int-py-notes/master/nbsource/list-dict-and-set-comprehensions.ipynb\"\n", "jsonfile= urllib2.urlopen(url)\n", "jsonfile = jsonfile.read() # The jsonfile as a Python string\n", "json_as_python_object = json.loads(jsonfile) # The josnfile transformed into a Python dict\n", "\n", "# test \n", "jsonfile[:40],json_as_python_object.keys(),len(json_as_python_object), len(jsonfile) \n", "# show the keys\n", "#json_as_python_object.keys()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "!!\n", "When you pass a type that can't be encoded an exception is raised. You can see JSON does not handle complex numbers, which would therefore have to be stored as two separate numbers, or as objects with a real and a complex field. \n", "\n", " json.dumps(3+4j)\n", " ---------------------------------------------------------------------------\n", " TypeError Traceback (most recent call last)\n", " in ()\n", " ...\n", " ... (lots of traceback)\n", " ...\n", " TypeError: (3+4j) is not JSON serializable\n", "\n", "There are ways of extending the range of data types that can be handled in JSON, which we will discuss later in this lesson." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `json.load()` and `json.dump()` functions operate similarly, but instead of producing a string output they take an open file as an argument and pass the data from and to the file, respectively. These functions have complex signatures and share many arguments with each other and the encoders and decoders discussed below. Consult the documentation for full details. Here is the full signature of the `dumps()` function.\n", "\n", " json.dump(obj, fp, skipkeys=False, ensure_ascii=True,\n", " check_circular=True, allow_nan=True, cls=None,\n", " indent=None, separators=None, default=None,\n", " sort_keys=False, **kw)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Most of these keyword arguments can safely be omitted most of the time, and are common to other JSON calls. The exception to this is the `allow_nan` argument (True by default) that permits certain non-standard values to be encoded. You can think of it as an extension of the standard's floating-point range. I can do no better than quote the documentation:\n", "\n", " If `allow_nan` is false, then it will be a `ValueError` to\n", " serialize out of range `float` values (`nan`, `inf`, `-inf`) in\n", " strict compliance of the JSON specification, instead of using the\n", " JavaScript equivalents (`NaN`, `Infinity`, `-Infinity`).\n", "\n", "Once created, encoders are used to turn non-JSON objects into representable (JSON) data strings. Correspondingly, a decoder will take a JSON string and turn it into an object. \n", "\n", "you can call an encoder's `decode` method to turn data structure into a JSON string. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For more complex encoding and decoding you can use the [encoders and decoders](http://docs.python.org/3.3/library/json.html#encoders-and-decoders) (`json.JSONEncoder` and `json.JSONDecoder`)classes. These allow you to establish encodings that can be used to handle non-JSON types, for example. The Encoder's full signature is complex just as the module's conversion functions are." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('Python:', 'nan')\n" ] } ], "source": [ "struc2 = float(\"NaN\")\n", "print(\"Python:\", str(struc2))" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "('JSON:', 'NaN')\n" ] } ], "source": [ "std_encoder = json.JSONEncoder()\n", "print(\"JSON:\", std_encoder.encode(struc2))" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "collapsed": false }, "outputs": [ { "ename": "ValueError", "evalue": "Out of range float values are not JSON compliant", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0mstrict_encoder\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mjson\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mJSONEncoder\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mallow_nan\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mFalse\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mstrict_encoder\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mencode\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mstruc2\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m/Applications/Canopy.app/appdata/canopy-1.1.0.1371.macosx-x86_64/Canopy.app/Contents/lib/python2.7/json/encoder.pyc\u001b[0m in \u001b[0;36mencode\u001b[0;34m(self, o)\u001b[0m\n\u001b[1;32m 199\u001b[0m \u001b[0;31m# exceptions aren't as detailed. The list call should be roughly\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 200\u001b[0m \u001b[0;31m# equivalent to the PySequence_Fast that ''.join() would do.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 201\u001b[0;31m \u001b[0mchunks\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0miterencode\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mo\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0m_one_shot\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 202\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mchunks\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mlist\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtuple\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 203\u001b[0m \u001b[0mchunks\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mlist\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mchunks\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/Applications/Canopy.app/appdata/canopy-1.1.0.1371.macosx-x86_64/Canopy.app/Contents/lib/python2.7/json/encoder.pyc\u001b[0m in \u001b[0;36miterencode\u001b[0;34m(self, o, _one_shot)\u001b[0m\n\u001b[1;32m 262\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mkey_separator\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mitem_separator\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msort_keys\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 263\u001b[0m self.skipkeys, _one_shot)\n\u001b[0;32m--> 264\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0m_iterencode\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mo\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 265\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 266\u001b[0m def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,\n", "\u001b[0;31mValueError\u001b[0m: Out of range float values are not JSON compliant" ] } ], "source": [ "strict_encoder = json.JSONEncoder(allow_nan=False)\n", "strict_encoder.encode(struc2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `JSONDecoder` and `JSONencoder` classes can be extended by subclassing." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "ename": "IndentationError", "evalue": "expected an indented block (, line 8)", "output_type": "error", "traceback": [ "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m8\u001b[0m\n\u001b[0;31m \u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mIndentationError\u001b[0m\u001b[0;31m:\u001b[0m expected an indented block\n" ] } ], "source": [ "class nonJson:\n", " \"\"\"Demonstrates how an object that cannot be represented.\"\"\"\n", " def __init__(self, one, two):\n", " self.one = one\n", " self.two = two\n", " @classmethod\n", " def decode(cls, json_string):\n", " " ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "ename": "NameError", "evalue": "name 'strict_decoder' is not defined", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mstrict_decoder\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mNameError\u001b[0m: name 'strict_decoder' is not defined" ] } ], "source": [ "strict_decoder" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.11" } }, "nbformat": 4, "nbformat_minor": 0 }