{ "metadata": { "name": "", "signature": "sha256:0a6cc02c82fa600c5dd119642ec8c96c480d1f2bc6672edea08d1e200fb0ee39" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Chapter 2, example 3\n", "====================\n", "\n", "Here we illustrate various interactive computing features of IPython by loading the social data set with the `NetworkX` module." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import networkx as nx" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Tab completion is highly useful in an interactive session to explore an object's attributes and methods. Here, we look for ways to open our files from the Facebook data." ] }, { "cell_type": "code", "collapsed": false, "input": [ "nx.read" ], "language": "python", "metadata": {}, "outputs": [ { "ename": "AttributeError", "evalue": "'module' object has no attribute 'read'", "output_type": "pyerr", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mnx\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mAttributeError\u001b[0m: 'module' object has no attribute 'read'" ] } ], "prompt_number": 3 }, { "cell_type": "code", "collapsed": false, "input": [ "cd fbdata" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "(bookmark:fbdata) -> /Users/admin/Documents/work/onderwijs/teaching/ISatWork/NoteBooks/minibook-code/chapter2/data/facebook\n", "/Users/admin/Documents/work/onderwijs/teaching/ISatWork/NoteBooks/minibook-code/chapter2/data/facebook\n" ] } ], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `read_edgelist` method looks adapted here as the `.edges` files contain list of person identifiers. This method returns a graph." ] }, { "cell_type": "code", "collapsed": false, "input": [ "g = nx.read_edgelist('0.edges')" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's display the number of nodes and edges in the graph." ] }, { "cell_type": "code", "collapsed": false, "input": [ "len(g.nodes()), len(g.edges())" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 6, "text": [ "(333, 2519)" ] } ], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's try to compute the radius of the graph." ] }, { "cell_type": "code", "collapsed": false, "input": [ "nx.radius(g)" ], "language": "python", "metadata": {}, "outputs": [ { "ename": "NetworkXError", "evalue": "Graph not connected: infinite path length", "output_type": "pyerr", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m\n\u001b[0;31mNetworkXError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mnx\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mradius\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mg\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;32m/Users/admin/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/networkx/algorithms/distance_measures.pyc\u001b[0m in \u001b[0;36mradius\u001b[0;34m(G, e)\u001b[0m\n\u001b[1;32m 141\u001b[0m \"\"\"\n\u001b[1;32m 142\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0me\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 143\u001b[0;31m \u001b[0me\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0meccentricity\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mG\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 144\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mmin\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0me\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mvalues\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 145\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/Users/admin/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/networkx/algorithms/distance_measures.pyc\u001b[0m in \u001b[0;36meccentricity\u001b[0;34m(G, v, sp)\u001b[0m\n\u001b[1;32m 61\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mL\u001b[0m \u001b[0;34m!=\u001b[0m \u001b[0morder\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 62\u001b[0m \u001b[0mmsg\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m\"Graph not connected: infinite path length\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 63\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mnetworkx\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mNetworkXError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmsg\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 64\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 65\u001b[0m \u001b[0me\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mn\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmax\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlength\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mvalues\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mNetworkXError\u001b[0m: Graph not connected: infinite path length" ] } ], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The error comes from the fact that the graph is not connected, so that the radius is infinite. We can try to obtain a connected component instead. Tab completion can help us finding the right function for that." ] }, { "cell_type": "code", "collapsed": false, "input": [ "nx.connected" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 8, "text": [ "" ] } ], "prompt_number": 8 }, { "cell_type": "code", "collapsed": false, "input": [ "sg = nx.connected_component_subgraphs(g)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 9 }, { "cell_type": "code", "collapsed": false, "input": [ "[len(s) for s in sg]" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 10, "text": [ "[324, 3, 2, 2, 2]" ] } ], "prompt_number": 10 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We take the largest connected component." ] }, { "cell_type": "code", "collapsed": false, "input": [ "sg = sg[0]" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can compute the radius and diameter of the graph." ] }, { "cell_type": "code", "collapsed": false, "input": [ "nx.radius(sg), nx.diameter(sg)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 12, "text": [ "(6, 11)" ] } ], "prompt_number": 12 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Appendind `?` to any object in IPython gives information about it." ] }, { "cell_type": "code", "collapsed": false, "input": [ "nx.eccentricity?" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 13 }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `%pdef`, `%pdoc` and `%psource` magic commands give different pieces of information about objects: the definition, the docstring, and the source code." ] }, { "cell_type": "code", "collapsed": false, "input": [ "%pdef nx.eccentricity" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ " \u001b[0mnx\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0meccentricity\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mG\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mv\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mNone\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msp\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mNone\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", " " ] } ], "prompt_number": 14 }, { "cell_type": "code", "collapsed": false, "input": [ "%pdoc nx.eccentricity" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 15 }, { "cell_type": "code", "collapsed": false, "input": [ "%psource nx.eccentricity" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 16 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use the `%timeit` magic command to evaluate the time an instruction takes." ] }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit nx.center(sg)" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "1 loops, best of 3: 238 ms per loop\n" ] } ], "prompt_number": 17 }, { "cell_type": "code", "collapsed": false, "input": [ "nx.center(sg)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 18, "text": [ "[u'51', u'190', u'83', u'307', u'175', u'237', u'277', u'124']" ] } ], "prompt_number": 18 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we write our own, unoptimized function that computes a graph's center. Here is the code contained in `center.py`:\n", "\n", " import networkx as nx\n", " g = nx.read_edgelist('0.edges')\n", " sg = nx.connected_component_subgraphs(g)[0]\n", " center = [node for node in sg.nodes() if nx.eccentricity(sg, node) == nx.radius(sg)]\n", " print(center)\n", "\n", "We can benchmark and profile it to find hotspots that should be optimized." ] }, { "cell_type": "code", "collapsed": false, "input": [ "cd ../.." ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "/Users/admin/Documents/work/onderwijs/teaching/ISatWork/NoteBooks/minibook-code/chapter2\n" ] } ], "prompt_number": 19 }, { "cell_type": "code", "collapsed": false, "input": [ "run -t center.py" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[u'51', u'190', u'83', u'307', u'175', u'237', u'277', u'124']\n", "\n", "IPython CPU timings (estimated):\n", " User : 85.71 s.\n", " System : 1.43 s.\n", "Wall time: 88.25 s.\n" ] } ], "prompt_number": 20 }, { "cell_type": "code", "collapsed": false, "input": [ "run -p center.py" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[u'51', u'190', u'83', u'307', u'175', u'237', u'277', u'124']\n", " " ] } ], "prompt_number": 21 }, { "cell_type": "markdown", "metadata": {}, "source": [ "We repeatedly call the exact same functions which explain why this function is so slow. Let's optimize it by caching the outputs of these functions. Here is the code contained in the file `center2.py`:\n", "\n", " import networkx as nx\n", " g = nx.read_edgelist('0.edges')\n", " sg = nx.connected_component_subgraphs(g)[0]\n", " # we compute the eccentricity once, for all nodes\n", " ecc = nx.eccentricity(sg)\n", " # we compute the radius once\n", " r = nx.radius(sg)\n", " center = [node for node in sg.nodes() if ecc[node] == r]\n", " print(center)\n" ] }, { "cell_type": "code", "collapsed": false, "input": [ "run -t center2.py" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [ "[u'51', u'190', u'83', u'307', u'175', u'237', u'277', u'124']\n", "\n", "IPython CPU timings (estimated):\n", " User : 0.56 s.\n", " System : 0.01 s.\n", "Wall time: 0.58 s.\n" ] } ], "prompt_number": 22 }, { "cell_type": "markdown", "metadata": {}, "source": [ "NetworkX allows to plot graphs with the help of Matplotlib. We use several options to make the graph nicer. You can find the [full documentation here](http://networkx.github.com/documentation/latest/reference/generated/networkx.drawing.nx_pylab.draw_networkx.html#networkx.drawing.nx_pylab.draw_networkx)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "nx.draw_networkx(sg, node_size=15, edge_color='y', with_labels=False, alpha=.4, linewidths=0)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 24 }, { "cell_type": "code", "collapsed": false, "input": [], "language": "python", "metadata": {}, "outputs": [] } ], "metadata": {} } ] }