{ "metadata": { "name": "", "signature": "sha256:5286b45f32b884d0319ca69afc30f846b4ee9bec990305401e26537143c989ad" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Featured Recipe #3: Creating a route planner for road network" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> This is a featured recipe from the [**IPython Cookbook**](http://ipython-books.github.io/), the definitive guide to **high-performance scientific computing** and **data science** in Python." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this featured recipe, we create a simple **GPS-like route planner** in Python. We retrieve California's road network data from the *United States Census Bureau* in order to find shortest paths and compute road itineraries between any two locations in California." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting Ready" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this recipe, you need IPython, NumPy, Pandas, matplotlib." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You also need [NetworkX](http://networkx.github.io/) and [GDAL/OGR](http://www.gdal.org/ogr/). On Windows, you can find binary installers on [Chris Gohlke's webpage](http://www.lfd.uci.edu/~gohlke/pythonlibs/)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> At the time of this writing, NetworkX's support of Shapefile doesn't seem to be compatible with Python 3.x. For this reason, this recipe has only been successfully tested with Python 2.x." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You also need [**Smopy**](https://github.com/rossant/smopy) and [Pillow](https://pillow.readthedocs.org/en/latest/) for displaying [OpenStreetMap maps](http://www.openstreetmap.org/):" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!pip install smopy" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, you need to download the [*Road* dataset](https://github.com/ipython-books/cookbook-data) on the book's website an extract it in the current folder. The data comes from the [United States Census Bureau website](http://www.census.gov/geo/maps-data/data/tiger.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How to do it..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. Let's import the packages." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import networkx as nx\n", "import numpy as np\n", "import pandas as pd\n", "import json\n", "import smopy\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "import matplotlib as mpl\n", "mpl.rcParams['figure.dpi'] = mpl.rcParams['savefig.dpi'] = 300" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [] } ], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. We load the data (a Shapefile dataset) with NetworkX. This dataset contains detailed information about the primary roads in California. NetworkX's `read_shp` function returns a graph, where each node is a geographical position, and each edge contains information about the road linking the two nodes." ] }, { "cell_type": "code", "collapsed": false, "input": [ "g = nx.read_shp(\"data/tl_2013_06_prisecroads.shp\")" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "3. This graph is not necessarily connected, but we need a connected graph in order to compute shortest paths. Here, we take the largest connected subgraph using the `connected_component_subgraphs` function." ] }, { "cell_type": "code", "collapsed": false, "input": [ "sg = list(nx.connected_component_subgraphs(g.to_undirected()))[0]\n", "len(sg)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 4, "text": [ "464" ] } ], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "4. We define two positions (with the latitude and longitude). We will find the shortest path between these two positions." ] }, { "cell_type": "code", "collapsed": false, "input": [ "pos0 = (36.6026, -121.9026)\n", "pos1 = (34.0569, -118.2427)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "5. Each edge in the graph contains information about the road, including a list of points along this road. We first create a function that returns this array of coordinates, for any edge in the graph." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def get_path(n0, n1):\n", " \"\"\"If n0 and n1 are connected nodes in the graph, this function\n", " return an array of point coordinates along the road linking\n", " these two nodes.\"\"\"\n", " return np.array(json.loads(sg[n0][n1]['Json'])['coordinates'])" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "6. We will notably use the road path to compute its length. We first need to define a function that computes the distance between any two points in geographical coordinates. This function has been found in [StackOverflow](http://stackoverflow.com/questions/8858838/need-help-calculating-geographical-distance)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "EARTH_R = 6372.8\n", "def geocalc(lat0, lon0, lat1, lon1):\n", " \"\"\"Return the distance (in km) between two points in \n", " geographical coordinates.\"\"\"\n", " lat0 = np.radians(lat0)\n", " lon0 = np.radians(lon0)\n", " lat1 = np.radians(lat1)\n", " lon1 = np.radians(lon1)\n", " dlon = lon0 - lon1\n", " y = np.sqrt(\n", " (np.cos(lat1) * np.sin(dlon)) ** 2\n", " + (np.cos(lat0) * np.sin(lat1) \n", " - np.sin(lat0) * np.cos(lat1) * np.cos(dlon)) ** 2)\n", " x = np.sin(lat0) * np.sin(lat1) + \\\n", " np.cos(lat0) * np.cos(lat1) * np.cos(dlon)\n", " c = np.arctan2(y, x)\n", " return EARTH_R * c" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": {}, "source": [ "7. Now, we define a function computing a path's length." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def get_path_length(path):\n", " return np.sum(geocalc(path[1:,0], path[1:,1],\n", " path[:-1,0], path[:-1,1]))" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "8. Now, we update our graph by computing the distance between any two connected nodes. We add this information in the `distance` attribute of the edges." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Compute the length of the road segments.\n", "for n0, n1 in sg.edges_iter():\n", " path = get_path(n0, n1)\n", " distance = get_path_length(path)\n", " sg.edge[n0][n1]['distance'] = distance" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 9 }, { "cell_type": "markdown", "metadata": {}, "source": [ "9. The last step before we can find the shortest path in the graph, is to find the two nodes in the graph that are closest to the two requested positions." ] }, { "cell_type": "code", "collapsed": false, "input": [ "nodes = np.array(sg.nodes())\n", "# Get the closest nodes in the graph.\n", "pos0_i = np.argmin(np.sum((nodes[:,::-1] - pos0)**2, axis=1))\n", "pos1_i = np.argmin(np.sum((nodes[:,::-1] - pos1)**2, axis=1))" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 10 }, { "cell_type": "markdown", "metadata": {}, "source": [ "10. Now, we use NetworkX's `shortest_path` function to compute the shortest path between our two positions. We specify that the weight of every edge is the length of the road between them." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Compute the shortest path.\n", "path = nx.shortest_path(sg, \n", " source=tuple(nodes[pos0_i]), \n", " target=tuple(nodes[pos1_i]),\n", " weight='distance')\n", "len(path)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 11, "text": [ "19" ] } ], "prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "11. The itinerary has been computed. The `path` variable contains the list of edges that form the shortest path between our two positions. Now, we can get information about the itinerary with Pandas. The dataset has a few fields of interest, including the name and type (State, Interstate, etc.) of the roads." ] }, { "cell_type": "code", "collapsed": false, "input": [ "roads = pd.DataFrame([sg.edge[path[i]][path[i + 1]] \n", " for i in range(len(path) - 1)], \n", " columns=['FULLNAME', 'MTFCC', \n", " 'RTTYP', 'distance'])\n", "roads" ], "language": "python", "metadata": { "strip_output": [ 3, 3 ] }, "outputs": [ { "html": [ "
\n", " | FULLNAME | \n", "MTFCC | \n", "RTTYP | \n", "distance | \n", "
---|---|---|---|---|
0 | \n", "State Rte 1 | \n", "S1200 | \n", "S | \n", "100.658130 | \n", "
1 | \n", "State Rte 1 | \n", "S1200 | \n", "S | \n", "33.419556 | \n", "
2 | \n", "Cabrillo Hwy | \n", "S1200 | \n", "M | \n", "4.399051 | \n", "
3 | \n", "State Rte 1 | \n", "S1200 | \n", "S | \n", "12.400382 | \n", "
4 | \n", "Cabrillo Hwy | \n", "S1200 | \n", "M | \n", "36.693272 | \n", "
5 | \n", "Cabrillo Hwy | \n", "S1200 | \n", "M | \n", "0.017746 | \n", "
6 | \n", "Cabrillo Hwy | \n", "S1200 | \n", "M | \n", "0.439355 | \n", "
7 | \n", "Cabrillo Hwy | \n", "S1200 | \n", "M | \n", "0.130107 | \n", "
8 | \n", "State Hwy 1 | \n", "S1200 | \n", "S | \n", "0.007007 | \n", "
9 | \n", "el Camino Real | \n", "S1200 | \n", "M | \n", "5.774056 | \n", "
10 | \n", "el Camino Real | \n", "S1200 | \n", "M | \n", "0.507131 | \n", "
11 | \n", "el Camino Real | \n", "S1200 | \n", "M | \n", "33.550742 | \n", "
12 | \n", "US Hwy 101 | \n", "S1200 | \n", "U | \n", "140.786519 | \n", "
13 | \n", "US Hwy 101 | \n", "S1200 | \n", "U | \n", "75.852281 | \n", "
14 | \n", "Ventura Fwy | \n", "S1200 | \n", "M | \n", "49.045475 | \n", "
15 | \n", "Hollywood Fwy | \n", "S1200 | \n", "M | \n", "0.885826 | \n", "
16 | \n", "Hollywood Fwy | \n", "S1200 | \n", "M | \n", "14.087603 | \n", "
17 | \n", "Hollywood Fwy | \n", "S1200 | \n", "M | \n", "0.010107 | \n", "