{ "metadata": { "name": "", "signature": "sha256:5286b45f32b884d0319ca69afc30f846b4ee9bec990305401e26537143c989ad" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Featured Recipe #3: Creating a route planner for road network" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> This is a featured recipe from the [**IPython Cookbook**](http://ipython-books.github.io/), the definitive guide to **high-performance scientific computing** and **data science** in Python." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this featured recipe, we create a simple **GPS-like route planner** in Python. We retrieve California's road network data from the *United States Census Bureau* in order to find shortest paths and compute road itineraries between any two locations in California." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting Ready" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this recipe, you need IPython, NumPy, Pandas, matplotlib." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You also need [NetworkX](http://networkx.github.io/) and [GDAL/OGR](http://www.gdal.org/ogr/). On Windows, you can find binary installers on [Chris Gohlke's webpage](http://www.lfd.uci.edu/~gohlke/pythonlibs/)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> At the time of this writing, NetworkX's support of Shapefile doesn't seem to be compatible with Python 3.x. For this reason, this recipe has only been successfully tested with Python 2.x." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You also need [**Smopy**](https://github.com/rossant/smopy) and [Pillow](https://pillow.readthedocs.org/en/latest/) for displaying [OpenStreetMap maps](http://www.openstreetmap.org/):" ] }, { "cell_type": "code", "collapsed": false, "input": [ "!pip install smopy" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 1 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, you need to download the [*Road* dataset](https://github.com/ipython-books/cookbook-data) on the book's website an extract it in the current folder. The data comes from the [United States Census Bureau website](http://www.census.gov/geo/maps-data/data/tiger.html)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How to do it..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "1. Let's import the packages." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import networkx as nx\n", "import numpy as np\n", "import pandas as pd\n", "import json\n", "import smopy\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "import matplotlib as mpl\n", "mpl.rcParams['figure.dpi'] = mpl.rcParams['savefig.dpi'] = 300" ], "language": "python", "metadata": {}, "outputs": [ { "output_type": "stream", "stream": "stdout", "text": [] } ], "prompt_number": 2 }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. We load the data (a Shapefile dataset) with NetworkX. This dataset contains detailed information about the primary roads in California. NetworkX's `read_shp` function returns a graph, where each node is a geographical position, and each edge contains information about the road linking the two nodes." ] }, { "cell_type": "code", "collapsed": false, "input": [ "g = nx.read_shp(\"data/tl_2013_06_prisecroads.shp\")" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 3 }, { "cell_type": "markdown", "metadata": {}, "source": [ "3. This graph is not necessarily connected, but we need a connected graph in order to compute shortest paths. Here, we take the largest connected subgraph using the `connected_component_subgraphs` function." ] }, { "cell_type": "code", "collapsed": false, "input": [ "sg = list(nx.connected_component_subgraphs(g.to_undirected()))[0]\n", "len(sg)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 4, "text": [ "464" ] } ], "prompt_number": 4 }, { "cell_type": "markdown", "metadata": {}, "source": [ "4. We define two positions (with the latitude and longitude). We will find the shortest path between these two positions." ] }, { "cell_type": "code", "collapsed": false, "input": [ "pos0 = (36.6026, -121.9026)\n", "pos1 = (34.0569, -118.2427)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 5 }, { "cell_type": "markdown", "metadata": {}, "source": [ "5. Each edge in the graph contains information about the road, including a list of points along this road. We first create a function that returns this array of coordinates, for any edge in the graph." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def get_path(n0, n1):\n", " \"\"\"If n0 and n1 are connected nodes in the graph, this function\n", " return an array of point coordinates along the road linking\n", " these two nodes.\"\"\"\n", " return np.array(json.loads(sg[n0][n1]['Json'])['coordinates'])" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 6 }, { "cell_type": "markdown", "metadata": {}, "source": [ "6. We will notably use the road path to compute its length. We first need to define a function that computes the distance between any two points in geographical coordinates. This function has been found in [StackOverflow](http://stackoverflow.com/questions/8858838/need-help-calculating-geographical-distance)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "EARTH_R = 6372.8\n", "def geocalc(lat0, lon0, lat1, lon1):\n", " \"\"\"Return the distance (in km) between two points in \n", " geographical coordinates.\"\"\"\n", " lat0 = np.radians(lat0)\n", " lon0 = np.radians(lon0)\n", " lat1 = np.radians(lat1)\n", " lon1 = np.radians(lon1)\n", " dlon = lon0 - lon1\n", " y = np.sqrt(\n", " (np.cos(lat1) * np.sin(dlon)) ** 2\n", " + (np.cos(lat0) * np.sin(lat1) \n", " - np.sin(lat0) * np.cos(lat1) * np.cos(dlon)) ** 2)\n", " x = np.sin(lat0) * np.sin(lat1) + \\\n", " np.cos(lat0) * np.cos(lat1) * np.cos(dlon)\n", " c = np.arctan2(y, x)\n", " return EARTH_R * c" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 7 }, { "cell_type": "markdown", "metadata": {}, "source": [ "7. Now, we define a function computing a path's length." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def get_path_length(path):\n", " return np.sum(geocalc(path[1:,0], path[1:,1],\n", " path[:-1,0], path[:-1,1]))" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 8 }, { "cell_type": "markdown", "metadata": {}, "source": [ "8. Now, we update our graph by computing the distance between any two connected nodes. We add this information in the `distance` attribute of the edges." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Compute the length of the road segments.\n", "for n0, n1 in sg.edges_iter():\n", " path = get_path(n0, n1)\n", " distance = get_path_length(path)\n", " sg.edge[n0][n1]['distance'] = distance" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 9 }, { "cell_type": "markdown", "metadata": {}, "source": [ "9. The last step before we can find the shortest path in the graph, is to find the two nodes in the graph that are closest to the two requested positions." ] }, { "cell_type": "code", "collapsed": false, "input": [ "nodes = np.array(sg.nodes())\n", "# Get the closest nodes in the graph.\n", "pos0_i = np.argmin(np.sum((nodes[:,::-1] - pos0)**2, axis=1))\n", "pos1_i = np.argmin(np.sum((nodes[:,::-1] - pos1)**2, axis=1))" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 10 }, { "cell_type": "markdown", "metadata": {}, "source": [ "10. Now, we use NetworkX's `shortest_path` function to compute the shortest path between our two positions. We specify that the weight of every edge is the length of the road between them." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Compute the shortest path.\n", "path = nx.shortest_path(sg, \n", " source=tuple(nodes[pos0_i]), \n", " target=tuple(nodes[pos1_i]),\n", " weight='distance')\n", "len(path)" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 11, "text": [ "19" ] } ], "prompt_number": 11 }, { "cell_type": "markdown", "metadata": {}, "source": [ "11. The itinerary has been computed. The `path` variable contains the list of edges that form the shortest path between our two positions. Now, we can get information about the itinerary with Pandas. The dataset has a few fields of interest, including the name and type (State, Interstate, etc.) of the roads." ] }, { "cell_type": "code", "collapsed": false, "input": [ "roads = pd.DataFrame([sg.edge[path[i]][path[i + 1]] \n", " for i in range(len(path) - 1)], \n", " columns=['FULLNAME', 'MTFCC', \n", " 'RTTYP', 'distance'])\n", "roads" ], "language": "python", "metadata": { "strip_output": [ 3, 3 ] }, "outputs": [ { "html": [ "
\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
FULLNAMEMTFCCRTTYPdistance
0 State Rte 1 S1200 S 100.658130
1 State Rte 1 S1200 S 33.419556
2 Cabrillo Hwy S1200 M 4.399051
3 State Rte 1 S1200 S 12.400382
4 Cabrillo Hwy S1200 M 36.693272
5 Cabrillo Hwy S1200 M 0.017746
6 Cabrillo Hwy S1200 M 0.439355
7 Cabrillo Hwy S1200 M 0.130107
8 State Hwy 1 S1200 S 0.007007
9 el Camino Real S1200 M 5.774056
10 el Camino Real S1200 M 0.507131
11 el Camino Real S1200 M 33.550742
12 US Hwy 101 S1200 U 140.786519
13 US Hwy 101 S1200 U 75.852281
14 Ventura Fwy S1200 M 49.045475
15 Hollywood Fwy S1200 M 0.885826
16 Hollywood Fwy S1200 M 14.087603
17 Hollywood Fwy S1200 M 0.010107
\n", "
" ], "metadata": {}, "output_type": "pyout", "prompt_number": 12, "text": [ " FULLNAME MTFCC RTTYP distance\n", "0 State Rte 1 S1200 S 100.658130\n", "1 State Rte 1 S1200 S 33.419556\n", "2 Cabrillo Hwy S1200 M 4.399051\n", "3 State Rte 1 S1200 S 12.400382\n", "4 Cabrillo Hwy S1200 M 36.693272\n", "5 Cabrillo Hwy S1200 M 0.017746\n", "6 Cabrillo Hwy S1200 M 0.439355\n", "7 Cabrillo Hwy S1200 M 0.130107\n", "8 State Hwy 1 S1200 S 0.007007\n", "9 el Camino Real S1200 M 5.774056\n", "10 el Camino Real S1200 M 0.507131\n", "11 el Camino Real S1200 M 33.550742\n", "12 US Hwy 101 S1200 U 140.786519\n", "13 US Hwy 101 S1200 U 75.852281\n", "14 Ventura Fwy S1200 M 49.045475\n", "15 Hollywood Fwy S1200 M 0.885826\n", "16 Hollywood Fwy S1200 M 14.087603\n", "17 Hollywood Fwy S1200 M 0.010107" ] } ], "prompt_number": 12 }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is the total length of this itinerary." ] }, { "cell_type": "code", "collapsed": true, "input": [ "roads['distance'].sum()" ], "language": "python", "metadata": {}, "outputs": [ { "metadata": {}, "output_type": "pyout", "prompt_number": 13, "text": [ "508.66434555909808" ] } ], "prompt_number": 13 }, { "cell_type": "markdown", "metadata": {}, "source": [ "12. Finally, let display the itinerary on the map. We first retrieve the map with Smopy." ] }, { "cell_type": "code", "collapsed": false, "input": [ "map = smopy.Map(pos0, pos1, z=7, margin=.1)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 14 }, { "cell_type": "markdown", "metadata": {}, "source": [ "13. Our path contains connected nodes in the graph. Every edge between two nodes is characterized by a list of points (constituting a part of the road). Therefore, we need to define a function that concatenates the positions along every edge in the path. A difficulty is that we need to concatenate the positions in the right order along our path. We choose the order based on the fact that the last point in an edge needs to be close to the first point in the next edge." ] }, { "cell_type": "code", "collapsed": false, "input": [ "def get_full_path(path):\n", " \"\"\"Return the positions along a path.\"\"\"\n", " p_list = []\n", " curp = None\n", " for i in range(len(path)-1):\n", " p = get_path(path[i], path[i+1])\n", " if curp is None:\n", " curp = p\n", " if np.sum((p[0]-curp)**2) > np.sum((p[-1]-curp)**2):\n", " p = p[::-1,:]\n", " p_list.append(p)\n", " curp = p[-1]\n", " return np.vstack(p_list)" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 15 }, { "cell_type": "markdown", "metadata": {}, "source": [ "14. We convert the path in pixels in order to display it on the Smopy map." ] }, { "cell_type": "code", "collapsed": false, "input": [ "linepath = get_full_path(path)\n", "x, y = map.to_pixels(linepath[:,1], linepath[:,0])" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 16 }, { "cell_type": "markdown", "metadata": {}, "source": [ "15. Finally, we display the map, with our two positions and the computed itinerary between them." ] }, { "cell_type": "code", "collapsed": false, "input": [ "plt.figure(figsize=(6,6));\n", "map.show_mpl();\n", "# Plot the itinerary.\n", "plt.plot(x, y, '-k', lw=1.5);\n", "# Mark our two positions.\n", "plt.plot(x[0], y[0], 'ob', ms=10);\n", "plt.plot(x[-1], y[-1], 'or', ms=10);" ], "language": "python", "metadata": {}, "outputs": [], "prompt_number": 17 }, { "cell_type": "markdown", "metadata": {}, "source": [ "![Route planner](images/road.jpg)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How it works..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We computed the shortest path with NetworkX's `shortest_path` function. Here, this function used **Dijkstra's algorithm**. This algorithm has a wide variety of applications, for example in network routing protocols.\n", "\n", "There are different ways to compute the geographical distance between two points. Here, we used a relatively precise formula: the **orthodromic distance** (also called **great-circle distance**), which assumes that the Earth is a perfect sphere. We could also have used a simpler formula since the distance between two successive points on a road is small." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## There's more..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can find more information about shortest path problems and Dijkstra's algorithm in the following references:\n", "\n", "* [Shortest path](http://en.wikipedia.org/wiki/Shortest_path_problem).\n", "* [Dijkstra's algorithm](http://en.wikipedia.org/wiki/Dijkstra%27s_algorithm)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are a few references about geographical distances:\n", "\n", "* [Geographical distance](http://en.wikipedia.org/wiki/Geographical_distance).\n", "* [Great circle](http://en.wikipedia.org/wiki/Great_Circle).\n", "* [Great circle distance](http://en.wikipedia.org/wiki/Great-circle_distance)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> This was a featured recipe from the [IPython Cookbook](http://ipython-books.github.io/), by [Cyrille Rossant](http://cyrille.rossant.net), Packt Publishing, 2014." ] } ], "metadata": {} } ] }