{ "metadata": { "name": "", "signature": "sha256:2ec741daefdcbdc8c7f2b2a5e3a8523ae84dbc5c7556d678eb085df29ef4aff8" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "> This is one of the 100 recipes of the [IPython Cookbook](http://ipython-books.github.io/), the definitive guide to high-performance scientific computing and data science in Python.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 4.8. Making efficient selections in arrays with NumPy" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import numpy as np" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "# Address of an array's data buffer (note: this shadows Python's built-in id)\n", "id = lambda x: x.__array_interface__['data'][0]" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We create a large array." ] }, { "cell_type": "code", "collapsed": false, "input": [ "n, d = 100000, 100" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "a = np.random.random_sample((n, d))\n", "aid = id(a)  # address of a's data buffer" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Array views and fancy indexing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We select every tenth row of `a` in two different ways: with a view (slicing) and with fancy indexing (an array of indices)." ] }, { "cell_type": "code", "collapsed": false, "input": [ "b1 = a[::10]\n", "b2 = a[np.arange(0, n, 10)]" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "np.array_equal(b1, b2)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The view refers to the original data buffer, whereas fancy indexing yields a copy."
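 ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a complementary check, here is a small self-contained sketch. It assumes NumPy 1.11 or later for `np.shares_memory`, which the rest of this recipe does not use. The arrays `x`, `x_view` and `x_copy` are hypothetical names for illustration. The sketch asks NumPy directly whether two arrays overlap in memory. It also shows that writing through a view modifies the original array, whereas writing to a fancy-indexed copy does not." ] }, { "cell_type": "code", "collapsed": false, "language": "python", "metadata": {}, "outputs": [], "input": [ "import numpy as np\n", "\n", "x = np.zeros((10, 2))            # small array for illustration\n", "x_view = x[::2]                  # slicing: a view on x's buffer\n", "x_copy = x[np.arange(0, 10, 2)]  # fancy indexing: a new buffer\n", "\n", "print(np.shares_memory(x, x_view))  # True\n", "print(np.shares_memory(x, x_copy))  # False\n", "\n", "x_view[0, 0] = -1  # writes into x (row 0)\n", "x_copy[1, 0] = -2  # writes only into the copy (row 2 of x is untouched)\n", "print(x[0, 0], x[2, 0])"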
] }, { "cell_type": "code", "collapsed": false, "input": [ "id(b1) == aid, id(b2) == aid" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fancy indexing is several orders of magnitude slower, as it involves copying a large array. However, fancy indexing is more general: it allows selecting any portion of an array (using any list of indices), not just a strided selection." ] }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit a[::10]" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit a[np.arange(0, n, 10)]" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Alternatives to fancy indexing: list of indices" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Given a list of indices, there are two ways of selecting the corresponding sub-array: fancy indexing, or the `np.take` function." ] }, { "cell_type": "code", "collapsed": false, "input": [ "i = np.arange(0, n, 10)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "b1 = a[i]\n", "b2 = np.take(a, i, axis=0)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "np.array_equal(b1, b2)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit a[i]" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit np.take(a, i, axis=0)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Using `np.take` instead of fancy indexing is faster."
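 ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The sketch below is a self-contained illustration, not part of the original recipe. The names `rng`, `x`, `idx` and `out` are hypothetical. Using the `out` and `mode` arguments of `np.take` is an assumption about what may help when the same selection is repeated, not a measured result. The sketch checks that `np.take` agrees with fancy indexing for unsorted indices that may contain repeats, then fills a preallocated output buffer. According to the `np.take` documentation, `out` is always buffered when `mode='raise'` (the default), so `mode='clip'` is used instead. Be aware that `mode='clip'` silently clips out-of-range indices instead of raising an error." ] }, { "cell_type": "code", "collapsed": false, "language": "python", "metadata": {}, "outputs": [], "input": [ "import numpy as np\n", "\n", "rng = np.random.RandomState(0)        # fixed seed for reproducibility\n", "x = rng.random_sample((1000, 5))\n", "idx = rng.randint(0, 1000, size=200)  # unsorted indices, repeats allowed\n", "\n", "rows_fancy = x[idx]\n", "rows_take = np.take(x, idx, axis=0)\n", "\n", "# Preallocated buffer with the shape and dtype of the result\n", "out = np.empty((len(idx), x.shape[1]), dtype=x.dtype)\n", "np.take(x, idx, axis=0, out=out, mode='clip')  # fills out in place\n", "\n", "np.array_equal(rows_fancy, rows_take), np.array_equal(rows_fancy, out)"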
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note**: The performance of fancy indexing has improved in recent versions of NumPy; this trick is especially useful with older versions of NumPy." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Alternatives to fancy indexing: mask of booleans" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's create a mask of booleans, where each value indicates whether the corresponding row of `a` needs to be selected." ] }, { "cell_type": "code", "collapsed": false, "input": [ "# Select each row independently with probability 0.5\n", "i = np.random.random_sample(n) < .5" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The selection can be made using fancy indexing or the `np.compress` function." ] }, { "cell_type": "code", "collapsed": false, "input": [ "b1 = a[i]\n", "b2 = np.compress(i, a, axis=0)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "np.array_equal(b1, b2)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit a[i]" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "%timeit np.compress(i, a, axis=0)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once again, the alternative to fancy indexing (here, `np.compress`) is faster." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).\n", "\n", "> [IPython Cookbook](http://ipython-books.github.io/), by [Cyrille Rossant](http://cyrille.rossant.net), Packt Publishing, 2014 (500 pages)." ] } ], "metadata": {} } ] }