{ "metadata": { "name": "", "signature": "sha256:8589579fe1aeca45791fed279395b0b53061c1aa7cfdeac4d9d934919f29e81d" }, "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "cell_type": "markdown", "metadata": [], "source": [ "> This is one of the 100 recipes of the [IPython Cookbook](http://ipython-books.github.io/), the definitive guide to high-performance scientific computing and data science in Python.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 4.10. Manipulating large arrays with HDF5 and PyTables" ] }, { "cell_type": "code", "collapsed": false, "input": [ "import numpy as np\n", "import tables as tb" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating an HDF5 file" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's create a new empty HDF5 file." ] }, { "cell_type": "code", "collapsed": false, "input": [ "f = tb.open_file('myfile.h5', 'w')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We create a new top-level group named \"experiment1\"." ] }, { "cell_type": "code", "collapsed": false, "input": [ "f.create_group('/', 'experiment1')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's also add some metadata to this group." ] }, { "cell_type": "code", "collapsed": false, "input": [ "f.set_node_attr('/experiment1', 'date', '2014-09-01')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this group, we create a 1000*1000 array named \"array1\"." ] }, { "cell_type": "code", "collapsed": false, "input": [ "x = np.random.rand(1000, 1000)\n", "f.create_array('/experiment1', 'array1', x)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we need to close the file to commit the changes on disk." ] }, { "cell_type": "code", "collapsed": false, "input": [ "f.close()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reading a HDF5 file" ] }, { "cell_type": "code", "collapsed": false, "input": [ "f = tb.open_file('myfile.h5', 'r')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can retrieve an attribute by giving the group path and the attribute name." ] }, { "cell_type": "code", "collapsed": false, "input": [ "f.get_node_attr('/experiment1', 'date')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can access any item in the file using attributes. IPython's tab completion is incredibly useful in this respect when exploring a file interactively." ] }, { "cell_type": "code", "collapsed": false, "input": [ "y = f.root.experiment1.array1\n", "type(y)" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The array can be used as a NumPy array, but an important distinction is that it is stored on disk instead of system memory. Performing a computation on this array triggers a preliminary loading of the array in memory, so that it is more efficient to only access views on this array." ] }, { "cell_type": "code", "collapsed": false, "input": [ "np.array_equal(x[0,:], y[0,:])" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is also possible to get a node from its absolute path, which is useful when this path is only known at runtime." ] }, { "cell_type": "code", "collapsed": false, "input": [ "f.get_node('/experiment1/array1')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "code", "collapsed": false, "input": [ "f.close()" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Clean-up." ] }, { "cell_type": "code", "collapsed": false, "input": [ "import os\n", "os.remove('myfile.h5')" ], "language": "python", "metadata": {}, "outputs": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).\n", "\n", "> [IPython Cookbook](http://ipython-books.github.io/), by [Cyrille Rossant](http://cyrille.rossant.net), Packt Publishing, 2014 (500 pages)." ] } ], "metadata": {} } ] }