{ "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "source": [ "> This is one of the 100 recipes of the [IPython Cookbook](http://ipython-books.github.io/), the definitive guide to high-performance scientific computing and data science in Python.\n" ], "cell_type": "markdown", "metadata": [] }, { "source": [ "# 5.7. Writing massively parallel code for NVIDIA graphics cards (GPUs) with CUDA" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "Let's import PyCUDA." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "language": "python", "outputs": [], "collapsed": false, "input": [ "import pycuda.driver as cuda\n", "import pycuda.autoinit\n", "from pycuda.compiler import SourceModule\n", "import numpy as np" ], "metadata": {} }, { "source": [ "Now, we initialize the NumPy array that will contain the fractal." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "language": "python", "outputs": [], "collapsed": false, "input": [ "size = 200\n", "iterations = 100\n", "col = np.empty((size, size), dtype=np.int32)" ], "metadata": {} }, { "source": [ "We allocate memory for this array on the GPU." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "language": "python", "outputs": [], "collapsed": false, "input": [ "col_gpu = cuda.mem_alloc(col.nbytes)" ], "metadata": {} }, { "source": [ "We write the CUDA kernel in a string. The mandelbrot function accepts the figure size, the number of iterations, and a pointer to the memory buffer as arguments. It updates the col buffer with the escape value in the fractal for each pixel." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "language": "python", "outputs": [], "collapsed": false, "input": [ "code = \"\"\"\n", "__global__ void mandelbrot(int size,\n", " int iterations,\n", " int *col)\n", "\n", "{\n", " // Get the row and column index of the current thread.\n", " int i = blockIdx.y * blockDim.y + threadIdx.y;\n", " int j = blockIdx.x * blockDim.x + threadIdx.x;\n", " int index = i * size + j;\n", " \n", " // Declare and initialize the variables.\n", " double cx, cy;\n", " double z0, z1, z0_tmp, z0_2, z1_2;\n", " cx = -2.0 + (double)j / size * 3;\n", " cy = -1.5 + (double)i / size * 3;\n", "\n", " // Main loop.\n", " z0 = z1 = 0.0;\n", " for (int n = 0; n < iterations; n++)\n", " {\n", " z0_2 = z0 * z0;\n", " z1_2 = z1 * z1;\n", " if (z0_2 + z1_2 <= 100)\n", " {\n", " // Need to update z0 and z1 in parallel.\n", " z0_tmp = z0_2 - z1_2 + cx;\n", " z1 = 2 * z0 * z1 + cy;\n", " z0 = z0_tmp;\n", " col[index] = n;\n", " }\n", " else break;\n", " }\n", "}\n", "\"\"\"" ], "metadata": {} }, { "source": [ "Now, we compile the CUDA program." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "language": "python", "outputs": [], "collapsed": false, "input": [ "prg = SourceModule(code)\n", "mandelbrot = prg.get_function(\"mandelbrot\")" ], "metadata": {} }, { "source": [ "We define the block size and the grid size, specifying how the threads will be parallelized with respect to the data." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "language": "python", "outputs": [], "collapsed": false, "input": [ "block_size = 10\n", "block = (block_size, block_size, 1)\n", "grid = (size // block_size, size // block_size, 1)" ], "metadata": {} }, { "source": [ "We call the compiled function." 
], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "language": "python", "outputs": [], "collapsed": false, "input": [ "mandelbrot(np.int32(size), np.int32(iterations), col_gpu,\n", " block=block, grid=grid)" ], "metadata": {} }, { "source": [ "Once the function has completed, we copy the contents of the CUDA buffer back to the NumPy array col." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "language": "python", "outputs": [], "collapsed": false, "input": [ "cuda.memcpy_dtoh(col, col_gpu)" ], "metadata": {} }, { "source": [ "Let's display the fractal." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "language": "python", "outputs": [], "collapsed": false, "input": [ "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "plt.imshow(np.log(col), cmap=plt.cm.hot,);\n", "plt.xticks([]);\n", "plt.yticks([]);" ], "metadata": {} }, { "source": [ "Let's evaluate the time taken by this function." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "language": "python", "outputs": [], "collapsed": false, "input": [ "%%timeit col_gpu = cuda.mem_alloc(col.nbytes); cuda.memcpy_htod(col_gpu, col)\n", "mandelbrot(np.int32(size), np.int32(iterations), col_gpu,\n", " block=block, grid=grid)\n", "cuda.memcpy_dtoh(col, col_gpu)" ], "metadata": {} }, { "source": [ "> You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).\n\n> [IPython Cookbook](http://ipython-books.github.io/), by [Cyrille Rossant](http://cyrille.rossant.net), Packt Publishing, 2014 (500 pages)." ], "cell_type": "markdown", "metadata": {} } ], "metadata": {} } ], "metadata": { "name": "", "signature": "sha256:36dc07362549bc8b15ba523559b6298fd84de8f37083a337d162c93c0b538479" } }