{ "nbformat": 3, "nbformat_minor": 0, "worksheets": [ { "cells": [ { "source": [ "> This is one of the 100 recipes of the [IPython Cookbook](http://ipython-books.github.io/), the definitive guide to high-performance scientific computing and data science in Python.\n" ], "cell_type": "markdown", "metadata": [] }, { "source": [ "# 5.7. Writing massively parallel code for NVIDIA graphics cards (GPUs) with CUDA" ], "cell_type": "markdown", "metadata": {} }, { "source": [ "Let's import PyCUDA." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "language": "python", "outputs": [], "collapsed": false, "input": [ "import pycuda.driver as cuda\n", "import pycuda.autoinit\n", "from pycuda.compiler import SourceModule\n", "import numpy as np" ], "metadata": {} }, { "source": [ "Now, we initialize the NumPy array that will contain the fractal." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "language": "python", "outputs": [], "collapsed": false, "input": [ "size = 200\n", "iterations = 100\n", "col = np.empty((size, size), dtype=np.int32)" ], "metadata": {} }, { "source": [ "We allocate memory for this array on the GPU." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "language": "python", "outputs": [], "collapsed": false, "input": [ "col_gpu = cuda.mem_alloc(col.nbytes)" ], "metadata": {} }, { "source": [ "We write the CUDA kernel in a string. The mandelbrot function accepts the figure size, the number of iterations, and a pointer to the memory buffer as arguments. It updates the col buffer with the escape value in the fractal for each pixel." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "language": "python", "outputs": [], "collapsed": false, "input": [ "code = \"\"\"\n", "__global__ void mandelbrot(int size,\n", " int iterations,\n", " int *col)\n", "\n", "{\n", " // Get the row and column index of the current thread.\n", " int i = blockIdx.y * blockDim.y + threadIdx.y;\n", " int j = blockIdx.x * blockDim.x + threadIdx.x;\n", " int index = i * size + j;\n", " \n", " // Declare and initialize the variables.\n", " double cx, cy;\n", " double z0, z1, z0_tmp, z0_2, z1_2;\n", " cx = -2.0 + (double)j / size * 3;\n", " cy = -1.5 + (double)i / size * 3;\n", "\n", " // Main loop.\n", " z0 = z1 = 0.0;\n", " for (int n = 0; n < iterations; n++)\n", " {\n", " z0_2 = z0 * z0;\n", " z1_2 = z1 * z1;\n", " if (z0_2 + z1_2 <= 100)\n", " {\n", " // Need to update z0 and z1 in parallel.\n", " z0_tmp = z0_2 - z1_2 + cx;\n", " z1 = 2 * z0 * z1 + cy;\n", " z0 = z0_tmp;\n", " col[index] = n;\n", " }\n", " else break;\n", " }\n", "}\n", "\"\"\"" ], "metadata": {} }, { "source": [ "Now, we compile the CUDA program." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "language": "python", "outputs": [], "collapsed": false, "input": [ "prg = SourceModule(code)\n", "mandelbrot = prg.get_function(\"mandelbrot\")" ], "metadata": {} }, { "source": [ "We define the block size and the grid size, specifying how the threads will be parallelized with respect to the data." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "language": "python", "outputs": [], "collapsed": false, "input": [ "block_size = 10\n", "block = (block_size, block_size, 1)\n", "grid = (size // block_size, size // block_size, 1)" ], "metadata": {} }, { "source": [ "We call the compiled function." 
], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "language": "python", "outputs": [], "collapsed": false, "input": [ "mandelbrot(np.int32(size), np.int32(iterations), col_gpu,\n", " block=block, grid=grid)" ], "metadata": {} }, { "source": [ "Once the function has completed, we copy the contents of the CUDA buffer back to the NumPy array col." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "language": "python", "outputs": [], "collapsed": false, "input": [ "cuda.memcpy_dtoh(col, col_gpu)" ], "metadata": {} }, { "source": [ "Let's display the fractal." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "language": "python", "outputs": [], "collapsed": false, "input": [ "import matplotlib.pyplot as plt\n", "%matplotlib inline\n", "plt.imshow(np.log(col), cmap=plt.cm.hot,);\n", "plt.xticks([]);\n", "plt.yticks([]);" ], "metadata": {} }, { "source": [ "Let's evaluate the time taken by this function." ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", "language": "python", "outputs": [], "collapsed": false, "input": [ "%%timeit col_gpu = cuda.mem_alloc(col.nbytes); cuda.memcpy_htod(col_gpu, col)\n", "mandelbrot(np.int32(size), np.int32(iterations), col_gpu,\n", " block=block, grid=grid)\n", "cuda.memcpy_dtoh(col, col_gpu)" ], "metadata": {} }, { "source": [ "> You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).\n\n> [IPython Cookbook](http://ipython-books.github.io/), by [Cyrille Rossant](http://cyrille.rossant.net), Packt Publishing, 2014 (500 pages)." ], "cell_type": "markdown", "metadata": {} } ], "metadata": {} } ], "metadata": { "name": "", "signature": "sha256:36dc07362549bc8b15ba523559b6298fd84de8f37083a337d162c93c0b538479" } }