{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "toc": true
   },
   "source": [
    "<h1>Table of Contents<span class=\"tocSkip\"></span></h1>\n",
    "<div class=\"toc\"><ul class=\"toc-item\"><li><span><a href=\"#Redo-the-experiment-of-Christof-Monz-in-Numpy\" data-toc-modified-id=\"Redo-the-experiment-of-Christof-Monz-in-Numpy-1\"><span class=\"toc-item-num\">1&nbsp;&nbsp;</span>Redo the experiment of Christof Monz in Numpy</a></span><ul class=\"toc-item\"><li><span><a href=\"#Two-experiments\" data-toc-modified-id=\"Two-experiments-1.1\"><span class=\"toc-item-num\">1.1&nbsp;&nbsp;</span>Two experiments</a></span></li><li><span><a href=\"#The-task\" data-toc-modified-id=\"The-task-1.2\"><span class=\"toc-item-num\">1.2&nbsp;&nbsp;</span>The task</a></span></li><li><span><a href=\"#Redo-Monz-experiment-in-Numpy\" data-toc-modified-id=\"Redo-Monz-experiment-in-Numpy-1.3\"><span class=\"toc-item-num\">1.3&nbsp;&nbsp;</span>Redo Monz experiment in Numpy</a></span><ul class=\"toc-item\"><li><span><a href=\"#Outcome\" data-toc-modified-id=\"Outcome-1.3.1\"><span class=\"toc-item-num\">1.3.1&nbsp;&nbsp;</span>Outcome</a></span></li></ul></li><li><span><a href=\"#Double-for-loop-:-over-the-elements-individually\" data-toc-modified-id=\"Double-for-loop-:-over-the-elements-individually-1.4\"><span class=\"toc-item-num\">1.4&nbsp;&nbsp;</span>Double for loop : over the elements individually</a></span><ul class=\"toc-item\"><li><span><a href=\"#Outcome\" data-toc-modified-id=\"Outcome-1.4.1\"><span class=\"toc-item-num\">1.4.1&nbsp;&nbsp;</span>Outcome</a></span></li></ul></li></ul></li><li><span><a href=\"#Original-code-of-Monz\" data-toc-modified-id=\"Original-code-of-Monz-2\"><span class=\"toc-item-num\">2&nbsp;&nbsp;</span>Original code of Monz</a></span></li></ul></div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Redo the experiment of Christof Monz in Numpy\n",
    "\n",
    "The message **never use a loop**, but use *vectorized computations* holds just as well for computing in `numpy` or `pandas` as for `pytorch` or `tensorflow`.\n",
    "\n",
    "So we repeat the experiment here in numpy. The technique we use to filter (_sample_) the large 2D matrix is called [fancy indexing](https://jakevdp.github.io/PythonDataScienceHandbook/02.07-fancy-indexing.html) and is really worthwhile studying. It works very  fast, and does in effect do the same job as the view described in the code of Monz.\n",
    "\n",
    "## Two experiments\n",
    "\n",
    "We do not \"feel\" the bad effects of _for looping_ when we exactly repeat Christofs (sampling over rows) experiment. But if we also loop over the columns, with the same step size, we really do see the difference between the \"for\" and the \"best\" run.\n",
    "\n",
    "Times are measured on a Macbook Pro (2015)  with 2.9 GHz Intel Core i5 processor and 16 GB  RAM."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(2000, 4096)"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import numpy as np\n",
    "import time\n",
    "\n",
    "# global variables\n",
    "input_size = 2000\n",
    "hidden_size = 4096\n",
    "\n",
    "\n",
    "##  two np matrices of shape (2000, 4096)\n",
    "\n",
    "A= np.random.rand(input_size,hidden_size)\n",
    "B=  np.random.rand(input_size,hidden_size)\n",
    "A.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The task\n",
    "\n",
    "Compute the element wise product of two 2D matrices."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(2000, 4096)\n",
      "CPU times: user 26.7 ms, sys: 22.9 ms, total: 49.6 ms\n",
      "Wall time: 47.7 ms\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "\n",
    "\n",
    "C= A*B\n",
    "\n",
    "print(C.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Redo Monz experiment in Numpy\n",
    "\n",
    "We do each method with three stepsizes and report the stepsize, the method name, the time, and the start of the output. \n",
    "\n",
    "Note that the step size was fixed to 100 in Cristofs experiment. We add stepsizes of 10 and 1 (1 is of course the complete matrix).\n",
    "### Outcome\n",
    "\n",
    "The `for` method beats the `best` method consistently, however they are in the same order of magnitude.\n",
    "The other methods perform more or less similar to computing the complete product (`nofilter`).\n",
    "When the stepsize equals 1 (so we compute the complete product) all methods time in the same order of magnitude."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "100 nofilter 16.126432180404663 [0.50812953 0.23894935 0.23581109 0.1770699  0.37806036]\n",
      "100 allthenfor 14.560197114944458 [0.50264173 0.84403391 0.31848951 0.25569864 0.63406363]\n",
      "100 for 0.08021712303161621 [0.50264173 0.84403391 0.31848951 0.25569864 0.63406363]\n",
      "100 better 14.425140142440796 [0.50264173 0.84403391 0.31848951 0.25569864 0.63406363]\n",
      "100 best 0.11473417282104492 [0.50264173 0.84403391 0.31848951 0.25569864 0.63406363]\n",
      "10 nofilter 14.875795125961304 [0.50812953 0.23894935 0.23581109 0.1770699  0.37806036]\n",
      "10 allthenfor 18.194615125656128 [0.20850192 0.15585488 0.09869398 0.23816242 0.13696984]\n",
      "10 for 1.2545669078826904 [0.20850192 0.15585488 0.09869398 0.23816242 0.13696984]\n",
      "10 better 15.554318904876709 [0.20850192 0.15585488 0.09869398 0.23816242 0.13696984]\n",
      "10 best 2.2483460903167725 [0.20850192 0.15585488 0.09869398 0.23816242 0.13696984]\n",
      "1 nofilter 13.711044073104858 [0.50812953 0.23894935 0.23581109 0.1770699  0.37806036]\n",
      "1 allthenfor 27.244115114212036 [0.50812953 0.23894935 0.23581109 0.1770699  0.37806036]\n",
      "1 for 16.36823081970215 [0.50812953 0.23894935 0.23581109 0.1770699  0.37806036]\n",
      "1 better 25.547497987747192 [0.50812953 0.23894935 0.23581109 0.1770699  0.37806036]\n",
      "1 best 27.194308042526245 [0.50812953 0.23894935 0.23581109 0.1770699  0.37806036]\n",
      "CPU times: user 2min 23s, sys: 1min, total: 3min 24s\n",
      "Wall time: 3min 27s\n"
     ]
    }
   ],
   "source": [
    "%%time \n",
    "\n",
    "def experiment(x,y,method='best',step_size=100, max_input = int(1e6)):\n",
    "    repeats = max_input//input_size\n",
    "    #print('start')\n",
    "    start_time = time.time()\n",
    "    for i in range(1,repeats+1):\n",
    "        if i%1000 == 0:\n",
    "            print(i)\n",
    "        if method=='nofilter':    # no filter at all\n",
    "            output= x*y    \n",
    "        elif method=='for':   # for loop\n",
    "            output_index = 0\n",
    "            output=np.zeros((input_size//step_size,hidden_size))\n",
    "            for j in range(step_size-1,input_size,step_size):\n",
    "                output[output_index] = x[j] * y[j]\n",
    "                output_index += 1\n",
    "        elif method == 'allthenfor': # first product, then filter by a for loop\n",
    "            z = x*y\n",
    "            output_index = 0\n",
    "            output=np.zeros((input_size//step_size,hidden_size))\n",
    "            for j in range(step_size-1,input_size,step_size):\n",
    "                output[output_index] = z[j]\n",
    "                output_index += 1\n",
    "        elif method == 'better':  # fancy indexing after the product\n",
    "            output = (x*y)[range(step_size-1,input_size,step_size),:]\n",
    "        elif method == 'best':  # fancy indexing before the product\n",
    "            output = (x[range(step_size-1,input_size,step_size),:]*\n",
    "                      y[range(step_size-1,input_size,step_size),:])\n",
    "    elapsed_time = time.time() - start_time    \n",
    "    return elapsed_time,output\n",
    "\n",
    "'''\n",
    "for method in ['nofilter','allthenfor','for','better','best']:\n",
    "    step_size=100\n",
    "    tijd,out= experiment(A,B,method=method,step_size=step_size)\n",
    "    print(step_size,method,tijd,out[0,:5])\n",
    "'''\n",
    "\n",
    "for step_size in [100,10,1]:\n",
    "    for method in ['nofilter','allthenfor','for','better','best']:\n",
    "        tijd,out= experiment(A,B,method=method,step_size=step_size)\n",
    "        print(step_size,method,tijd,out[0,:5])    "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##  Double for loop : over the elements individually\n",
    "\n",
    "We do a double for loop, over the columns and the rows.\n",
    "\n",
    "We restrict to the for and best runs.\n",
    "\n",
    "We change the hidden size to 4100, so it is divisible by 10 and 100. \n",
    "\n",
    "Note that we let the expeirment run here by default only 100K times. The double for loop amounts to 16M computations in the for loop, so that takes a lot of time.....\n",
    "\n",
    "### Outcome\n",
    "\n",
    "`best` is always an order of maginitude faster than `for`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "100 best 0.001085042953491211 (20, 41) [0.56075119 0.09664409 0.17087498 0.44948774 0.1755863 ]\n",
      "100 for 0.022330760955810547 (20, 41) [0.56075119 0.09664409 0.17087498 0.44948774 0.1755863 ]\n",
      "10 best 0.0838780403137207 (200, 410) [0.22263665 0.03570748 0.04859623 0.65376252 0.24604552]\n",
      "10 for 2.1343231201171875 (200, 410) [0.22263665 0.03570748 0.04859623 0.65376252 0.24604552]\n",
      "1 best 6.50712776184082 (2000, 4100) [0.29378663 0.60332941 0.42967505 0.65462595 0.65623145]\n",
      "1 for 216.48570704460144 (2000, 4100) [0.29378663 0.60332941 0.42967505 0.65462595 0.65623145]\n",
      "CPU times: user 3min 43s, sys: 2.1 s, total: 3min 45s\n",
      "Wall time: 3min 45s\n"
     ]
    }
   ],
   "source": [
    "%%time \n",
    "## double for loop : over the elements individually\n",
    "\n",
    "input_size = 2000\n",
    "hidden_size = 4100\n",
    "\n",
    "A= np.random.rand(input_size,hidden_size)\n",
    "B=  np.random.rand(input_size,hidden_size)\n",
    "\n",
    "\n",
    "def experiment(x,y,method='best',step_size=100, max_input = int(1e5)):\n",
    "    repeats = max_input//input_size\n",
    "    #print('start')\n",
    "    start_time = time.time()\n",
    "    for i in range(1,repeats+1):\n",
    "        if i%1000 == 0:\n",
    "            print(i)\n",
    "        if method=='nofilter':    # no filter at all\n",
    "            output= x*y    \n",
    "        elif method=='for':   # for loop\n",
    "            output=np.zeros((input_size//step_size,hidden_size//step_size))\n",
    "            output_index_j = 0\n",
    "            for j in range(step_size-1,input_size,step_size):  # j is row\n",
    "                output_index_i = 0\n",
    "                for i in range(step_size-1,hidden_size,step_size):  # i is column\n",
    "                    output[output_index_j,output_index_i] = x[j,i] * y[j,i]\n",
    "                    output_index_i += 1\n",
    "                output_index_j += 1\n",
    "        elif method == 'allthenfor': # first product, then filter by a for loop\n",
    "            z = x*y\n",
    "            output_index = 0\n",
    "            output=np.zeros((input_size//step_size,hidden_size))\n",
    "            for j in range(step_size-1,input_size,step_size):\n",
    "                output[output_index] = z[j]\n",
    "                output_index += 1\n",
    "        elif method == 'better':  # fancy indexing after the product\n",
    "            output = (x*y)[range(step_size-1,input_size,step_size),:]\n",
    "        elif method == 'best':  # fancy indexing before the product\n",
    "            row = np.arange(step_size-1,input_size,step_size)\n",
    "            col= np.arange(step_size-1,hidden_size,step_size)\n",
    "            smallx= x[row[:, np.newaxis], col]\n",
    "            smally=y[row[:, np.newaxis], col]\n",
    "            output =  smallx * smally\n",
    "    elapsed_time = time.time() - start_time    \n",
    "    return elapsed_time,output\n",
    "\n",
    "for step_size in [100,10,1]:\n",
    "    for method in ['best','for']:\n",
    "        tijd,out= experiment(A,B,method=method,step_size=step_size)\n",
    "        print(step_size,method,tijd,out.shape,out[0,:5])  "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Original code of Monz\n",
    "\n",
    "```\n",
    "import torch\n",
    "import sys\n",
    "import torch.nn as nn\n",
    "import time\n",
    "\n",
    "torch.manual_seed(1)\n",
    "\n",
    "# grun-pascal python ~/code/pytorch/toy/soos.py\n",
    "# srun --unbuffered --partition=gpu --gres=gpu:1 --mem=40G --nodelist=ilps-gpu10 python ~/code/pytorch/toy/soos.py for\n",
    "\n",
    "input_size = 2000\n",
    "hidden_size = 4096\n",
    "step_size = 100\n",
    "\n",
    "# method = 'for'\n",
    "method = sys.argv[1]\n",
    "\n",
    "if method == 'bestest':\n",
    "    input_size *= step_size\n",
    "\n",
    "x = torch.rand([input_size,hidden_size]).cuda()\n",
    "y = torch.rand([input_size,hidden_size]).cuda()\n",
    "\n",
    "\n",
    "output_size = input_size//step_size\n",
    "output = torch.zeros([output_size,hidden_size]).cuda()\n",
    "\n",
    "\n",
    "max_input = int(1e10)\n",
    "repeats = max_input//input_size\n",
    "\n",
    "start_time = time.time()\n",
    "\n",
    "print('start')\n",
    "for i in range(1,repeats+1):\n",
    "    if i%5000 == 0:\n",
    "        print(i)\n",
    "\n",
    "    if method == 'for':\n",
    "        output_index = 0\n",
    "        for j in range(step_size-1,input_size,step_size):\n",
    "            output[output_index] = x[j] * y[j]\n",
    "            output_index += 1\n",
    "    elif method == 'allthenfor':\n",
    "        z = x*y\n",
    "        output_index = 0\n",
    "        for j in range(step_size-1,input_size,step_size):\n",
    "            output[output_index] = z[j]\n",
    "            output_index += 1\n",
    "    elif method == 'better':\n",
    "        output = (x*y).view(output_size,-1)[:,(step_size-1)*hidden_size:step_size*hidden_size].contiguous()\n",
    "    elif method == 'best' or method == 'bestest':\n",
    "        output = (x.view(output_size,-1)[:,(step_size-1)*hidden_size:step_size*hidden_size] \\\n",
    "                * y.view(output_size,-1)[:,(step_size-1)*hidden_size:step_size*hidden_size]).contiguous()\n",
    "\n",
    "    if i==repeats:\n",
    "        # print(output.size())\n",
    "        print(output[3,0:5])\n",
    "\n",
    "elapsed_time = time.time() - start_time\n",
    "print(time.strftime(\"%H:%M:%S\", time.gmtime(elapsed_time)))\n",
    "outputs_per_sec = int(((max_input/step_size)/elapsed_time)/1000)\n",
    "print(\"outputs per sec = \" +  \"{:,}\".format(outputs_per_sec) + \" K\")\n",
    "\n",
    "sys.exit()\n",
    "```"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  },
  "toc": {
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": true,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}