{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": true }, "source": [ "

Table of Contents

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The pivot table is a powerful tool to summarize and present data. Pandas has a function which allows you to quickly convert a DataFrame to a pivot table - \n", "http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.pivot_table.html\n", "\n", "This function is very useful but sometimes it can be tricky to remember how to use it to get the data formatted in a way you need.\n", "\n", "This notebook will walk through how to use the pivot_table.\n", "\n", "The full blog post for this article is here - http://pbpython.com/pandas-pivot-table-explained.html" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Read in the data" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Read in our sales funnel data into our DataFrame" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AccountNameRepManagerProductQuantityPriceStatus
0714466Trantow-BarrowsCraig BookerDebra HenleyCPU130000presented
1714466Trantow-BarrowsCraig BookerDebra HenleySoftware110000presented
2714466Trantow-BarrowsCraig BookerDebra HenleyMaintenance25000pending
3737550Fritsch, Russel and AndersonCraig BookerDebra HenleyCPU135000declined
4146832Kiehn-SpinkaDaniel HiltonDebra HenleyCPU265000won
\n", "
" ], "text/plain": [ " Account Name Rep Manager \\\n", "0 714466 Trantow-Barrows Craig Booker Debra Henley \n", "1 714466 Trantow-Barrows Craig Booker Debra Henley \n", "2 714466 Trantow-Barrows Craig Booker Debra Henley \n", "3 737550 Fritsch, Russel and Anderson Craig Booker Debra Henley \n", "4 146832 Kiehn-Spinka Daniel Hilton Debra Henley \n", "\n", " Product Quantity Price Status \n", "0 CPU 1 30000 presented \n", "1 Software 1 10000 presented \n", "2 Maintenance 2 5000 pending \n", "3 CPU 1 35000 declined \n", "4 CPU 2 65000 won " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_excel(\"http://maartenmarx.nl/teaching/DataScience/Data/sales-funnel.xlsx\") # locally the file is in the ../Data folder\n", "#df = pd.read_excel(\"../in/sales-funnel.xlsx\")\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For convenience sake, let's define the status column as a category and set the order we want to view.\n", "\n", "This isn't strictly required but helps us keep the order we want as we work through analyzing the data." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "df[\"Status\"] = df[\"Status\"].astype(\"category\")\n", "df[\"Status\"].cat.set_categories([\"won\",\"pending\",\"presented\",\"declined\"],inplace=True)\n", "#df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Pivot the data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we build up the pivot table, I think it's easiest to take it one step at a time. Add items one at a time and check each step to verify you are getting the results you expect.\n", "\n", "The simplest pivot table must have a dataframe and an index. In this case, let's use the Name as our index." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AccountPriceQuantity
Name
Barton LLC740150350001.000000
Fritsch, Russel and Anderson737550350001.000000
Herman LLC141962650002.000000
Jerde-Hilpert41229050002.000000
Kassulke, Ondricka and Metz30759970003.000000
Keeling LLC6889811000005.000000
Kiehn-Spinka146832650002.000000
Koepp Ltd729833350002.000000
Kulas Inc218895250001.500000
Purdy-Kunde163416300001.000000
Stokes LLC23934475001.000000
Trantow-Barrows714466150001.333333
\n", "
" ], "text/plain": [ " Account Price Quantity\n", "Name \n", "Barton LLC 740150 35000 1.000000\n", "Fritsch, Russel and Anderson 737550 35000 1.000000\n", "Herman LLC 141962 65000 2.000000\n", "Jerde-Hilpert 412290 5000 2.000000\n", "Kassulke, Ondricka and Metz 307599 7000 3.000000\n", "Keeling LLC 688981 100000 5.000000\n", "Kiehn-Spinka 146832 65000 2.000000\n", "Koepp Ltd 729833 35000 2.000000\n", "Kulas Inc 218895 25000 1.500000\n", "Purdy-Kunde 163416 30000 1.000000\n", "Stokes LLC 239344 7500 1.000000\n", "Trantow-Barrows 714466 15000 1.333333" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.pivot_table(df,index=[\"Name\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Alternatieve syntax\n", "* Gebruik de `pivot_table` methode van een DataFrame." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AccountPriceQuantity
Name
Barton LLC740150350001.0
Fritsch, Russel and Anderson737550350001.0
Herman LLC141962650002.0
Jerde-Hilpert41229050002.0
Kassulke, Ondricka and Metz30759970003.0
\n", "
" ], "text/plain": [ " Account Price Quantity\n", "Name \n", "Barton LLC 740150 35000 1.0\n", "Fritsch, Russel and Anderson 737550 35000 1.0\n", "Herman LLC 141962 65000 2.0\n", "Jerde-Hilpert 412290 5000 2.0\n", "Kassulke, Ondricka and Metz 307599 7000 3.0" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.pivot_table(index=[\"Name\"]).head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can have multiple indexes as well. In fact, most of the pivot_table args can take multiple values via a list." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AccountPriceQuantity
NameRepManager
Barton LLCJohn SmithDebra Henley740150350001.000000
Fritsch, Russel and AndersonCraig BookerDebra Henley737550350001.000000
Herman LLCCedric MossFred Anderson141962650002.000000
Jerde-HilpertJohn SmithDebra Henley41229050002.000000
Kassulke, Ondricka and MetzWendy YuleFred Anderson30759970003.000000
Keeling LLCWendy YuleFred Anderson6889811000005.000000
Kiehn-SpinkaDaniel HiltonDebra Henley146832650002.000000
Koepp LtdWendy YuleFred Anderson729833350002.000000
Kulas IncDaniel HiltonDebra Henley218895250001.500000
Purdy-KundeCedric MossFred Anderson163416300001.000000
Stokes LLCCedric MossFred Anderson23934475001.000000
Trantow-BarrowsCraig BookerDebra Henley714466150001.333333
\n", "
" ], "text/plain": [ " Account Price \\\n", "Name Rep Manager \n", "Barton LLC John Smith Debra Henley 740150 35000 \n", "Fritsch, Russel and Anderson Craig Booker Debra Henley 737550 35000 \n", "Herman LLC Cedric Moss Fred Anderson 141962 65000 \n", "Jerde-Hilpert John Smith Debra Henley 412290 5000 \n", "Kassulke, Ondricka and Metz Wendy Yule Fred Anderson 307599 7000 \n", "Keeling LLC Wendy Yule Fred Anderson 688981 100000 \n", "Kiehn-Spinka Daniel Hilton Debra Henley 146832 65000 \n", "Koepp Ltd Wendy Yule Fred Anderson 729833 35000 \n", "Kulas Inc Daniel Hilton Debra Henley 218895 25000 \n", "Purdy-Kunde Cedric Moss Fred Anderson 163416 30000 \n", "Stokes LLC Cedric Moss Fred Anderson 239344 7500 \n", "Trantow-Barrows Craig Booker Debra Henley 714466 15000 \n", "\n", " Quantity \n", "Name Rep Manager \n", "Barton LLC John Smith Debra Henley 1.000000 \n", "Fritsch, Russel and Anderson Craig Booker Debra Henley 1.000000 \n", "Herman LLC Cedric Moss Fred Anderson 2.000000 \n", "Jerde-Hilpert John Smith Debra Henley 2.000000 \n", "Kassulke, Ondricka and Metz Wendy Yule Fred Anderson 3.000000 \n", "Keeling LLC Wendy Yule Fred Anderson 5.000000 \n", "Kiehn-Spinka Daniel Hilton Debra Henley 2.000000 \n", "Koepp Ltd Wendy Yule Fred Anderson 2.000000 \n", "Kulas Inc Daniel Hilton Debra Henley 1.500000 \n", "Purdy-Kunde Cedric Moss Fred Anderson 1.000000 \n", "Stokes LLC Cedric Moss Fred Anderson 1.000000 \n", "Trantow-Barrows Craig Booker Debra Henley 1.333333 " ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.pivot_table(df,index=[\"Name\",\"Rep\",\"Manager\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is interesting but not particularly useful. What we probably want to do is look at this by Manager and Director.\n", "It's easy enough to do by changing the index." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AccountPriceQuantity
ManagerRep
Debra HenleyCraig Booker720237.020000.0000001.250000
Daniel Hilton194874.038333.3333331.666667
John Smith576220.020000.0000001.500000
Fred AndersonCedric Moss196016.527500.0000001.250000
Wendy Yule614061.544250.0000003.000000
\n", "
" ], "text/plain": [ " Account Price Quantity\n", "Manager Rep \n", "Debra Henley Craig Booker 720237.0 20000.000000 1.250000\n", " Daniel Hilton 194874.0 38333.333333 1.666667\n", " John Smith 576220.0 20000.000000 1.500000\n", "Fred Anderson Cedric Moss 196016.5 27500.000000 1.250000\n", " Wendy Yule 614061.5 44250.000000 3.000000" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.pivot_table(df,index=[\"Manager\",\"Rep\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can see that the pivot table is smart enough to start aggregating the data and summarizing it by grouping the reps with their managers. Now we start to get a glimpse of what a pivot table can do for us.\n", "\n", "For this purpose, the Account and Quantity columns aren't really useful. Let's remove it by explicitly defining the columns we care about using the values field." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Price
ManagerRep
Debra HenleyCraig Booker20000.000000
Daniel Hilton38333.333333
John Smith20000.000000
Fred AndersonCedric Moss27500.000000
Wendy Yule44250.000000
\n", "
" ], "text/plain": [ " Price\n", "Manager Rep \n", "Debra Henley Craig Booker 20000.000000\n", " Daniel Hilton 38333.333333\n", " John Smith 20000.000000\n", "Fred Anderson Cedric Moss 27500.000000\n", " Wendy Yule 44250.000000" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.pivot_table(df,index=[\"Manager\",\"Rep\"],values=[\"Price\"])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The price column automatically averages the data but we can do a count or a sum. Adding them is simple using aggfunc." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Price
ManagerRep
Debra HenleyCraig Booker80000
Daniel Hilton115000
John Smith40000
Fred AndersonCedric Moss110000
Wendy Yule177000
\n", "
" ], "text/plain": [ " Price\n", "Manager Rep \n", "Debra Henley Craig Booker 80000\n", " Daniel Hilton 115000\n", " John Smith 40000\n", "Fred Anderson Cedric Moss 110000\n", " Wendy Yule 177000" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.pivot_table(df,index=[\"Manager\",\"Rep\"],values=[\"Price\"],aggfunc=sum)# .reset_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "aggfunc can take a list of functions. Let's try a mean using the numpy functions and len to get a count." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
meanlensum
PricePricePrice
ManagerRep
Debra HenleyCraig Booker20000.000000480000
Daniel Hilton38333.3333333115000
John Smith20000.000000240000
Fred AndersonCedric Moss27500.0000004110000
Wendy Yule44250.0000004177000
\n", "
" ], "text/plain": [ " mean len sum\n", " Price Price Price\n", "Manager Rep \n", "Debra Henley Craig Booker 20000.000000 4 80000\n", " Daniel Hilton 38333.333333 3 115000\n", " John Smith 20000.000000 2 40000\n", "Fred Anderson Cedric Moss 27500.000000 4 110000\n", " Wendy Yule 44250.000000 4 177000" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.pivot_table(df,index=[\"Manager\",\"Rep\"],values=[\"Price\"],aggfunc=[np.mean,len,sum ])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we want to see sales broken down by the products, the columns variable allows us to define one or more columns." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sum
Price
ProductCPUMaintenanceMonitorSoftware
ManagerRep
Debra HenleyCraig Booker65000.05000.0NaN10000.0
Daniel Hilton105000.0NaNNaN10000.0
John Smith35000.05000.0NaNNaN
Fred AndersonCedric Moss95000.05000.0NaN10000.0
Wendy Yule165000.07000.05000.0NaN
\n", "
" ], "text/plain": [ " sum \n", " Price \n", "Product CPU Maintenance Monitor Software\n", "Manager Rep \n", "Debra Henley Craig Booker 65000.0 5000.0 NaN 10000.0\n", " Daniel Hilton 105000.0 NaN NaN 10000.0\n", " John Smith 35000.0 5000.0 NaN NaN\n", "Fred Anderson Cedric Moss 95000.0 5000.0 NaN 10000.0\n", " Wendy Yule 165000.0 7000.0 5000.0 NaN" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.pivot_table(df,index=[\"Manager\",\"Rep\"],values=[\"Price\"],\n", " columns=[\"Product\"],aggfunc=[np.sum])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The NaN's are a bit distracting. If we want to remove them, we could use fill_value to set them to 0." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sum
Price
ProductCPUMaintenanceMonitorSoftware
ManagerRep
Debra HenleyCraig Booker650005000010000
Daniel Hilton1050000010000
John Smith35000500000
Fred AndersonCedric Moss950005000010000
Wendy Yule165000700050000
\n", "
" ], "text/plain": [ " sum \n", " Price \n", "Product CPU Maintenance Monitor Software\n", "Manager Rep \n", "Debra Henley Craig Booker 65000 5000 0 10000\n", " Daniel Hilton 105000 0 0 10000\n", " John Smith 35000 5000 0 0\n", "Fred Anderson Cedric Moss 95000 5000 0 10000\n", " Wendy Yule 165000 7000 5000 0" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.pivot_table(df,index=[\"Manager\",\"Rep\"],values=[\"Price\"],\n", " columns=[\"Product\"],aggfunc=[np.sum],fill_value=0)\n", "# Alternative way of filling the NaN's\n", "#pd.pivot_table(df,index=[\"Manager\",\"Rep\"],values=[\"Price\"],\n", "# columns=[\"Product\"],aggfunc=[np.sum]).fillna(0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I think it would be useful to add the quantity as well. Add Quantity to the values list." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sum
PriceQuantity
ProductCPUMaintenanceMonitorSoftwareCPUMaintenanceMonitorSoftware
ManagerRep
Debra HenleyCraig Booker6500050000100002201
Daniel Hilton10500000100004001
John Smith350005000001200
Fred AndersonCedric Moss9500050000100003101
Wendy Yule1650007000500007320
\n", "
" ], "text/plain": [ " sum \\\n", " Price Quantity \n", "Product CPU Maintenance Monitor Software CPU \n", "Manager Rep \n", "Debra Henley Craig Booker 65000 5000 0 10000 2 \n", " Daniel Hilton 105000 0 0 10000 4 \n", " John Smith 35000 5000 0 0 1 \n", "Fred Anderson Cedric Moss 95000 5000 0 10000 3 \n", " Wendy Yule 165000 7000 5000 0 7 \n", "\n", " \n", " \n", "Product Maintenance Monitor Software \n", "Manager Rep \n", "Debra Henley Craig Booker 2 0 1 \n", " Daniel Hilton 0 0 1 \n", " John Smith 2 0 0 \n", "Fred Anderson Cedric Moss 1 0 1 \n", " Wendy Yule 3 2 0 " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.pivot_table(df,index=[\"Manager\",\"Rep\"],values=[\"Price\",\"Quantity\"],\n", " columns=[\"Product\"],aggfunc=[np.sum],fill_value=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What's interesting is that you can move items to the index to get a different visual representation. We can add the Products to the index." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sum
PriceQuantity
ManagerRepProduct
Debra HenleyCraig BookerCPU650002
Maintenance50002
Software100001
Daniel HiltonCPU1050004
Software100001
John SmithCPU350001
Maintenance50002
Fred AndersonCedric MossCPU950003
Maintenance50001
Software100001
Wendy YuleCPU1650007
Maintenance70003
Monitor50002
\n", "
" ], "text/plain": [ " sum \n", " Price Quantity\n", "Manager Rep Product \n", "Debra Henley Craig Booker CPU 65000 2\n", " Maintenance 5000 2\n", " Software 10000 1\n", " Daniel Hilton CPU 105000 4\n", " Software 10000 1\n", " John Smith CPU 35000 1\n", " Maintenance 5000 2\n", "Fred Anderson Cedric Moss CPU 95000 3\n", " Maintenance 5000 1\n", " Software 10000 1\n", " Wendy Yule CPU 165000 7\n", " Maintenance 7000 3\n", " Monitor 5000 2" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.pivot_table(df,index=[\"Manager\",\"Rep\",\"Product\"],\n", " values=[\"Price\",\"Quantity\"],aggfunc=[np.sum],fill_value=0)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For this data set, this representation makes more sense. Now, what if I want to see some totals? margins=True does that for us." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
summean
PriceQuantityPriceQuantity
ManagerRepProduct
Debra HenleyCraig BookerCPU650002325001.000000
Maintenance5000250002.000000
Software100001100001.000000
Daniel HiltonCPU1050004525002.000000
Software100001100001.000000
John SmithCPU350001350001.000000
Maintenance5000250002.000000
Fred AndersonCedric MossCPU950003475001.500000
Maintenance5000150001.000000
Software100001100001.000000
Wendy YuleCPU1650007825003.500000
Maintenance7000370003.000000
Monitor5000250002.000000
All52200030307051.764706
\n", "
" ], "text/plain": [ " sum mean \n", " Price Quantity Price Quantity\n", "Manager Rep Product \n", "Debra Henley Craig Booker CPU 65000 2 32500 1.000000\n", " Maintenance 5000 2 5000 2.000000\n", " Software 10000 1 10000 1.000000\n", " Daniel Hilton CPU 105000 4 52500 2.000000\n", " Software 10000 1 10000 1.000000\n", " John Smith CPU 35000 1 35000 1.000000\n", " Maintenance 5000 2 5000 2.000000\n", "Fred Anderson Cedric Moss CPU 95000 3 47500 1.500000\n", " Maintenance 5000 1 5000 1.000000\n", " Software 10000 1 10000 1.000000\n", " Wendy Yule CPU 165000 7 82500 3.500000\n", " Maintenance 7000 3 7000 3.000000\n", " Monitor 5000 2 5000 2.000000\n", "All 522000 30 30705 1.764706" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.pivot_table(df,index=[\"Manager\",\"Rep\",\"Product\"],\n", " values=[\"Price\",\"Quantity\"],\n", " aggfunc=[np.sum,np.mean],fill_value=0,margins=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's move the analysis up a level and look at our pipeline at the manager level. Notice how the status is ordered based on our earlier category definition." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sum
Price
ManagerStatus
Debra Henleywon65000.0
pending50000.0
presented50000.0
declined70000.0
Fred Andersonwon172000.0
pending5000.0
presented45000.0
declined65000.0
All522000.0
\n", "
" ], "text/plain": [ " sum\n", " Price\n", "Manager Status \n", "Debra Henley won 65000.0\n", " pending 50000.0\n", " presented 50000.0\n", " declined 70000.0\n", "Fred Anderson won 172000.0\n", " pending 5000.0\n", " presented 45000.0\n", " declined 65000.0\n", "All 522000.0" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.pivot_table(df,index=[\"Manager\",\"Status\"],values=[\"Price\"],\n", " aggfunc=[np.sum],fill_value=0,margins=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A really handy feature is the ability to pass a dictionary to the aggfunc so you can perform different functions on each of the values you select. This has a side-effect of making the labels a little cleaner." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PriceQuantity
ProductCPUMaintenanceMonitorSoftwareAllCPUMaintenanceMonitorSoftwareAll
ManagerStatus
Debra Henleywon65000.00.00.00.065000.01.00.00.00.01.0
pending40000.010000.00.00.050000.01.02.00.00.03.0
presented30000.00.00.020000.050000.01.00.00.02.03.0
declined70000.00.00.00.070000.02.00.00.00.02.0
Fred Andersonwon165000.07000.00.00.0172000.02.01.00.00.03.0
pending0.05000.00.00.05000.00.01.00.00.01.0
presented30000.00.05000.010000.045000.01.00.01.01.03.0
declined65000.00.00.00.065000.01.00.00.00.01.0
All465000.022000.05000.030000.0522000.09.04.01.03.017.0
\n", "
" ], "text/plain": [ " Price \\\n", "Product CPU Maintenance Monitor Software All \n", "Manager Status \n", "Debra Henley won 65000.0 0.0 0.0 0.0 65000.0 \n", " pending 40000.0 10000.0 0.0 0.0 50000.0 \n", " presented 30000.0 0.0 0.0 20000.0 50000.0 \n", " declined 70000.0 0.0 0.0 0.0 70000.0 \n", "Fred Anderson won 165000.0 7000.0 0.0 0.0 172000.0 \n", " pending 0.0 5000.0 0.0 0.0 5000.0 \n", " presented 30000.0 0.0 5000.0 10000.0 45000.0 \n", " declined 65000.0 0.0 0.0 0.0 65000.0 \n", "All 465000.0 22000.0 5000.0 30000.0 522000.0 \n", "\n", " Quantity \n", "Product CPU Maintenance Monitor Software All \n", "Manager Status \n", "Debra Henley won 1.0 0.0 0.0 0.0 1.0 \n", " pending 1.0 2.0 0.0 0.0 3.0 \n", " presented 1.0 0.0 0.0 2.0 3.0 \n", " declined 2.0 0.0 0.0 0.0 2.0 \n", "Fred Anderson won 2.0 1.0 0.0 0.0 3.0 \n", " pending 0.0 1.0 0.0 0.0 1.0 \n", " presented 1.0 0.0 1.0 1.0 3.0 \n", " declined 1.0 0.0 0.0 0.0 1.0 \n", "All 9.0 4.0 1.0 3.0 17.0 " ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.pivot_table(df,index=[\"Manager\",\"Status\"],columns=[\"Product\"],values=[\"Quantity\",\"Price\"],\n", " aggfunc={\"Quantity\":len,\"Price\":np.sum},fill_value=0,margins=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can provide a list of aggfunctions to apply to each value too:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PriceQuantity
meansumlen
ProductCPUMaintenanceMonitorSoftwareCPUMaintenanceMonitorSoftwareCPUMaintenanceMonitorSoftware
ManagerStatus
Debra Henleywon65000000650000001000
pending400005000004000010000001200
presented3000000100003000000200001002
declined35000000700000002000
Fred Andersonwon825007000001650007000002100
pending050000005000000100
presented3000005000100003000005000100001011
declined65000000650000001000
\n", "
" ], "text/plain": [ " Price \\\n", " mean sum \n", "Product CPU Maintenance Monitor Software CPU \n", "Manager Status \n", "Debra Henley won 65000 0 0 0 65000 \n", " pending 40000 5000 0 0 40000 \n", " presented 30000 0 0 10000 30000 \n", " declined 35000 0 0 0 70000 \n", "Fred Anderson won 82500 7000 0 0 165000 \n", " pending 0 5000 0 0 0 \n", " presented 30000 0 5000 10000 30000 \n", " declined 65000 0 0 0 65000 \n", "\n", " Quantity \\\n", " len \n", "Product Maintenance Monitor Software CPU Maintenance \n", "Manager Status \n", "Debra Henley won 0 0 0 1 0 \n", " pending 10000 0 0 1 2 \n", " presented 0 0 20000 1 0 \n", " declined 0 0 0 2 0 \n", "Fred Anderson won 7000 0 0 2 1 \n", " pending 5000 0 0 0 1 \n", " presented 0 5000 10000 1 0 \n", " declined 0 0 0 1 0 \n", "\n", " \n", " \n", "Product Monitor Software \n", "Manager Status \n", "Debra Henley won 0 0 \n", " pending 0 0 \n", " presented 0 2 \n", " declined 0 0 \n", "Fred Anderson won 0 0 \n", " pending 0 0 \n", " presented 1 1 \n", " declined 0 0 " ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "table = pd.pivot_table(df,index=[\"Manager\",\"Status\"],columns=[\"Product\"],values=[\"Quantity\",\"Price\"],\n", " aggfunc={\"Quantity\":len,\"Price\":[np.sum,np.mean]},fill_value=0)\n", "table" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It can look daunting to try to pull this all together at once but as soon as you start playing with the data and slowly add the items, you can get a feel for how it works." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Advanced Pivot Table Filtering" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once you have generated your data, it is in a DataFrame so you can filter on it using your normal DataFrame functions." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PriceQuantity
meansumlen
ProductCPUMaintenanceMonitorSoftwareCPUMaintenanceMonitorSoftwareCPUMaintenanceMonitorSoftware
ManagerStatus
Debra Henleywon65000000650000001000
pending400005000004000010000001200
presented3000000100003000000200001002
declined35000000700000002000
\n", "
" ], "text/plain": [ " Price \\\n", " mean sum \n", "Product CPU Maintenance Monitor Software CPU Maintenance \n", "Manager Status \n", "Debra Henley won 65000 0 0 0 65000 0 \n", " pending 40000 5000 0 0 40000 10000 \n", " presented 30000 0 0 10000 30000 0 \n", " declined 35000 0 0 0 70000 0 \n", "\n", " Quantity \n", " len \n", "Product Monitor Software CPU Maintenance Monitor Software \n", "Manager Status \n", "Debra Henley won 0 0 1 0 0 0 \n", " pending 0 0 1 2 0 0 \n", " presented 0 20000 1 0 0 2 \n", " declined 0 0 2 0 0 0 " ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "table.query('Manager == [\"Debra Henley\"]')" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PriceQuantity
meansumlen
ProductCPUMaintenanceMonitorSoftwareCPUMaintenanceMonitorSoftwareCPUMaintenanceMonitorSoftware
ManagerStatus
Debra Henleywon65000000650000001000
pending400005000004000010000001200
Fred Andersonwon825007000001650007000002100
pending050000005000000100
\n", "
" ], "text/plain": [ " Price \\\n", " mean sum \n", "Product CPU Maintenance Monitor Software CPU Maintenance \n", "Manager Status \n", "Debra Henley won 65000 0 0 0 65000 0 \n", " pending 40000 5000 0 0 40000 10000 \n", "Fred Anderson won 82500 7000 0 0 165000 7000 \n", " pending 0 5000 0 0 0 5000 \n", "\n", " Quantity \n", " len \n", "Product Monitor Software CPU Maintenance Monitor Software \n", "Manager Status \n", "Debra Henley won 0 0 1 0 0 0 \n", " pending 0 0 1 2 0 0 \n", "Fred Anderson won 0 0 2 1 0 0 \n", " pending 0 0 0 1 0 0 " ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "table.query('Status == [\"pending\",\"won\"]')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I hope this tutorial has shown you how to use pivot tables on your data sets." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.3" }, "toc": { "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 1 }