Create a pivot table listing the values

Which aggfunc do I need to use to create a list using a pivot table? I tried using str, which does not work.

Inputs

    import pandas as pd

    data = {
        'Test point': [0, 1, 2, 0, 1],
        'Experiment': [1, 2, 3, 4, 5],
    }
    df = pd.DataFrame(data)
    print(df)

    pivot = pd.pivot_table(df, index=['Test point'], values=['Experiment'], aggfunc=len)
    print(pivot)

    pivot = pd.pivot_table(df, index=['Test point'], values=['Experiment'], aggfunc=str)
    print(pivot)

Results

       Experiment  Test point
    0           1           0
    1           2           1
    2           3           2
    3           4           0
    4           5           1

                Experiment
    Test point
    0                    2
    1                    2
    2                    1

                                                      Experiment
    Test point
    0           0    1\n3    4\nName: Experiment, dtype: int64
    1           1    2\n4    5\nName: Experiment, dtype: int64
    2                2    3\nName: Experiment, dtype: int64
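Why str gives this: pivot_table hands the aggfunc a Series holding each group's values, so str returns the Series' printed form (index, values and the Name/dtype footer) instead of joining the values. A minimal sketch that rebuilds the Test point 0 group by hand; the index labels 0 and 3 are the rows of df where Test point is 0:

```python
import pandas as pd

# The group for Test point 0: rows 0 and 3 of the original frame.
group = pd.Series([1, 4], index=[0, 3], name='Experiment')

# str() gives the Series repr, including the "Name: ..., dtype: ..." footer,
# which is what ends up in each cell of the aggfunc=str pivot table.
text = str(group)
print(repr(text))
```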

Desired output

                Experiment
    Test point
    0                 1, 4
    1                 2, 5
    2                    3
3 answers

You can use list as the aggregation function:

    >>> pd.pivot_table(df, index=['Test point'], values=['Experiment'], aggfunc=lambda x: list(x))
               Experiment
    Test point
    0              [1, 4]
    1              [2, 5]
    2                 [3]
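The lambda wrapper is optional: list itself is callable, so it can usually be passed as the aggregation function directly. A sketch of the same result using groupby instead of pivot_table (to_frame only restores the one-column layout):

```python
import pandas as pd

df = pd.DataFrame({'Test point': [0, 1, 2, 0, 1],
                   'Experiment': [1, 2, 3, 4, 5]})

# Passing the built-in list collects each group's values into a Python list.
result = df.groupby('Test point')['Experiment'].agg(list).to_frame()
print(result)
```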

Use a lambda that converts the values to strings and joins them:

    In [1830]: pd.pivot_table(df, index=['Test point'], values=['Experiment'],
                              aggfunc=lambda x: ', '.join(x.astype(str)))
    Out[1830]:
               Experiment
    Test point
    0                1, 4
    1                2, 5
    2                   3

Alternatively, groupby works too:

    In [1831]: df.groupby('Test point').agg({'Experiment': lambda x: x.astype(str).str.cat(sep=', ')})
    Out[1831]:
               Experiment
    Test point
    0                1, 4
    1                2, 5
    2                   3

If you want a list instead:

    In [1861]: df.groupby('Test point').agg({'Experiment': lambda x: x.tolist()})
    Out[1861]:
               Experiment
    Test point
    0              [1, 4]
    1              [2, 5]
    2                 [3]

x.astype(str).str.cat(sep=', ') does the same thing as ', '.join(x.astype(str)).
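A quick check of that equivalence on a single group's values (a sketch; the sample Series is the Test point 0 group from the question):

```python
import pandas as pd

values = pd.Series([1, 4], name='Experiment')

# Both convert to strings first, then concatenate with ", " as the separator.
via_cat = values.astype(str).str.cat(sep=', ')
via_join = ', '.join(values.astype(str))
print(via_cat, '|', via_join)
```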


Option 1
astype(str) + groupby + apply

You can convert the column to strings first, which simplifies the groupby call.

    df.assign(Experiment=df.Experiment.astype(str))\
      .groupby('Test point').Experiment.apply(', '.join).to_frame('Experiment')

               Experiment
    Test point
    0                1, 4
    1                2, 5
    2                   3

A variation assigns in place for speed (assign returns a copy, which is slower):

    df.Experiment = df.Experiment.astype(str)
    df.groupby('Test point').Experiment.apply(', '.join).to_frame('Experiment')

               Experiment
    Test point
    0                1, 4
    1                2, 5
    2                   3

The downside is that this also modifies the original frame.
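If mutating df is a concern, one option is to convert only the column as a standalone Series and group it by the Test point column, so the conversion is still done once up front but df keeps its integers. A sketch, not benchmarked here:

```python
import pandas as pd

df = pd.DataFrame({'Test point': [0, 1, 2, 0, 1],
                   'Experiment': [1, 2, 3, 4, 5]})

# Convert a copy of the column, then group that Series by the key column;
# df itself keeps its integer dtype.
as_text = df['Experiment'].astype(str)
result = as_text.groupby(df['Test point']).apply(', '.join).to_frame('Experiment')
print(result)
```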

Performance

    # Zero's first solution
    %%timeit
    df.groupby('Test point').agg({'Experiment': lambda x: x.astype(str).str.cat(sep=', ')})
    100 loops, best of 3: 3.72 ms per loop

    # Zero's second solution
    %%timeit
    pd.pivot_table(df, index=['Test point'], values=['Experiment'], aggfunc=lambda x: ', '.join(x.astype(str)))
    100 loops, best of 3: 5.17 ms per loop

    # proposed in this post
    %%timeit -n 1
    df.Experiment = df.Experiment.astype(str)
    df.groupby('Test point').Experiment.apply(', '.join).to_frame('Experiment')
    1 loop, best of 3: 2.02 ms per loop

Note that the .assign version is only a few milliseconds slower than this. The performance differences should become more visible on large data frames.
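To see whether the gap between assign and in-place conversion grows with size, a rough timing sketch on a larger synthetic frame. The row count, the number of test points and the use of time.perf_counter instead of %%timeit are assumptions; absolute numbers will vary by machine and pandas version:

```python
import time

import numpy as np
import pandas as pd

# Hypothetical larger input: 100,000 rows spread over 1,000 test points.
rng = np.random.default_rng(0)
n_rows = 100_000
big = pd.DataFrame({'Test point': rng.integers(0, 1_000, n_rows),
                    'Experiment': np.arange(n_rows)})


def with_assign(frame):
    # assign builds a converted copy; the input frame keeps its integers.
    return (frame.assign(Experiment=frame['Experiment'].astype(str))
                 .groupby('Test point')['Experiment'].apply(', '.join)
                 .to_frame('Experiment'))


def in_place(frame):
    # Overwrites the column of the frame it is given, like df.Experiment = ...
    frame['Experiment'] = frame['Experiment'].astype(str)
    return (frame.groupby('Test point')['Experiment'].apply(', '.join)
                 .to_frame('Experiment'))


def seconds(fn, frame):
    # Time a single call; any copying of the input happens before the clock starts.
    start = time.perf_counter()
    fn(frame)
    return time.perf_counter() - start


# Best of 3 runs each; in_place gets a fresh copy per run because it mutates its input.
t_assign = min(seconds(with_assign, big) for _ in range(3))
t_in_place = min(seconds(in_place, big.copy()) for _ in range(3))
print(f'assign:   {t_assign * 1000:.1f} ms')
print(f'in place: {t_in_place * 1000:.1f} ms')
```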


Option 2
groupby + agg

A similar operation works with agg:

    df.assign(Experiment=df.Experiment.astype(str))\
      .groupby('Test point').agg({'Experiment' : ', '.join})

               Experiment
    Test point
    0                1, 4
    1                2, 5
    2                   3

The in-place version works the same way as above:

    # proposed in this post
    %%timeit -n 1
    df.Experiment = df.Experiment.astype(str)
    df.groupby('Test point').agg({'Experiment' : ', '.join})
    1 loop, best of 3: 2.21 ms per loop

agg should be faster than apply on large data frames.
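A rough way to check that claim on your own data is to time apply and agg on the same pre-converted column. The frame size and group count below are hypothetical, and which one comes out ahead may depend on the pandas version:

```python
import time

import numpy as np
import pandas as pd

# Hypothetical larger input, converted to strings once up front.
rng = np.random.default_rng(0)
n_rows = 100_000
big = pd.DataFrame({'Test point': rng.integers(0, 1_000, n_rows),
                    'Experiment': np.arange(n_rows).astype(str)})
grouped = big.groupby('Test point')['Experiment']


def best_of(fn, repeats=3):
    # Fastest of a few runs, similar in spirit to %%timeit's "best of 3".
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


via_apply = grouped.apply(', '.join)
via_agg = grouped.agg(', '.join)
print(f"apply: {best_of(lambda: grouped.apply(', '.join)) * 1000:.1f} ms")
print(f"agg:   {best_of(lambda: grouped.agg(', '.join)) * 1000:.1f} ms")
```

Both calls produce the same joined strings; only the timing is of interest here.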

