Create a pivot table listing the values

Question

Which aggfunc do I need to use to create a list using a pivot table? I tried using str, which does not work.

Inputs

    import pandas as pd

    data = {
        'Test point': [0, 1, 2, 0, 1],
        'Experiment': [1, 2, 3, 4, 5]
    }
    df = pd.DataFrame(data)
    print(df)

    # Counting the values per test point works
    pivot = pd.pivot_table(df, index=['Test point'], values=['Experiment'], aggfunc=len)
    print(pivot)

    # Using str as the aggfunc does not give the desired list
    pivot = pd.pivot_table(df, index=['Test point'], values=['Experiment'], aggfunc=str)
    print(pivot)

Results

       Experiment  Test point
    0           1           0
    1           2           1
    2           3           2
    3           4           0
    4           5           1

                Experiment
    Test point
    0                    2
    1                    2
    2                    1

                                                    Experiment
    Test point
    0           0    1\n3    4\nName: Experiment, dtype: int64
    1           1    2\n4    5\nName: Experiment, dtype: int64
    2                   2    3\nName: Experiment, dtype: int64

Desired output

                Experiment
    Test point
    0                 1, 4
    1                 2, 5
    2                    3

+8

python pandas pivot-table

bluprince13 Oct 14 '17 at 10:46

3 answers

Use pivot_table with an aggfunc that converts the values to strings and joins them:

    In [1830]: pd.pivot_table(df, index=['Test point'], values=['Experiment'],
                              aggfunc=lambda x: ', '.join(x.astype(str)))
    Out[1830]:
                Experiment
    Test point
    0                 1, 4
    1                 2, 5
    2                    3

Or use groupby instead:

    In [1831]: df.groupby('Test point').agg({
                   'Experiment': lambda x: x.astype(str).str.cat(sep=', ')})
    Out[1831]:
                Experiment
    Test point
    0                 1, 4
    1                 2, 5
    2                    3

If you want actual lists rather than joined strings:

    In [1861]: df.groupby('Test point').agg({'Experiment': lambda x: x.tolist()})
    Out[1861]:
               Experiment
    Test point
    0              [1, 4]
    1              [2, 5]
    2                 [3]

Note that x.astype(str).str.cat(sep=', ') is equivalent to ', '.join(x.astype(str)).
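As a quick check of that equivalence, here is a minimal sketch that applies both expressions to a single group from the question's data. The variable names (group, joined_with_cat, joined_with_join) are only illustrative.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'Test point': [0, 1, 2, 0, 1],
    'Experiment': [1, 2, 3, 4, 5],
})

# The 'Experiment' values for one group (Test point == 0).
group = df.loc[df['Test point'] == 0, 'Experiment']

# Both expressions convert the values to strings and join them with ', '.
joined_with_cat = group.astype(str).str.cat(sep=', ')
joined_with_join = ', '.join(group.astype(str))

print(joined_with_cat)   # 1, 4
print(joined_with_join)  # 1, 4
```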

+7

Zero Oct 14 '17 at 10:56

Option 1
str conversion + groupby + apply

You can convert the column to strings beforehand to simplify the groupby call.

    df.assign(Experiment=df.Experiment.astype(str))\
      .groupby('Test point').Experiment.apply(', '.join).to_frame('Experiment')

                Experiment
    Test point
    0                 1, 4
    1                 2, 5
    2                    3

A variation of this assigns in place for speed (assign returns a copy and is slower):

    df.Experiment = df.Experiment.astype(str)
    df.groupby('Test point').Experiment.apply(', '.join).to_frame('Experiment')

                Experiment
    Test point
    0                 1, 4
    1                 2, 5
    2                    3

The downside is that this also modifies the original frame.
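The trade-off between the two variants can be seen by checking the column dtype of the original frame after each one. This is only a small sketch on the question's data; the names converted, dtype_after_assign and dtype_after_inplace are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Test point': [0, 1, 2, 0, 1],
    'Experiment': [1, 2, 3, 4, 5],
})

# assign returns a new frame; the column in the original frame stays integer.
converted = df.assign(Experiment=df['Experiment'].astype(str))
dtype_after_assign = df['Experiment'].dtype
print(pd.api.types.is_integer_dtype(dtype_after_assign))   # True

# In-place assignment overwrites the column in the original frame.
df['Experiment'] = df['Experiment'].astype(str)
dtype_after_inplace = df['Experiment'].dtype
print(pd.api.types.is_integer_dtype(dtype_after_inplace))  # False
```

If the integer column is still needed later, the assign variant (or working on an explicit copy) avoids having to convert it back.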

Performance

    # Zero's first solution
    %%timeit
    df.groupby('Test point').agg({'Experiment': lambda x: x.astype(str).str.cat(sep=', ')})

    100 loops, best of 3: 3.72 ms per loop

    # Zero's second solution
    %%timeit
    pd.pivot_table(df, index=['Test point'], values=['Experiment'],
                   aggfunc=lambda x: ', '.join(x.astype(str)))

    100 loops, best of 3: 5.17 ms per loop

    # proposed in this post
    %%timeit -n 1
    df.Experiment = df.Experiment.astype(str)
    df.groupby('Test point').Experiment.apply(', '.join).to_frame('Experiment')

    1 loop, best of 3: 2.02 ms per loop

Note that the .assign version is only a few milliseconds slower than this. Larger performance differences should become visible on large data frames.

Option 2
groupby + agg

A similar operation follows with agg :

    df.assign(Experiment=df.Experiment.astype(str))\
      .groupby('Test point').agg({'Experiment' : ', '.join})

                Experiment
    Test point
    0                 1, 4
    1                 2, 5
    2                    3

The in-place version of this is the same as above.

    # proposed in this post
    %%timeit -n 1
    df.Experiment = df.Experiment.astype(str)
    df.groupby('Test point').agg({'Experiment' : ', '.join})

    1 loop, best of 3: 2.21 ms per loop

agg should show a speed gain over apply on large data frames.
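To check that claim on your own machine, a rough benchmark along these lines could be used outside IPython. The frame size, the number of groups and the helper names (big, with_apply, with_agg) are assumptions made for illustration, and the timings will vary with the pandas version and hardware.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger frame: 50,000 rows spread over 1,000 test points.
rng = np.random.default_rng(0)
n_rows = 50_000
big = pd.DataFrame({
    'Test point': rng.integers(0, 1_000, size=n_rows),
    # Pre-converted to strings, as in both options above.
    'Experiment': rng.integers(0, 100, size=n_rows).astype(str),
})

def with_apply():
    # Option 1: groupby + apply
    return big.groupby('Test point')['Experiment'].apply(', '.join).to_frame('Experiment')

def with_agg():
    # Option 2: groupby + agg
    return big.groupby('Test point').agg({'Experiment': ', '.join})

for name, func in [('apply', with_apply), ('agg', with_agg)]:
    seconds = timeit.timeit(func, number=3) / 3
    print(f'{name}: {seconds * 1000:.2f} ms per call')
```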

+1

cᴏʟᴅsᴘᴇᴇᴅ Oct 24 '17 at 8:34

Roman pekar · Accepted Answer · 2017-10-25T11:01:37+0000

You can use list as the aggregation function:

    >>> pd.pivot_table(df, index=['Test point'], values=['Experiment'],
    ...                aggfunc=lambda x: list(x))
               Experiment
    Test point
    0              [1, 4]
    1              [2, 5]
    2                 [3]
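As a self-contained sketch of this approach, the snippet below builds the pivot and confirms that each cell holds a real Python list rather than a string. The variable name first_cell is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Test point': [0, 1, 2, 0, 1],
    'Experiment': [1, 2, 3, 4, 5],
})

# Collect the values of each group into a list.
pivot = pd.pivot_table(df, index=['Test point'], values=['Experiment'],
                       aggfunc=lambda x: list(x))

# Each cell is a list object, so it can be indexed or iterated directly.
first_cell = pivot.loc[0, 'Experiment']
print(isinstance(first_cell, list))  # True
print(len(first_cell))               # 2
```

Because the cells are lists, they can still be joined into strings later, for example with pivot['Experiment'].apply(lambda values: ', '.join(map(str, values))).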
