Option 1
astype(str) + groupby + apply.
You can pre-convert the Experiment column to strings to simplify the call to groupby.

    df.assign(Experiment=df.Experiment.astype(str))\
      .groupby('Test point').Experiment.apply(', '.join).to_frame('Experiment')

               Experiment
    Test point
    0                1, 4
    1                2, 5
    2                   3
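For anyone who wants to run this end to end, here is a minimal self-contained sketch. The input frame is an assumption: it is reconstructed from the output shown above, since the original data is not included in this answer.

```python
import pandas as pd

# Assumed sample data, reconstructed from the output shown in this answer.
df = pd.DataFrame({
    'Test point': [0, 1, 2, 0, 1],
    'Experiment': [1, 2, 3, 4, 5],
})

# assign returns a new frame, so df itself keeps its integer column.
result = (
    df.assign(Experiment=df.Experiment.astype(str))
      .groupby('Test point').Experiment.apply(', '.join)
      .to_frame('Experiment')
)
print(result)
```

Because the conversion happens on the copy returned by assign, the Experiment column of df is still numeric afterwards.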
A variation on this assigns in place for speed (assign returns a copy and is slower):

    df.Experiment = df.Experiment.astype(str)
    df.groupby('Test point').Experiment.apply(', '.join).to_frame('Experiment')

               Experiment
    Test point
    0                1, 4
    1                2, 5
    2                   3

The downside is that this also modifies the original frame.
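To make that side effect concrete, the sketch below runs the in-place variant on assumed sample data (reconstructed from the output above) and keeps an untouched copy for comparison. The name `original` is only illustrative.

```python
import pandas as pd

# Assumed sample data, reconstructed from the output shown in this answer.
df = pd.DataFrame({
    'Test point': [0, 1, 2, 0, 1],
    'Experiment': [1, 2, 3, 4, 5],
})
original = df.copy()  # illustrative: kept only to compare against df afterwards

# In-place conversion: df itself now holds strings in 'Experiment'.
df['Experiment'] = df['Experiment'].astype(str)
out = df.groupby('Test point')['Experiment'].apply(', '.join).to_frame('Experiment')

print(out)
print(type(df['Experiment'].iloc[0]))   # values in df are now str
print(original['Experiment'].dtype)     # the copy is still an integer column
```

If later code still needs the numeric column, either work on a copy as above or use the assign version.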
Performance
    # Zero's 1st solution
    %%timeit
    df.groupby('Test point').agg({'Experiment': lambda x: x.astype(str).str.cat(sep=', ')})

    100 loops, best of 3: 3.72 ms per loop

    # Zero's 2nd solution
    %%timeit
    pd.pivot_table(df, index=['Test point'], values=['Experiment'],
                   aggfunc=lambda x: ', '.join(x.astype(str)))

    100 loops, best of 3: 5.17 ms per loop

    # proposed in this post
    %%timeit -n 1
    df.Experiment = df.Experiment.astype(str)
    df.groupby('Test point').Experiment.apply(', '.join).to_frame('Experiment')

    1 loop, best of 3: 2.02 ms per loop
Note that the assign version is only a few milliseconds slower than this. Larger performance gains should be visible for larger data frames.
Option 2
groupby + agg:
A similar operation works with agg:

    df.assign(Experiment=df.Experiment.astype(str))\
      .groupby('Test point').agg({'Experiment' : ', '.join})

               Experiment
    Test point
    0                1, 4
    1                2, 5
    2                   3
The in-place version of this follows the same pattern as above.

    # proposed in this post
    %%timeit -n 1
    df.Experiment = df.Experiment.astype(str)
    df.groupby('Test point').agg({'Experiment' : ', '.join})

    1 loop, best of 3: 2.21 ms per loop
agg should show a speed gain over apply on larger data frames.
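The timings above come from a five-row frame, so it is worth checking the apply versus agg claim on your own data. The following is a rough sketch using the standard timeit module outside IPython. The frame size, the number of groups, and the helper names `with_apply` and `with_agg` are all assumptions chosen for illustration, and results will vary with the pandas version and the machine.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger frame: 100,000 rows spread over roughly 1,000 groups.
rng = np.random.default_rng(0)
n = 100_000
big = pd.DataFrame({
    'Test point': rng.integers(0, 1_000, size=n),
    # Convert to str once up front so only the groupby step is timed.
    'Experiment': rng.integers(0, 10_000, size=n).astype(str),
})

def with_apply():
    # Option 1: SeriesGroupBy.apply with str.join
    return big.groupby('Test point').Experiment.apply(', '.join).to_frame('Experiment')

def with_agg():
    # Option 2: DataFrameGroupBy.agg with str.join
    return big.groupby('Test point').agg({'Experiment': ', '.join})

# Average seconds per call over a handful of runs.
runs = 5
t_apply = timeit.timeit(with_apply, number=runs) / runs
t_agg = timeit.timeit(with_agg, number=runs) / runs
print(f'apply: {t_apply * 1e3:.2f} ms per call')
print(f'agg:   {t_agg * 1e3:.2f} ms per call')
```

Both helpers produce the same joined strings per group, so whichever is faster on your data can be used interchangeably here.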