How to convert DataFrame rows to category-based columns?

I have a pandas data frame with a category variable and some numerical variables. Something like this:

    import pandas as pd

    ls = [{'count': 5, 'module': 'payroll', 'id': 2},
          {'count': 53, 'module': 'general', 'id': 2},
          {'id': 5, 'count': 35, 'module': 'tax'}]
    df = pd.DataFrame.from_dict(ls)

df looks like this:

    df
    Out[15]:
       count  id   module
    0      5   2  payroll
    1     53   2  general
    2     35   5      tax

I want to convert the values of module into columns (is "transpose" the right word?) and group by id. So something like:

       general_count  id  payroll_count  tax_count
    0           53.0   2            5.0        NaN
    1            NaN   5            NaN       35.0

One approach to this would be to use:

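    # Filters the whole frame once per row, and has to be repeated for every module
    df['payroll_count'] = df.id.apply(
        lambda x: df.loc[(df.id == x) & (df.module == 'payroll'), 'count'].max())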

However, it suffers from several drawbacks:

  • It is expensive and slow, because the whole frame is filtered again for every row.

  • It creates artifacts and empty values that have to be cleaned up afterwards.

I suspect the best way to achieve this is with pandas groupby, but I cannot figure out how to do the same operation more efficiently with it. Please help.

2 answers

You can use groupby on both columns: the first one (id) becomes the new index and the last one (module) ends up as the new columns. Then you need to aggregate somehow; I use mean. Next, DataFrame.squeeze converts the one-column DataFrame to a Series (so there is no need to remove the top level of the MultiIndex in the columns), and unstack reshapes it. Finally, add_suffix appends the suffix to the column names:

    df = df.groupby(['id', 'module']).mean().squeeze().unstack().add_suffix('_count')
    print(df)
    module  general_count  payroll_count  tax_count
    id
    2                53.0            5.0        NaN
    5                 NaN            NaN       35.0
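To see what each step of the groupby chain produces, here is the same pipeline split into intermediate variables. This is a sketch assuming a recent pandas; the variable names are only for illustration:

```python
import pandas as pd

ls = [{'count': 5, 'module': 'payroll', 'id': 2},
      {'count': 53, 'module': 'general', 'id': 2},
      {'id': 5, 'count': 35, 'module': 'tax'}]
df = pd.DataFrame.from_dict(ls)

# Step 1: group by both keys and aggregate; the result is a one-column
# DataFrame ('count') indexed by a MultiIndex (id, module).
grouped = df.groupby(['id', 'module']).mean()

# Step 2: squeeze the single column into a Series indexed by (id, module).
series = grouped.squeeze()

# Step 3: unstack moves the inner index level (module) into the columns.
wide = series.unstack()

# Step 4: append the suffix to every module name.
result = wide.add_suffix('_count')
print(result)
```

If the frame could ever contain only one (id, module) group, a bare squeeze() would collapse the 1x1 frame to a scalar. Selecting the column up front, as in df.groupby(['id', 'module'])['count'].mean(), returns a Series directly and avoids that edge case.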

Another solution uses pivot; then you need to flatten the MultiIndex in the columns with a list comprehension:

    df = df.pivot(index='id', columns='module')
    df.columns = ['_'.join((col[1], col[0])) for col in df.columns]
    print(df)
        general_count  payroll_count  tax_count
    id
    2            53.0            5.0        NaN
    5             NaN            NaN       35.0
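One caveat with pivot: it only works when each (id, module) pair occurs once. The sketch below uses a hypothetical frame with a repeated pair to show pivot raising ValueError, and pivot_table (a different function, which aggregates duplicates) handling it. Passing values='count' also keeps the columns flat, so no list comprehension is needed:

```python
import pandas as pd

# Hypothetical data: id 2 has two 'payroll' rows, which pivot cannot
# place into a single cell.
dup = pd.DataFrame({'id': [2, 2, 2, 5],
                    'module': ['payroll', 'payroll', 'general', 'tax'],
                    'count': [5, 7, 53, 35]})

try:
    dup.pivot(index='id', columns='module', values='count')
    pivot_failed = False
except ValueError:
    # pivot refuses to reshape when an (id, module) pair appears twice
    pivot_failed = True

# pivot_table aggregates duplicates (here with the mean) instead of failing.
# Because values is a single column name, the columns are a flat Index.
wide = dup.pivot_table(index='id', columns='module', values='count',
                       aggfunc='mean')
wide = wide.add_suffix('_count')
print(wide)
```

With the original, duplicate-free data, df.pivot(index='id', columns='module', values='count').add_suffix('_count') gives the same table as above.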

You can use set_index and unstack:

    In [2]: df.set_index(['id', 'module'])['count'].unstack().add_suffix('_count').reset_index()
    Out[2]:
    module  id  general_count  payroll_count  tax_count
    0        2           53.0            5.0        NaN
    1        5            NaN            NaN       35.0
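The unstacked result keeps module as the name of the column axis, which is why module appears in the header above. A minimal sketch, assuming you want a plain header, drops that name with rename_axis before resetting the index:

```python
import pandas as pd

df = pd.DataFrame([{'count': 5, 'module': 'payroll', 'id': 2},
                   {'count': 53, 'module': 'general', 'id': 2},
                   {'id': 5, 'count': 35, 'module': 'tax'}])

out = (df.set_index(['id', 'module'])['count']  # Series indexed by (id, module)
         .unstack()                             # module values become columns
         .add_suffix('_count')                  # general -> general_count, ...
         .rename_axis(columns=None)             # drop the leftover 'module' label
         .reset_index())                        # turn id back into a column
print(out)
```

Like pivot, unstack raises ValueError if an (id, module) pair is duplicated, so aggregate first (for example with groupby) when that can happen.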
