Python pandas: add column to grouped DataFrame with method chain

Question

First, let me say that I'm new to pandas.

I am trying to create a new column in a DataFrame. I can do this as shown in my example below, but I want to do it with method chaining, so that I don't need to assign intermediate variables. Let me first show what I want to achieve and what I have done so far:

In [1]:

    import numpy as np
    from pandas import Series, DataFrame
    import pandas as pd

In [2]:

    np.random.seed(10)
    df = pd.DataFrame(np.random.randint(1, 5, size=(10, 3)), columns=list('ABC'))
    df

Out [2]:

       A  B  C
    0  2  2  1
    1  4  1  2
    2  4  1  2
    3  2  1  2
    4  2  3  1
    5  2  1  3
    6  1  3  1
    7  4  1  1
    8  4  4  3
    9  1  4  3

In [3]:

    filtered_DF = df[df['B'] < 2].copy()
    grouped_DF = filtered_DF.groupby('A')
    filtered_DF['C_Share_By_Group'] = filtered_DF.C.div(grouped_DF.C.transform("sum"))
    filtered_DF

Out [3]:

       A  B  C  C_Share_By_Group
    1  4  1  2               0.4
    2  4  1  2               0.4
    3  2  1  2               0.4
    5  2  1  3               0.6
    7  4  1  1               0.2

I want to achieve the same using chaining methods. In R with the dplyr package, I could do something like:

 df %>% filter(B<2) %>% group_by(A) %>% mutate('C_Share_By_Group'=C/sum(C))

The pandas documentation says that mutate in R (dplyr) is equivalent to assign in pandas, but assign does not work on a grouped object. When I try to call assign on a grouped DataFrame, I get an error message:

    AttributeError: Cannot access callable attribute 'assign' of 'DataFrameGroupBy' objects, try using the 'apply' method

I tried the following, but I don't know how to add a new column, or whether this is even possible with a chain:

    (df.loc[df.B < 2]
       .groupby('A')
       # ****WHAT GOES HERE?**** apply(something)?
    )

python python-2.7 pandas dataframe

Lauh May 10 '16 at 14:50

1 answer

jezrael · Accepted Answer · 2016-05-10T15:06:13+0000

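You can try assign with a lambda. The lambda receives the already filtered DataFrame, so the groupby can happen inside the chain: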

    # x is the filtered DataFrame that assign passes to the lambda
    print(df[df['B'] < 2].assign(
        C_Share_By_Group=lambda x: x.C.div(x.groupby('A').C.transform("sum"))))

       A  B  C  C_Share_By_Group
    1  4  1  2               0.4
    2  4  1  2               0.4
    3  2  1  2               0.4
    5  2  1  3               0.6
    7  4  1  1               0.2
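
For anyone who wants to check this end to end, here is a minimal sketch that compares the chained version with the step-by-step version from the question. It assumes a pandas version in which .loc accepts a callable. The values are copied from Out [2] in the question instead of being regenerated from the random seed, and the names d, filtered and chained are my own, not part of the original posts.

```python
import pandas as pd

# Values copied from Out [2] in the question, so no random seed is needed.
df = pd.DataFrame(
    {
        "A": [2, 4, 4, 2, 2, 2, 1, 4, 4, 1],
        "B": [2, 1, 1, 1, 3, 1, 3, 1, 4, 4],
        "C": [1, 2, 2, 2, 1, 3, 1, 1, 3, 3],
    }
)

# Step-by-step version from the question, kept as the reference result.
filtered = df[df["B"] < 2].copy()
filtered["C_Share_By_Group"] = (
    filtered["C"] / filtered.groupby("A")["C"].transform("sum")
)

# Chained version: each callable receives the DataFrame produced by the
# previous step, so no intermediate variable is needed.
chained = (
    df.loc[lambda d: d["B"] < 2]
      .assign(
          C_Share_By_Group=lambda d: d["C"] / d.groupby("A")["C"].transform("sum")
      )
)

print(chained)
```

The point of the design is that assign is still called on a plain DataFrame, never on the DataFrameGroupBy object. The grouping only happens inside the lambda, and transform("sum") returns a Series aligned with the filtered rows, which is what makes the element-wise division work.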
