First, I should say that I'm new to pandas.
I am trying to create a new column in a DataFrame. I can do this as shown in my example below, but I want to do it with method chaining so that I don't need to assign intermediate variables. Let me first show what I want to achieve and what I have done so far:
In [1]:
import numpy as np
from pandas import Series, DataFrame
import pandas as pd

In [2]:
np.random.seed(10)
df = pd.DataFrame(np.random.randint(1, 5, size=(10, 3)), columns=list('ABC'))
df

Out[2]:
   A  B  C
0  2  2  1
1  4  1  2
2  4  1  2
3  2  1  2
4  2  3  1
5  2  1  3
6  1  3  1
7  4  1  1
8  4  4  3
9  1  4  3

In [3]:
filtered_DF = df[df['B'] < 2].copy()
grouped_DF = filtered_DF.groupby('A')
filtered_DF['C_Share_By_Group'] = filtered_DF.C.div(grouped_DF.C.transform("sum"))
filtered_DF

Out[3]:
   A  B  C  C_Share_By_Group
1  4  1  2               0.4
2  4  1  2               0.4
3  2  1  2               0.4
5  2  1  3               0.6
7  4  1  1               0.2
I want to achieve the same result using method chaining. In R with the dplyr package, I could do something like:
df %>% filter(B<2) %>% group_by(A) %>% mutate('C_Share_By_Group'=C/sum(C))
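For comparison, here is a minimal sketch of how the dplyr pipeline above might map onto a single pandas chain. It assumes pandas 0.18.1 or later (for passing a callable to .loc) and rebuilds df from the Out[2] table rather than from the random seed. The grouping happens inside assign rather than before it: groupby("A")["C"].transform("sum") returns each row's group total, aligned to the filtered frame's index, so the division is done row by row.

```python
import pandas as pd

# Sample data typed in from the Out[2] table above
df = pd.DataFrame(
    {
        "A": [2, 4, 4, 2, 2, 2, 1, 4, 4, 1],
        "B": [2, 1, 1, 1, 3, 1, 3, 1, 4, 4],
        "C": [1, 2, 2, 2, 1, 3, 1, 1, 3, 3],
    }
)

result = (
    df.loc[lambda d: d["B"] < 2]  # dplyr: filter(B < 2)
    .assign(
        # dplyr: group_by(A) %>% mutate(C / sum(C))
        # transform("sum") broadcasts each group's total back to its rows
        C_Share_By_Group=lambda d: d["C"] / d.groupby("A")["C"].transform("sum")
    )
)
print(result)
```

Because transform keeps the filtered rows' index, the new column should line up with the values in Out[3] without any intermediate variables.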
The pandas documentation says that mutate in R (dplyr) is the equivalent of assign in pandas, but assign does not work on a grouped object. When I try to call assign on a grouped DataFrame, I get this error message:
"AttributeError: Cannot access callable attribute 'assign' of 'DataFrameGroupBy' objects, try using the 'apply' method"
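The error arises because assign is a DataFrame method and a DataFrameGroupBy object does not expose it (the exact message wording varies between pandas versions). The short check below is a sketch that uses the filtered rows from Out[3], typed in by hand with their original index; it contrasts an aggregated sum (one value per group) with transform("sum") (one value per original row, same index), which is the shape a new column needs.

```python
import pandas as pd

# Filtered rows (B < 2) from the Out[3] table above, with their original index
filtered = pd.DataFrame(
    {"A": [4, 4, 2, 2, 4], "B": [1, 1, 1, 1, 1], "C": [2, 2, 2, 3, 1]},
    index=[1, 2, 3, 5, 7],
)

grouped = filtered.groupby("A")

# assign is defined on DataFrame, not on DataFrameGroupBy
print(hasattr(filtered, "assign"))  # True
print(hasattr(grouped, "assign"))   # False

# An aggregation collapses to one row per group (index = values of A) ...
per_group = grouped["C"].sum()
# ... while transform broadcasts the group total back to every row,
# keeping the original index so it lines up with filtered["C"]
per_row = grouped["C"].transform("sum")

print(per_group)
print(per_row)
```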
I tried the following, but I don't know how to add a new column, or whether it is even possible within a chain:
(df.loc[df.B < 2]
   .groupby('A')
   # ...how do I add the new column here?
)
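The error message's own hint, apply, is one possible way to finish the chain above. In this sketch each group is handed to the lambda as an ordinary DataFrame, so assign is available there; group_keys=False keeps the original row index instead of adding A as an outer index level, and sort_index restores the row order after the groups are concatenated. Note that pandas 2.2 and later deprecate passing the grouping column A into the applied function (a DeprecationWarning is emitted), and later versions may leave A out of the result, so the lambda deliberately uses only C.

```python
import pandas as pd

# Sample data typed in from the Out[2] table above
df = pd.DataFrame(
    {
        "A": [2, 4, 4, 2, 2, 2, 1, 4, 4, 1],
        "B": [2, 1, 1, 1, 3, 1, 3, 1, 4, 4],
        "C": [1, 2, 2, 2, 1, 3, 1, 1, 3, 3],
    }
)

result = (
    df.loc[df.B < 2]
    # group_keys=False: do not prepend the group key as an extra index level
    .groupby("A", group_keys=False)
    # each group g is a plain DataFrame, so assign works on it
    .apply(lambda g: g.assign(C_Share_By_Group=g.C / g.C.sum()))
    # apply concatenates the results group by group; restore the row order
    .sort_index()
)
print(result)
```

Since apply calls a Python function once per group, it is typically slower than a built-in groupby(...).transform("sum") when there are many groups.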
Lauh