Python pandas groupby() result

I have the following python pandas data frame:

    df = pd.DataFrame({
        'A': [1,1,1,1,2,2,2,3,3,4,4,4],
        'B': [5,5,6,7,5,6,6,7,7,6,7,7],
        'C': [1,1,1,1,1,1,1,1,1,1,1,1]
    })
    df

        A  B  C
    0   1  5  1
    1   1  5  1
    2   1  6  1
    3   1  7  1
    4   2  5  1
    5   2  6  1
    6   2  6  1
    7   3  7  1
    8   3  7  1
    9   4  6  1
    10  4  7  1
    11  4  7  1

I would like to add another column, D, holding the sum of the C values for each fixed combination of A and B. That is, something like:

        A  B  C  D
    0   1  5  1  2
    1   1  5  1  2
    2   1  6  1  1
    3   1  7  1  1
    4   2  5  1  1
    5   2  6  1  2
    6   2  6  1  2
    7   3  7  1  2
    8   3  7  1  2
    9   4  6  1  1
    10  4  7  1  2
    11  4  7  1  2

I tried with pandas groupby and it kind of works:

    res = {}
    for a, group_by_A in df.groupby('A'):
        group_by_B = group_by_A.groupby('B', as_index=False)
        res[a] = group_by_B['C'].sum()

but I don't know how to get the results from res back into df in the right order. I would be grateful for any advice on this. Thanks.

+15
python pandas group-by
Jul 16 '13 at 0:24
3 answers

Here is one way (although it feels like this should work in one go with an apply, I couldn't get that to work).

    In [11]: g = df.groupby(['A', 'B'])

    In [12]: df1 = df.set_index(['A', 'B'])

The groupby size function is the one you want; we have to match its result against 'A' and 'B' as the index:

    In [13]: df1['D'] = g.size()  # unfortunately this doesn't play nice with as_index=False
    # Same would work with g['C'].sum()

    In [14]: df1.reset_index()
    Out[14]:
        A  B  C  D
    0   1  5  1  2
    1   1  5  1  2
    2   1  6  1  1
    3   1  7  1  1
    4   2  5  1  1
    5   2  6  1  2
    6   2  6  1  2
    7   3  7  1  2
    8   3  7  1  2
    9   4  6  1  1
    10  4  7  1  2
    11  4  7  1  2
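For anyone who wants to run this outside an IPython session, here is a rough self-contained sketch of the same index-alignment idea. It uses g['C'].sum() rather than g.size(), since the question asks for a sum; the two only agree here because every C value is 1. The names sums and result are mine, not from the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4],
    'B': [5, 5, 6, 7, 5, 6, 6, 7, 7, 6, 7, 7],
    'C': [1] * 12,
})

# One sum of C per (A, B) pair, indexed by an (A, B) MultiIndex.
sums = df.groupby(['A', 'B'])['C'].sum()

# Move A and B into the row index so that the column assignment
# aligns each row with the sum for its own (A, B) pair.
df1 = df.set_index(['A', 'B'])
df1['D'] = sums

# Restore A and B as ordinary columns; the row order is unchanged.
result = df1.reset_index()
print(result)
```

If C held values other than 1, size() would give the row count per group and sum() the total, so it is worth picking whichever you actually mean.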
+13
Jul 16 '13 at 0:48

You can also do it in one line using a merge, as follows:

 df = df.merge(pd.DataFrame({'D':df.groupby(['A', 'B'])['C'].size()}), left_on=['A', 'B'], right_index=True) 
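Spelled out in steps, a merge-based version might look like the sketch below. This is a variation rather than the exact one-liner above: it computes the sum of C instead of the group size, merges on the key columns instead of the index, and uses how='left' so the original row order is kept. The names totals and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4],
    'B': [5, 5, 6, 7, 5, 6, 6, 7, 7, 6, 7, 7],
    'C': [1] * 12,
})

# One row per (A, B) pair with the summed C, renamed to D so it
# does not collide with the original C column during the merge.
totals = (
    df.groupby(['A', 'B'], as_index=False)['C']
      .sum()
      .rename(columns={'C': 'D'})
)

# A left merge keeps every row of df, in its original order, and
# attaches the matching group total to each one.
result = df.merge(totals, on=['A', 'B'], how='left')
print(result)
```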
+5
Aug 20 '13 at 17:55

You can also do it in one line using transform applied to the groupby:

 df['D'] = df.groupby(['A','B'])['C'].transform('sum') 
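As a quick runnable check of the transform approach on the question's data, something like the following should do. transform('sum') returns a Series with the same index as df, so it can be assigned directly with no merge or reindexing. Here I use assign to leave df untouched; the name out is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4],
    'B': [5, 5, 6, 7, 5, 6, 6, 7, 7, 6, 7, 7],
    'C': [1] * 12,
})

# transform broadcasts each group's sum back to every row of that
# group, so the result lines up row-for-row with df.
group_sums = df.groupby(['A', 'B'])['C'].transform('sum')

# assign returns a new frame with the extra column D.
out = df.assign(D=group_sums)
print(out)
```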
+4
Dec 30 '15 at 2:54