Aggregation of pandas groupby objects

I am trying to aggregate some statistics from a groupby object over chunks of data. I have to process the data in chunks because there are a lot of rows (18 million). I want to find the number of rows in each group within each chunk, and then sum them up. I can add the groupby results together, but when a group is missing from one of the operands, the result for that group is NaN. See this case:

>>> import pandas as pd
>>> df = pd.DataFrame({'X': ['A','B','C','A','B','C','B','C','D','B','C','D'],
...                    'Y': range(12)})
>>> df
    X   Y
0   A   0
1   B   1
2   C   2
3   A   3
4   B   4
5   C   5
6   B   6
7   C   7
8   D   8
9   B   9
10  C  10
11  D  11
>>> df[0:6].groupby(['X']).count() + df[6:].groupby(['X']).count()
    Y
X    
A NaN
B   4
C   4
D NaN

But I want to see:

>>> df[0:6].groupby(['X']).count() + df[6:].groupby(['X']).count()
    Y
X    
A   2
B   4
C   4
D   2

Is there a good way to do this? Note that in the real code I am iterating over a chunked iterator that yields a million rows per chunk, and I run the groupby on each chunk.

1 answer

Call add and pass fill_value=0, so that a group missing from one side is treated as 0 instead of NaN. I think you could apply this iteratively while chunking:

In [98]:

import numpy as np
import pandas as pd

df = pd.DataFrame({'X': ['A','B','C','A','B','C','B','C','D','B','C','D'],
                   'Y': np.arange(12)})
df[0:6].groupby(['X']).count().add(df[6:].groupby(['X']).count(), fill_value=0)
Out[98]:
   Y
X   
A  2
B  4
C  4
D  2
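
To make the "iteratively add during chunking" part concrete, here is a minimal sketch that keeps a running total across chunks. The helper name count_per_group and the chunk size of 4 are made up for illustration, and the chunks are simulated by slicing the example frame. Recent pandas versions may return floats from add with fill_value, so the sketch casts back to integers at the end.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'X': ['A','B','C','A','B','C','B','C','D','B','C','D'],
                   'Y': np.arange(12)})

def count_per_group(chunks, key):
    """Sum per-group counts over an iterable of DataFrame chunks.

    Hypothetical helper, not part of pandas.
    """
    total = None
    for chunk in chunks:
        counts = chunk.groupby(key).count()
        if total is None:
            total = counts
        else:
            # fill_value=0 treats a group missing on one side as 0, not NaN
            total = total.add(counts, fill_value=0)
    if total is None:
        # no chunks at all: nothing to count
        return pd.DataFrame()
    # alignment may have upcast the counts to float; cast back to int
    return total.astype('int64')

# Simulate a chunked iterator with slices of 4 rows each
chunks = (df[i:i + 4] for i in range(0, len(df), 4))
totals = count_per_group(chunks, 'X')
print(totals)
```

With real data the chunks could come from something like pd.read_csv(path, chunksize=1000000), which yields DataFrames one chunk at a time; path there is a placeholder for your own file.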