My first SO question: I am confused by this behavior of the groupby method in pandas (0.12.0-4). It seems to apply the function TWICE to the first row of the data frame. For example:
>>> from pandas import Series, DataFrame
>>> import pandas as pd
>>> df = pd.DataFrame({'class': ['A', 'B', 'C'], 'count':[1,0,2]})
>>> print(df)
  class  count
0     A      1
1     B      0
2     C      2
First I check that groupby itself splits the data correctly, and it seems to be fine:
>>> for group in df.groupby('class', group_keys = True):
...     print(group)
...
('A',   class  count
0     A      1)
('B',   class  count
1     B      0)
('C',   class  count
2     C      2)
Then I try to do the same thing using apply on the groupby object, and the first group is printed twice:
>>> def checkit(group):
...     print(group)
...
>>> df.groupby('class', group_keys = True).apply(checkit)
  class  count
0     A      1
  class  count
0     A      1
  class  count
1     B      0
  class  count
2     C      2
Any help would be appreciated! Thanks.
Edit: @Jeff provides the answer below. I was dense and didn't understand right away, so here is a simple example to show that, despite the double printout of the first group in the above example, the apply method operates only once on the first group and does not mutate the original data frame:
>>> def addone(group):
...     group['count'] += 1
...     return group
...
>>> df.groupby('class', group_keys = True).apply(addone)
>>> print(df)
  class  count
0     A      1
1     B      0
2     C      2
But by assigning the return value of the method to a new object, we see that it works as expected:
>>> df2 = df.groupby('class', group_keys = True).apply(addone)
>>> print(df2)
  class  count
0     A      2
1     B      1
2     C      3
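For anyone who wants to check this without reading printed output, here is a minimal sketch that records every call in a list. The `calls` list and the explicit `.copy()` are my own additions for illustration, not part of the example above. The copy just makes sure the function never touches whatever object pandas hands it. How many entries end up in `calls` may depend on the pandas version (on the version above I would expect the first group to show up twice), so the sketch only relies on every group being seen at least once.

```python
import pandas as pd

df = pd.DataFrame({'class': ['A', 'B', 'C'], 'count': [1, 0, 2]})

# Hypothetical helper: one entry per invocation of addone.
calls = []

def addone(group):
    # Record the 'count' values of the group we were handed.
    # Each group has a distinct value here, so this identifies the group.
    calls.append(group['count'].tolist())
    # Work on a copy (an assumption of this sketch) so the original
    # frame cannot be affected by the in-place increment.
    group = group.copy()
    group['count'] += 1
    return group

result = df.groupby('class', group_keys=True).apply(addone)

print(calls)                     # how often each group was passed in
print(df['count'].tolist())      # the original frame
print(result['count'].tolist())  # the returned frame
```

Even if the first group appears more than once in `calls`, the returned frame contains each group incremented exactly once, and `df` itself is unchanged.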