How to group and count rows by month and year using Pandas?

I have a data set with personal data such as name, height, weight and date of birth. I would like to build a graph of the number of people born in each month and year. I am using pandas for this, and my strategy was to group by year and month and then count. But the closest I have got is counting the number of people by year or by month, not by both.

df['birthdate'].groupby(df.birthdate.dt.year).agg('count') 

Other questions on Stack Overflow point to a grouper called TimeGrouper, but I could not find anything about it in the pandas documentation. Any ideas?

5 answers

To group by multiple criteria, pass a list of columns or criteria:

 df['birthdate'].groupby([df.birthdate.dt.year, df.birthdate.dt.month]).agg('count') 

Example:

 In [165]: df = pd.DataFrame({'birthdate': pd.date_range(start=dt.datetime(2015,12,20), end=dt.datetime(2016,3,1))})
      ...: df.groupby([df['birthdate'].dt.year, df['birthdate'].dt.month]).agg({'count'})
 Out[165]:
                     birthdate
                         count
 birthdate birthdate
 2015      12               12
 2016      1                31
           2                29
           3                 1

UPDATE

Starting with version 0.23.0, the above code no longer works because of a restriction that level names must be unique. You now need to rename the levels for this to work:

 In [107]: df.groupby([df['birthdate'].dt.year.rename('year'), df['birthdate'].dt.month.rename('month')]).agg({'count'})
 Out[107]:
            birthdate
                count
 year month
 2015 12           12
 2016 1            31
      2            29
      3             1
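
For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch of the same idea. It assumes the same sample dates as above and uses size() instead of agg({'count'}) so the result is a plain Series; the variable name counts is only illustrative.

```python
import pandas as pd

# Same sample data as the example above: one row per day,
# from 2015-12-20 through 2016-03-01.
df = pd.DataFrame({'birthdate': pd.date_range(start='2015-12-20', end='2016-03-01')})

# Rename the two grouping keys so the resulting index levels
# get distinct names ('year' and 'month').
counts = df.groupby([
    df['birthdate'].dt.year.rename('year'),
    df['birthdate'].dt.month.rename('month'),
]).size()

print(counts)
```

Since the question asks for a graph, something like counts.plot(kind='bar') should draw one bar per (year, month) pair, assuming matplotlib is installed.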

You can also group by a monthly period, using to_period on the dt accessor:

 In [11]: df = pd.DataFrame({'birthdate': pd.date_range(start='20-12-2015', end='3-1-2016')})

 In [12]: df['birthdate'].groupby(df.birthdate.dt.to_period("M")).agg('count')
 Out[12]:
 birthdate
 2015-12    12
 2016-01    31
 2016-02    29
 2016-03     1
 Freq: M, Name: birthdate, dtype: int64

It is worth noting that if the datetime is the index (not a column), you can use resample:

 df.resample("M").count() 

Another solution is to set birthdate as the index and resample:

 import pandas as pd
 df = pd.DataFrame({'birthdate': pd.date_range(start='20-12-2015', end='3-1-2016')})
 df.set_index('birthdate').resample('MS').size()

Output:

 birthdate
 2015-12-01    12
 2016-01-01    31
 2016-02-01    29
 2016-03-01     1
 Freq: MS, dtype: int64
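
One difference between resampling and grouping that may matter for a graph: resample produces a row for every month in the range, including months with no births, while groupby only returns months that actually occur in the data. The sketch below illustrates this with made-up dates (a hypothetical sample with nobody born in February 2016); the variable names are illustrative.

```python
import pandas as pd

# Hypothetical sample: three birthdates, none of them in February 2016.
df = pd.DataFrame({
    'birthdate': pd.to_datetime(['2016-01-05', '2016-01-20', '2016-03-10'])
})

# Grouping by monthly period only yields months present in the data.
by_period = df.groupby(df['birthdate'].dt.to_period('M')).size()

# Resampling on a datetime index yields every month in the range,
# with a count of 0 for the empty one.
by_resample = df.set_index('birthdate').resample('MS').size()

print(by_period)    # two rows: 2016-01 and 2016-03
print(by_resample)  # three rows, including 2016-02-01 with a count of 0
```

If the plot should show gaps as zero-height bars, the resample form is probably the more convenient one.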

As of April 2019, the following works with pandas 0.24.x:

df.groupby([df.dates.dt.year.rename('year'), df.dates.dt.month.rename('month')]).size()


Replace the date and count fields with your own column names. This snippet groups the rows by month, sums the count column within each month, and sorts the result by date. You can also change the frequency to 1M, 2M and so on.

 df[['date', 'count']].groupby(pd.Grouper(key='date', freq='1M')).sum().sort_values(by='date', ascending=True)['count'] 
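
A small runnable sketch of the pd.Grouper approach, for reference. The data here is invented, the 'date' and 'count' column names simply follow the snippet above, and the frequency is 'MS' (month start) rather than '1M', an assumption made because recent pandas releases warn about the 'M' alias.

```python
import pandas as pd

# Hypothetical data; the 'date' and 'count' column names follow the snippet above.
df = pd.DataFrame({
    'date': pd.to_datetime(['2016-01-05', '2016-01-20', '2016-02-03', '2016-03-10']),
    'count': [1, 2, 3, 4],
})

# Bin rows by calendar month on the 'date' column and
# add up 'count' within each bin.
monthly = df.groupby(pd.Grouper(key='date', freq='MS'))['count'].sum()

print(monthly)
```

The result is already ordered by date, so the extra sort_values call is likely only needed if the index order has been changed elsewhere.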