Average daily number of records per month in Pandas DataFrame

Question


I have a pandas DataFrame with a TIMESTAMP column of dtype datetime64. Note that this column is not set as the index; the index is just the default integer range. The first few rows look like this:

                    TIMESTAMP  TYPE
    0 2014-07-25 11:50:30.640     2
    1 2014-07-25 11:50:46.160     3
    2 2014-07-25 11:50:57.370     2

Each day has an arbitrary number of entries, and some days may have no data at all. What I want is the average number of entries per day for each month, plotted as a bar chart with the months on the x axis (April 2014, May 2014, etc.). I was able to calculate these values with the code below:

    dfWIM.index = dfWIM.TIMESTAMP
    for i in range(dfWIM.TIMESTAMP.dt.year.min(), dfWIM.TIMESTAMP.dt.year.max() + 1):
        for j in range(1, 13):
            month = dfWIM[(dfWIM.TIMESTAMP.dt.year == i) & (dfWIM.TIMESTAMP.dt.month == j)]
            # count the records per day, then average those daily counts for this month
            print(month.resample('D').count().TIMESTAMP.mean())

which gives the following result:

    nan
    nan
    3100.14285714
    6746.7037037
    9716.42857143
    10318.5806452
    9395.56666667
    9883.64516129
    8766.03225806
    9297.78571429
    10039.6774194
    nan
    nan
    nan

This works as it is, and with some extra work I could match the results up with the month names and build the graph. However, I'm not sure whether this is the right or best approach, and I suspect pandas offers an easier way to get these results.

I would be happy to hear what you think. Thanks!

NOTE: If I don't set the TIMESTAMP column as the index, I get a "reduction operation not allowed for this dtype" error.


python pandas timestamp time-series

marillion Oct 26 '15 at 16:06


1 answer

jakevdp · Accepted Answer · 2015-10-26T17:52:10+0000

I think you need two rounds of groupby: first group by day and count the records, then group by month and take the mean of those daily counts. You could do something like this.
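To see what the two rounds produce, here is a small hand-checkable sketch with made-up timestamps (the data is hypothetical). It uses pd.Grouper(freq=...), the replacement for pd.TimeGrouper in current pandas. A day with no records inside the data's range, such as 2 August here, is counted as 0 and pulls that month's average down:

```python
import pandas as pd

# Hypothetical records: 2 on Jul 30, 1 on Jul 31, 3 on Aug 1, none on Aug 2, 1 on Aug 3.
ts = pd.to_datetime(['2014-07-30 08:00', '2014-07-30 17:30',
                     '2014-07-31 12:00',
                     '2014-08-01 09:00', '2014-08-01 10:00', '2014-08-01 11:00',
                     '2014-08-03 15:00'])
data = pd.DataFrame({'TIMESTAMP': ts, 'TYPE': [1, 2, 1, 3, 2, 2, 1]})

# Round 1: count records per calendar day (empty days inside the range get 0).
daily = data.set_index('TIMESTAMP').groupby(pd.Grouper(freq='D'))['TYPE'].count()
# Round 2: average those daily counts within each month.
monthly = daily.groupby(pd.Grouper(freq='MS')).mean()
print(daily.tolist())    # [2, 1, 3, 0, 1]
print(monthly.tolist())  # [1.5, 1.3333333333333333]
```

Days before the first record or after the last record in the dataset are not generated, so the first and last months are averaged only over the days the data actually spans.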

First, I will create some fake data that looks like yours:

    import numpy as np
    import pandas as pd

    # make 1000 random times throughout the year
    N = 1000
    times = pd.date_range('2014', '2015', freq='min')
    ind = np.random.permutation(np.arange(len(times)))[:N]
    data = pd.DataFrame({'TIMESTAMP': times[ind],
                         'TYPE': np.random.randint(0, 10, N)})
    data.head()

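Now I'll do the two groupbys with pd.Grouper (pd.TimeGrouper in older pandas versions; it has since been removed): one to count the records per day, and one to average those daily counts per month:

    import seaborn as sns  # optional: older seaborn versions restyle matplotlib plots on import

    daily = data.set_index('TIMESTAMP').groupby(pd.Grouper(freq='D'))['TYPE'].count()
    monthly = daily.groupby(pd.Grouper(freq='MS')).mean()  # 'MS' labels each month by its first day
    ax = monthly.plot(kind='bar')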

The formatting of the x-axis labels is poor, but you can customize it if needed.
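A bar plot of a Series with a DatetimeIndex labels each bar with the full timestamp (e.g. 2014-04-01 00:00:00). A minimal sketch of one way to get "April 2014"-style labels: format the index with strftime and pass the strings to matplotlib's set_xticklabels. The three monthly values are hypothetical stand-ins for monthly, and the Agg backend is only there so the sketch runs without a display:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly averages, indexed by month start like `monthly`.
monthly = pd.Series([10.0, 12.5, 9.0],
                    index=pd.date_range('2014-04-01', periods=3, freq='MS'))

ax = monthly.plot(kind='bar')
# One tick per bar: replace the raw timestamps with "Month Year" labels.
ax.set_xticklabels(monthly.index.strftime('%B %Y'), rotation=45, ha='right')
ax.set_xlabel('')
ax.set_ylabel('Average records per day')
plt.tight_layout()
```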
