I have a pandas DataFrame with a TIMESTAMP column of datetime64 dtype. Note that this column is not initially set as the index; the index is just regular integers. The first few rows look like this:
                    TIMESTAMP  TYPE
    0 2014-07-25 11:50:30.640     2
    1 2014-07-25 11:50:46.160     3
    2 2014-07-25 11:50:57.370     2
For each day there is an arbitrary number of entries, and there may be days without data. What I'm trying to get is the average number of daily entries per month, and then plot it as a bar graph with months on the x axis (April 2014, May 2014, etc.). I was able to calculate these values using the code below:
    dfWIM.index = dfWIM.TIMESTAMP
    for i in range(dfWIM.TIMESTAMP.dt.year.min(), dfWIM.TIMESTAMP.dt.year.max()+1):
        for j in range(1, 13):
            print dfWIM[(dfWIM.TIMESTAMP.dt.year == i) & (dfWIM.TIMESTAMP.dt.month == j)].resample('D', how='count').TIMESTAMP.mean()
which gives the following result:
    nan
    nan
    3100.14285714
    6746.7037037
    9716.42857143
    10318.5806452
    9395.56666667
    9883.64516129
    8766.03225806
    9297.78571429
    10039.6774194
    nan
    nan
    nan
This works as it is, and with some extra work I can map the results to the month names and then build the graph. However, I'm not sure this is the right or best way, and I suspect there is an easier way to get these results with pandas.
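To make the goal concrete, here is a rough sketch of the kind of shorter approach I have in mind. It counts rows per calendar day and then averages those counts within each month. It is a sketch under assumptions, not something I have verified against the real data: the helper name `mean_daily_entries_per_month` and the last two sample rows are made up for illustration, and days without data are assumed to count as zero. It also differs slightly from the loop version, since it resamples the whole date range once, so empty days at the start or end of a month in the middle of the range are included in that month's average.

```python
import pandas as pd


def mean_daily_entries_per_month(df, ts_col="TIMESTAMP"):
    """Average number of rows per calendar day, grouped by month.

    Hypothetical helper; assumes `ts_col` holds datetime64 values and that
    days without data (inside the overall date range) count as 0 entries.
    """
    # Count rows per calendar day; empty days inside the range become 0.
    daily = df.set_index(ts_col).resample("D").size()
    # Average the daily counts within each calendar month.
    monthly = daily.groupby(daily.index.to_period("M")).mean()
    # Readable labels such as "July 2014" for the x axis of a bar chart.
    monthly.index = monthly.index.strftime("%B %Y")
    return monthly


# First three rows come from the question; the last two are made up.
sample = pd.DataFrame({
    "TIMESTAMP": pd.to_datetime([
        "2014-07-25 11:50:30.640",
        "2014-07-25 11:50:46.160",
        "2014-07-25 11:50:57.370",
        "2014-07-27 09:00:00.000",
        "2014-08-02 10:00:00.000",
    ]),
    "TYPE": [2, 3, 2, 1, 2],
})

result = mean_daily_entries_per_month(sample)
print(result)
```

If matplotlib is installed, calling `result.plot(kind="bar")` on the returned Series should give the bar graph with months on the x axis.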
I would be happy to hear what you think. Thanks!
NOTE: If I do not set the TIMESTAMP column as the index, I get an error saying the operation is "not allowed for this dtype".