How can I select date ranges and categories in pandas?

I have a data frame with a date, a category and a value. I would like to calculate the total value for each category over fixed time windows. For example, I want to sum the values that occur in each 3-day period, but for each category separately.

My attempt, which seems too complicated:

import random
import datetime as dt
import pandas as pd
random.seed(0)

df=pd.DataFrame([[dt.datetime(2000,1,random.randint(1,31)), random.choice("abc"), random.randint(1,3)] for _ in range(100)], columns=["date", "cat", "value"])
df.set_index("date", inplace=True)

# The how="sum" argument was removed from resample(); call .sum() on the resampler instead.
result=df.groupby("cat")["value"].resample("3D").sum().unstack("cat").fillna(0)
result.plot()

The logic is basically correct, but resampling within each group has no fixed start: each category's 3-day bins begin at that category's own first date, so the bins do not line up between categories (and I get NaN / 0 values).
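
A small sketch of the misalignment described above, using a hand-made four-row frame (illustrative data, not the random frame from the question). It assumes a recent pandas version, where the default bin origin is the start of the first day seen by each resampler; the names `per_group`, `a_bins` and `b_bins` are made up for this example.

```python
import pandas as pd

# Illustrative data: category "a" first appears on Jan 1, category "b" on Jan 2.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2000-01-01", "2000-01-02", "2000-01-02", "2000-01-05"]
        ),
        "cat": ["a", "a", "b", "b"],
        "value": [1, 2, 3, 4],
    }
)

# Resampling inside each group: every category gets its own bin edges,
# anchored on the first day that category occurs.
per_group = df.set_index("date").groupby("cat")["value"].resample("3D").sum()

a_bins = list(per_group.loc["a"].index)  # bin labels for category "a"
b_bins = list(per_group.loc["b"].index)  # bin labels for category "b"

print(per_group)
```

Here the bins for "a" start on Jan 1 while the bins for "b" start on Jan 2, so unstacking by category produces rows that only one category can fill.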

What is the best way to achieve this plot?

1 answer

I think you should group by cat and date together, using pd.Grouper for the 3-day bins:

df = pd.DataFrame([[dt.datetime(2000,1,random.randint(1,31)), random.choice("abc"), random.randint(1,3)] for _ in range(100)], columns=["date", "cat", "value"])
df.groupby(["cat", pd.Grouper(freq='3d',key='date')]).sum().unstack(0).fillna(0).plot()
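
To check that grouping this way puts all categories on one shared set of 3-day bins, here is a minimal sketch on a hand-made four-row frame (illustrative data, not the random frame above). The variable name `binned` is an assumption of this example, and the bin labels shown rely on the default behaviour of recent pandas versions, where bins are anchored on the first day of the whole frame.

```python
import pandas as pd

# Illustrative data: category "a" first appears on Jan 1, category "b" on Jan 2.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2000-01-01", "2000-01-02", "2000-01-02", "2000-01-05"]
        ),
        "cat": ["a", "a", "b", "b"],
        "value": [1, 2, 3, 4],
    }
)

# Group by category and by 3-day bins of the "date" column in one step.
# The bins are computed over the whole frame, so both categories share them.
binned = (
    df.groupby(["cat", pd.Grouper(key="date", freq="3D")])["value"]
    .sum()
    .unstack("cat")  # one column per category
    .fillna(0)       # a category with no rows in a bin contributes 0
)

print(binned)
```

Both columns now share the bin labels 2000-01-01 and 2000-01-04, so the result can be plotted directly with `binned.plot()`.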