Effective pandas moving aggregation over a date range by group - Python 2.7 Windows - pandas 0.19.2

I am trying to find an efficient way to compute moving counts or sums in pandas, per group and over a date range. Eventually I want to add conditions, e.g. evaluating a type field, but I'm not there yet. I wrote something that does the job, but I suspect there is a more direct way to get the desired result.

My pandas data frame currently looks like this. The desired output is the last column, rolling_sales_180, which sums each person's amounts over the 180 days up to and including the current row's date.

        name        date  amount  rolling_sales_180
    0  David  2015-01-01     100              100.0
    1  David  2015-01-05     500              600.0
    2  David  2015-05-30      50              650.0
    3  David  2015-07-25      50              100.0
    4   Ryan  2014-01-04     100              100.0
    5   Ryan  2015-01-19     500              500.0
    6   Ryan  2016-03-31      50               50.0
    7    Joe  2015-07-01     100              100.0
    8    Joe  2015-09-09     500              600.0
    9    Joe  2015-10-15      50              650.0
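The 180-day figures above can be checked by hand with plain date arithmetic. The sketch below uses only the standard library; rolling_total is a hypothetical helper written for illustration, and it counts a sale when it is at most 180 days before the row's date (the same `<= win` rule the code below uses). It shows why David's 2015-07-25 row is 100: his January sales are 205 and 201 days earlier, so only the 2015-05-30 sale (56 days earlier) falls inside the window.

```python
from datetime import date

# David's sales from the table above: (date, amount)
sales = [(date(2015, 1, 1), 100), (date(2015, 1, 5), 500),
         (date(2015, 5, 30), 50), (date(2015, 7, 25), 50)]


def rolling_total(sales, as_of, window_days=180):
    """Hypothetical helper: sum amounts dated 0..window_days days on or before as_of."""
    return sum(amt for d, amt in sales
               if d <= as_of and (as_of - d).days <= window_days)


for d, _ in sales:
    print(d, rolling_total(sales, d))
# 2015-01-01 100
# 2015-01-05 600
# 2015-05-30 650
# 2015-07-25 100
```

This brute-force check is quadratic in the number of rows, so it is only meant for verifying small examples.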

My current solution and environment are below. I modeled my solution on this Q&A on Stack Overflow: Effective way to run subtotal for last 365 days

    import pandas as pd
    import numpy as np


    def trans_date_to_dist_matrix(date_col):
        # used to create a distance matrix (absolute gap in days between every pair of dates)
        x = date_col.tolist()
        y = date_col.tolist()
        data = []
        for i in x:
            tmp = []
            for j in y:
                tmp.append(abs((i - j).days))
            data.append(tmp)
            del tmp
        return pd.DataFrame(data=data, index=date_col.values, columns=date_col.values)


    def lower_tri(x_col, date_col, win):
        # x_col = column user wants a rolling sum of, date_col = dates, win = time window (days)
        dm = trans_date_to_dist_matrix(date_col=date_col)  # dm = distance matrix
        dm = dm.where(dm <= win)  # keep only elements of the distance matrix within the window (time)
        # lt = lower tri of distance matrix, so each row only sees its own and earlier dates
        # (assumes dates are sorted ascending within the group)
        lt = dm.where(np.tril(np.ones(dm.shape)).astype(bool))
        lt[lt >= 0.0] = 1.0  # turn remaining distances (including 0 for the same day) into 1s
        lt = lt.fillna(0)  # replaces NaN with 0 for multiplication
        return pd.DataFrame(x_col.values * lt.values).sum(axis=1).tolist()


    def flatten(x):
        try:
            n = [v for sl in x for v in sl]
            return [v for sl in n for v in sl]
        except TypeError:
            return [v for sl in x for v in sl]


    data = [
        ['David', '1/1/2015', 100],
        ['David', '1/5/2015', 500],
        ['David', '5/30/2015', 50],
        ['David', '7/25/2015', 50],
        ['Ryan', '1/4/2014', 100],
        ['Ryan', '1/19/2015', 500],
        ['Ryan', '3/31/2016', 50],
        ['Joe', '7/1/2015', 100],
        ['Joe', '9/9/2015', 500],
        ['Joe', '10/15/2015', 50]
    ]

    list_of_vals = []
    dates_df = pd.DataFrame(data=data, columns=['name', 'date', 'amount'], index=None)
    dates_df['date'] = pd.to_datetime(dates_df['date'])
    # sort=False keeps the groups in their original row order (David, Ryan, Joe);
    # with the default sort=True they come back as David, Joe, Ryan and the
    # flattened values would be written to the wrong rows
    list_of_vals.append(dates_df.groupby('name', sort=False).apply(
        lambda x: lower_tri(x_col=x.amount, date_col=x.date, win=180)))
    new_data = flatten(list_of_vals)
    dates_df['rolling_sales_180'] = new_data
    print(dates_df)
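The nested loops in trans_date_to_dist_matrix can be replaced with NumPy broadcasting, which builds the same day-gap matrix in one step. The sketch below keeps the question's distance-matrix idea but is not the only way to do this; rolling_sum_matrix is a hypothetical helper name, not a pandas function. Because it tests "row j is on or before row i" directly instead of using a lower triangle, it does not rely on the dates being sorted. It still needs O(n²) memory per group, so it suits small groups.

```python
import numpy as np
import pandas as pd


def rolling_sum_matrix(amounts, dates, win):
    """Hypothetical helper: for each row, sum the amounts dated 0..win days
    on or before that row's date (same-day rows count toward each other)."""
    days = dates.values.astype('datetime64[D]').astype(np.int64)  # days since epoch
    diff = days[:, None] - days[None, :]   # diff[i, j] = days from row j to row i
    mask = (diff >= 0) & (diff <= win)     # row j is on/before row i and inside the window
    # Broadcasting multiplies column j by amounts[j]; summing across columns gives row totals
    return pd.Series((mask * amounts.values).sum(axis=1), index=amounts.index)


data = [
    ['David', '1/1/2015', 100], ['David', '1/5/2015', 500],
    ['David', '5/30/2015', 50], ['David', '7/25/2015', 50],
    ['Ryan', '1/4/2014', 100], ['Ryan', '1/19/2015', 500],
    ['Ryan', '3/31/2016', 50],
    ['Joe', '7/1/2015', 100], ['Joe', '9/9/2015', 500],
    ['Joe', '10/15/2015', 50],
]
dates_df = pd.DataFrame(data, columns=['name', 'date', 'amount'])
dates_df['date'] = pd.to_datetime(dates_df['date'])

# Each group's result keeps its original row labels, so assignment aligns by index
dates_df['rolling_sales_180'] = dates_df.groupby('name', group_keys=False).apply(
    lambda g: rolling_sum_matrix(g['amount'], g['date'], 180))
print(dates_df)
```

Returning a Series indexed like the group, rather than a list, lets pandas align the results with the original rows, so the separate flatten step and the group-ordering concern go away.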

Your time and feedback are welcome.

python numpy pandas
Answer

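Pandas has time-aware rolling windows through rolling (offset windows such as '180D' and the on parameter are available from pandas 0.19.0), so you can use it instead of writing your own solution from scratch. Here df is the question's dates_df: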

    def get_rolling_amount(grp, freq):
        return grp.rolling(freq, on='date')['amount'].sum()

    df['rolling_sales_180'] = df.groupby('name', as_index=False, group_keys=False) \
                                .apply(get_rolling_amount, '180D')

Result:

        name        date  amount  rolling_sales_180
    0  David  2015-01-01     100              100.0
    1  David  2015-01-05     500              600.0
    2  David  2015-05-30      50              650.0
    3  David  2015-07-25      50              100.0
    4   Ryan  2014-01-04     100              100.0
    5   Ryan  2015-01-19     500              500.0
    6   Ryan  2016-03-31      50               50.0
    7    Joe  2015-07-01     100              100.0
    8    Joe  2015-09-09     500              600.0
    9    Joe  2015-10-15      50              650.0
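For reference, a self-contained sketch of the rolling approach that rebuilds the question's data, assuming pandas 0.19 or later. One difference from the question's code: a '180D' window covers the half-open interval (date - 180 days, date], so a sale exactly 180 days earlier is left out, whereas the question's dm <= win keeps it. The sample data has no such gap, so both give the same numbers. The second part sketches the conditional case mentioned in the question; the type column and its values are hypothetical, invented only for illustration, and non-matching amounts are zeroed before the rolling sum.

```python
import pandas as pd

data = [
    ['David', '1/1/2015', 100], ['David', '1/5/2015', 500],
    ['David', '5/30/2015', 50], ['David', '7/25/2015', 50],
    ['Ryan', '1/4/2014', 100], ['Ryan', '1/19/2015', 500],
    ['Ryan', '3/31/2016', 50],
    ['Joe', '7/1/2015', 100], ['Joe', '9/9/2015', 500],
    ['Joe', '10/15/2015', 50],
]
df = pd.DataFrame(data, columns=['name', 'date', 'amount'])
df['date'] = pd.to_datetime(df['date'])


def get_rolling_amount(grp, freq, col='amount'):
    # Time-based window: sums col over (date - freq, date] within one group.
    # Requires grp['date'] to be sorted ascending.
    return grp.rolling(freq, on='date')[col].sum()


df['rolling_sales_180'] = (
    df.groupby('name', group_keys=False).apply(get_rolling_amount, '180D')
)

# Conditional variant: 'type' is a HYPOTHETICAL column used only for illustration.
df['type'] = ['a', 'b', 'a', 'a', 'a', 'a', 'b', 'b', 'a', 'a']
df['amount_a'] = df['amount'].where(df['type'] == 'a', 0)  # keep only type 'a' amounts
df['rolling_a_180'] = (
    df.groupby('name', group_keys=False)
      .apply(get_rolling_amount, '180D', col='amount_a')
)
print(df)
```

Masking the amounts first keeps every row in the window (so the date index stays intact) while letting only matching rows contribute to the sum.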
