Given your small sample data, I used a two-day moving average, not 60 days.
>>> (data.pivot(columns='Store', index='Date', values='Customers')
         .rolling(window=2).mean()
         .stack('Store'))
Date        Store
2013-01-03  1        623.0
2013-01-04  1        598.5
2013-01-05  1        627.0
2013-01-07  1        710.0
dtype: float64
By pivoting the DataFrame so that the dates are the index and the stores are the columns, you can take a moving average directly. Then you need to stack the stores again to get the data back into its original long form.
Here is an example of what the pivoted source data looks like before the final stack:
Store           1      2      3
Date
2015-07-29  541.5  686.5  767.0
2015-07-30  534.5  664.0  769.5
2015-07-31  550.5  613.0  822.0
After .stack('Store') it will be:
Date        Store
2015-07-29  1        541.5
            2        686.5
            3        767.0
2015-07-30  1        534.5
            2        664.0
            3        769.5
2015-07-31  1        550.5
            2        613.0
            3        822.0
dtype: float64
Assuming the stacked result above is named df, you can then merge it back into the original data as follows:
data.merge(df.reset_index(), how='left', on=['Date', 'Store'])
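For reference, here is a minimal, self-contained sketch of the whole pivot / rolling mean / stack / merge round trip. The data frame below is invented purely for illustration (it only mirrors the Date / Store / Customers columns from the question), and the 2-day window matches the example above:

import pandas as pd

# Hypothetical sample data in the same shape as the question's DataFrame
data = pd.DataFrame({
    'Date': pd.to_datetime(['2013-01-02', '2013-01-03', '2013-01-04'] * 2),
    'Store': [1, 1, 1, 2, 2, 2],
    'Customers': [668, 578, 619, 650, 720, 700],
})

# Wide table: one column per store, dates as the index
wide = data.pivot(columns='Store', index='Date', values='Customers')

# Two-day moving average per store, then back to long (Date, Store) form
rolled = wide.rolling(window=2).mean().stack('Store')
rolled.name = 'Customers_2d_mvg_avg'

# Attach the moving average to the original rows
merged = data.merge(rolled.reset_index(), how='left', on=['Date', 'Store'])
print(merged)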
EDIT: There is a clear seasonal pattern in the data that you may want to adjust for. In any case, you probably want your moving average window to be a multiple of seven so that it covers whole weeks. I used a window of 63 days (9 weeks) in the example below.
To avoid losing data for stores that have only just opened (and at the start of the time period), you can specify min_periods=1 in the rolling window. That way you get the average of however many observations are available within the window.
df = data.loc[data.Customers > 0, ['Date', 'Store', 'Customers']]
result = (df.pivot(columns='Store', index='Date', values='Customers')
            .rolling(window=63, min_periods=1).mean()
            .stack('Store'))
result.name = 'Customers_63d_mvg_avg'
df = df.merge(result.reset_index(), on=['Store', 'Date'], how='left')

>>> df.sort_values(['Store', 'Date']).head(8)
             Date  Store  Customers  Customers_63d_mvg_avg
843212 2013-01-02      1        668             668.000000
842103 2013-01-03      1        578             623.000000
840995 2013-01-04      1        619             621.666667
839888 2013-01-05      1        635             625.000000
838763 2013-01-07      1        785             657.000000
837658 2013-01-08      1        654             656.500000
836553 2013-01-09      1        626             652.142857
835448 2013-01-10      1        615             647.500000
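If you also want the moving average attached to every row of the original data (including the zero-customer days that were filtered out above), you can merge result onto data instead; a sketch, reusing result from the snippet above:

# Left join keeps every row of `data`; rows that were filtered out still pick up
# the average of the surrounding window where their date appears in the pivot
# index, and get NaN otherwise.
data_with_avg = data.merge(result.reset_index(), on=['Date', 'Store'], how='left')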
To see more clearly what is happening, here is a toy example:
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5] + [np.nan] * 2 + [6])

>>> pd.concat([s, s.rolling(window=4, min_periods=1).mean()], axis=1)
     0    1
0    1  1.0
1    2  1.5
2    3  2.0
3    4  2.5
4    5  3.5
5  NaN  4.0
6  NaN  4.5
7    6  5.5
The window spans four observations, but note that the final value of 5.5 is (5 + 6) / 2, because the NaN values are ignored. Likewise, the values 4.0 and 4.5 are (3 + 4 + 5) / 3 and (4 + 5) / 2, respectively.
In our example, the NaN rows of the pivot table are not merged back into df, because we used a left join and every row of df has one or more Customers.
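To make that concrete, here is a small sketch with invented numbers: the pivot creates a cell for a (Date, Store) combination that has no row in the source frame, but the left join never brings that combination back:

small = pd.DataFrame({
    'Date': pd.to_datetime(['2013-01-02', '2013-01-02', '2013-01-03']),
    'Store': [1, 2, 1],                # store 2 has no row on 2013-01-03
    'Customers': [668, 650, 578],
})

wide = small.pivot(columns='Store', index='Date', values='Customers')
avg = (wide.rolling(window=2, min_periods=1).mean()
           .stack('Store'))
avg.name = 'mvg_avg'

# `avg` contains an entry for (2013-01-03, store 2), but that pair does not
# exist in `small`, so the left join below leaves it out.
print(small.merge(avg.reset_index(), on=['Date', 'Store'], how='left'))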
You can plot the rolling data as follows:
df.set_index(['Date', 'Store']).unstack('Store').plot(legend=False)
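With many stores the combined plot of both the raw and the smoothed columns gets crowded, so you may prefer to plot only the moving-average column (a sketch, assuming the merged df from the 63-day example above):

# One line per store, smoothed values only
(df.set_index(['Date', 'Store'])['Customers_63d_mvg_avg']
   .unstack('Store')
   .plot(legend=False))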
