Pandas Dataframe Weekly Separation
I have a framework with columns "Date" and "Number".
dates = pd.date_range('1/1/2001','1/1/2003', freq = 'd')
nums = [np.random.randint(100) for x in range(len(dates))]
df = pd.DataFrame({'Dates': dates, 'DOW': dates.strftime('%a'), 'Nums': nums})
df = df[(df.DOW != 'Sat') & (df.DOW !='Sun')]
df = df.drop([7,18]).reset_index(drop = True)
I need to take a photoframe so that I can isolate each week separately. The ultimate goal is to look at the MAX "Nums" value for each week and compare it with the LAST value next week to find out what the percentage change is. For instance:
week1 = df[0:5]
week2 = df[5:9]
week3 = df[9:12]
In [156]: w1max = week1.Nums.max()
Out[156]: 97
In [157]: w2Last = week2.iloc[-1].Nums
Out[157]: 76
pctChange = (w2Last-w1max)/float(w1max)
In [166]: pctChange
Out[166]: -0.21649484536082475
The problem is that there are no days for several weeks (for example, week 2 is missing on Monday and week 3 is missing on Friday). So how do I isolate them?
The closest thing seems to be using df.resample(), but I don't know how to make the comparison I'm trying to do with this.
import numpy as np
import pandas as pd
np.random.seed(2016)
dates = pd.date_range('1/1/2001','1/1/2003', freq = 'd')
nums = [np.random.randint(100) for x in range(len(dates))]
df = pd.DataFrame({'Dates': dates, 'DOW': dates.strftime('%a'), 'Nums': nums})
df = df[(df.DOW != 'Sat') & (df.DOW !='Sun')]
df = df.drop([7,18]).reset_index(drop = True)
df2 = df.groupby(pd.Grouper(freq='W', key='Dates'))['Nums'].agg(['max','last'])
df2['previous_max'] = df2['max'].shift(1)
df2['change'] = (df2['last']-df2['previous_max'])/df2['previous_max']
print(df2.head())
max last previous_max change
Dates
2001-01-07 83 39 NaN NaN
2001-01-14 75 75 83.0 -0.096386
2001-01-21 97 18 75.0 -0.760000
2001-01-28 72 37 97.0 -0.618557
2001-02-04 84 24 72.0 -0.666667
df.groupby pd.Grouper object .
agg max last Nums :
In [163]: df2 = df.groupby(pd.Grouper(freq='W', key='Dates'))['Nums'].agg(['max','last'])
In [164]: df2.head()
Out[164]:
max last
Dates
2001-01-07 83 39
2001-01-14 75 75
2001-01-21 97 18
2001-01-28 72 37
2001-02-04 84 24
shift(1), max :
In [165]: df2['previous_max'] = df2['max'].shift(1); df2.head()
Out[165]:
max last previous_max
Dates
2001-01-07 83 39 NaN
2001-01-14 75 75 83.0
2001-01-21 97 18 75.0
2001-01-28 72 37 97.0
2001-02-04 84 24 72.0
:
In [166]: df2['change'] = (df2['last']-df2['previous_max'])/df2['previous_max']; df2.head()
Out[166]:
max last previous_max change
Dates
2001-01-07 83 39 NaN NaN
2001-01-14 75 75 83.0 -0.096386
2001-01-21 97 18 75.0 -0.760000
2001-01-28 72 37 97.0 -0.618557
2001-02-04 84 24 72.0 -0.666667