Group arbitrary date objects that fall within the same time range of each other

I want to split the calendar into two-week intervals, starting from 2008-May-5 or any arbitrary starting point.

So, I start with a few date objects:

    import datetime as DT

    raw = ("2010-08-01", "2010-06-25", "2010-07-01", "2010-07-08")
    transactions = [(DT.datetime.strptime(datestring, "%Y-%m-%d").date(),
                     "Some data here") for datestring in raw]
    transactions.sort()

By inspecting the dates manually, I can easily work out which dates fall within the same two-week interval. I want the grouping to look like this:

    # Fortnight interval 1
    (datetime.date(2010, 6, 25), 'Some data here')
    (datetime.date(2010, 7, 1), 'Some data here')
    (datetime.date(2010, 7, 8), 'Some data here')

    # Fortnight interval 2
    (datetime.date(2010, 8, 1), 'Some data here')
3 answers
    import datetime as DT
    import itertools

    start_date = DT.date(2008, 5, 5)

    def mkdate(datestring):
        return DT.datetime.strptime(datestring, "%Y-%m-%d").date()

    def fortnight(date):
        # Number of whole 14-day periods elapsed since start_date
        return (date - start_date).days // 14

    raw = ("2010-08-01", "2010-06-25", "2010-07-01", "2010-07-08")
    transactions = [(date, "Some data") for date in map(mkdate, raw)]
    # groupby only groups consecutive items, so sort by date first
    transactions.sort(key=lambda t: t[0])

    for key, grp in itertools.groupby(transactions, key=lambda t: fortnight(t[0])):
        # Printing a tuple gives the same output on Python 2 and Python 3
        print((key, list(grp)))

gives

    # (55, [(datetime.date(2010, 6, 25), 'Some data')])
    # (56, [(datetime.date(2010, 7, 1), 'Some data'), (datetime.date(2010, 7, 8), 'Some data')])
    # (58, [(datetime.date(2010, 8, 1), 'Some data')])

Note that 2010-6-25 falls in the 55th fortnight counted from 2008-5-5, while 2010-7-1 falls in the 56th. If you want them grouped together, just change start_date (to something like 2008-5-16).
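To illustrate the effect of moving start_date, here is a minimal sketch that runs the same grouping with both starting points. The helper name group_by_fortnight is made up for this example and is not part of the code above.

```python
import datetime as DT
import itertools

raw = ("2010-08-01", "2010-06-25", "2010-07-01", "2010-07-08")
dates = sorted(DT.datetime.strptime(s, "%Y-%m-%d").date() for s in raw)


def group_by_fortnight(dates, start_date):
    """Map each fortnight index (counted from start_date) to its dates.

    Hypothetical helper for illustration; expects dates sorted ascending.
    """
    def fortnight(d):
        return (d - start_date).days // 14

    return {key: list(grp) for key, grp in itertools.groupby(dates, key=fortnight)}


original = group_by_fortnight(dates, DT.date(2008, 5, 5))
shifted = group_by_fortnight(dates, DT.date(2008, 5, 16))

for label, groups in (("2008-5-5", original), ("2008-5-16", shifted)):
    print("start_date = " + label)
    for key in sorted(groups):
        print((key, groups[key]))
```

With the later starting point, the three June/July dates should land in one fortnight and 2010-8-1 in another, which is the grouping asked for in the question.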

PS. The key tool used above is itertools.groupby, which is documented in the itertools module of the Python standard library.

Edit: lambda is just a way to make "anonymous" functions. (They are anonymous in the sense that they are not given names, unlike functions defined with def.) Wherever you see a lambda, you can also use def to create an equivalent function. For example, you can do this:

    import operator

    # Sort by the first tuple element (the date)
    transactions.sort(key=operator.itemgetter(0))

    def transaction_fortnight(transaction):
        date, data = transaction
        return fortnight(date)

    for key, grp in itertools.groupby(transactions, key=transaction_fortnight):
        print((key, list(grp)))

Use itertools.groupby with a lambda function that integer-divides the distance from the starting point by the length of the period.

    >>> from itertools import groupby
    >>> for i, group in groupby(range(30), lambda x: x // 7):
    ...     print(list(group))
    ...
    [0, 1, 2, 3, 4, 5, 6]
    [7, 8, 9, 10, 11, 12, 13]
    [14, 15, 16, 17, 18, 19, 20]
    [21, 22, 23, 24, 25, 26, 27]
    [28, 29]

So, with dates:

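    import datetime as DT
    import itertools as it

    start = DT.date(2008, 5, 5)
    lenperiod = 14  # period length in days

    # transactions is the date-sorted list of (date, data) tuples from the question
    for fnight, info in it.groupby(transactions, lambda data: (data[0] - start).days // lenperiod):
        print(list(info))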
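One thing to keep in mind is that groupby only merges consecutive items with equal keys, so the input has to be sorted by date first (the question's transactions list already is). A small sketch of what happens otherwise, using a made-up list of dates and a hypothetical period helper:

```python
import datetime as DT
import itertools

start = DT.date(2008, 5, 5)
lenperiod = 14  # period length in days


def period(d):
    # Index of the 14-day period that d falls in, counted from start
    return (d - start).days // lenperiod


# Illustrative dates, deliberately out of order
unsorted_dates = [DT.date(2010, 7, 1), DT.date(2010, 6, 25), DT.date(2010, 7, 8)]

# Without sorting, the same period key shows up in two separate groups
keys_unsorted = [key for key, _ in itertools.groupby(unsorted_dates, key=period)]

# After sorting, each period key appears exactly once
keys_sorted = [key for key, _ in itertools.groupby(sorted(unsorted_dates), key=period)]

print(keys_unsorted)
print(keys_sorted)
```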

You can also use week numbers from strftime, with lenperiod expressed as a number of weeks:

    lenperiod = 2  # period length in weeks here, not days

    for fnight, info in it.groupby(transactions, lambda data: int(data[0].strftime('%W')) // lenperiod):
        print(list(info))
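A caveat with this variant: %W is the week number within the year, so the key restarts every January and dates a year apart can end up with the same key. If the data spans several years, one possible workaround is to include the year in the key. The sketch below uses made-up dates, and the names week_key and year_week_key are hypothetical.

```python
import datetime as DT
import itertools

weeks_per_period = 2

# Two illustrative dates a year apart that share the same %W week number
dates = [DT.date(2009, 1, 5), DT.date(2010, 1, 4)]


def week_key(d):
    # Week-of-year based key; restarts at the beginning of every year
    return int(d.strftime('%W')) // weeks_per_period


def year_week_key(d):
    # Adding the year keeps periods from different years apart
    return (d.year, week_key(d))


by_week = [list(grp) for _, grp in itertools.groupby(dates, key=week_key)]
by_year_week = [list(grp) for _, grp in itertools.groupby(dates, key=year_week_key)]

print(len(by_week))       # number of groups when keyed on week number only
print(len(by_year_week))  # number of groups when the year is part of the key
```

Periods built this way still restart at each year boundary, so the day-difference approach above is the safer choice when fortnights must run continuously from a fixed starting point.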

Using a pandas DataFrame with resample works too. This uses the OP's data, but replaces "Some data here" with the individual letters of "abcd".

    >>> import datetime as DT
    >>> raw = ("2010-08-01",
    ...        "2010-06-25",
    ...        "2010-07-01",
    ...        "2010-07-08")
    >>> transactions = [(DT.datetime.strptime(datestring, "%Y-%m-%d"), data) for
    ...                 datestring, data in zip(raw, 'abcd')]
    >>> transactions
    [(datetime.datetime(2010, 8, 1, 0, 0), 'a'),
     (datetime.datetime(2010, 6, 25, 0, 0), 'b'),
     (datetime.datetime(2010, 7, 1, 0, 0), 'c'),
     (datetime.datetime(2010, 7, 8, 0, 0), 'd')]

Now try pandas. First create a DataFrame, naming the columns and setting the index to the dates.

    >>> import pandas as pd
    >>> df = pd.DataFrame(transactions,
    ...                   columns=['date', 'data']).set_index('date')
    >>> df
               data
    date
    2010-08-01    a
    2010-06-25    b
    2010-07-01    c
    2010-07-08    d

Now resample with the offset alias '2W-SUN' (two-week bins anchored on Sunday) and sum to combine the results within each bin.

    >>> fortnight = df.resample('2W-SUN').sum()
    >>> fortnight
               data
    date
    2010-06-27    b
    2010-07-11   cd
    2010-07-25    0
    2010-08-08    a

Now look up the data as needed, by the date label of a fortnight

    >>> fortnight.loc['2010-06-27']['data']
    'b'

or by position

    >>> fortnight.iloc[0]['data']
    'b'

or by a slice of positions

    >>> data = fortnight.iloc[:2]['data']
    >>> data
    date
    2010-06-27     b
    2010-07-11    cd
    Freq: 2W-SUN, Name: data, dtype: object
    >>> data.iloc[0]
    'b'
    >>> data.iloc[1]
    'cd'
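To see where the bin labels in the resample output come from, here is a rough sketch that reproduces them with plain datetime arithmetic. It assumes what the output above suggests: the first bin ends on the first Sunday on or after the earliest date, each later bin ends 14 days after the previous one, and a date that falls exactly on a label belongs to that bin. This only illustrates the labelling, not how pandas implements resample, and the helper names are hypothetical.

```python
import datetime as DT

raw = ("2010-08-01", "2010-06-25", "2010-07-01", "2010-07-08")
dates = [DT.datetime.strptime(s, "%Y-%m-%d").date() for s in raw]


def first_sunday_on_or_after(d):
    # date.weekday(): Monday is 0, Sunday is 6
    return d + DT.timedelta(days=(6 - d.weekday()) % 7)


def bin_label(d, first_label, days=14):
    # Number of 14-day steps past the first label, rounded up
    diff = (d - first_label).days
    steps = max(0, -(-diff // days))
    return first_label + DT.timedelta(days=steps * days)


first_label = first_sunday_on_or_after(min(dates))
labels = {d: bin_label(d, first_label) for d in dates}

for d in sorted(labels):
    print(d, "->", labels[d])
```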