Python: group timeline results

I have a large dataset loaded from a pickle file. The data is a sorted list of (datetime, int) tuples, like this:

 [(datetime.datetime(2010, 2, 26, 12, 8, 17), 5594813L),
  (datetime.datetime(2010, 2, 26, 12, 7, 31), 5594810L),
  (datetime.datetime(2010, 2, 26, 12, 6, 4), 5594807L),
  etc ]

I want to compute record density over various time intervals. For example, I want to count the number of records per 5 minutes / 1 minute / 30 seconds.

What is the best way to do this? I know that I can just iterate over all instances in the list, but I was looking for a better approach (if one exists).

The desired output would be something like this:

 2010-01-01 04:10:00 --- 5000
 2010-02-04 10:05:00 --- 4000
 2010-01-02 13:25:00 --- 3999
2 answers

bisect.bisect is another way to solve this problem:

    import bisect
    import collections
    import datetime

    data = [
        (datetime.datetime(2010, 2, 26, 12, 8, 17), 5594813L),
        (datetime.datetime(2010, 2, 26, 12, 7, 31), 5594810L),
        (datetime.datetime(2010, 2, 26, 12, 6, 4), 5594807L),
    ]
    interval = datetime.timedelta(minutes=1, seconds=30)
    start = datetime.datetime(2010, 2, 26, 12, 6, 4)
    # Grid of bin boundaries, spaced `interval` apart
    grid = [start + n * interval for n in range(10)]

    bins = collections.defaultdict(list)
    for date, num in data:
        # Index of the first grid point greater than `date`
        idx = bisect.bisect(grid, date)
        bins[idx].append(num)

    for idx, nums in bins.iteritems():
        print('{0} --- {1}'.format(grid[idx], len(nums)))
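One subtlety worth noting: `bisect.bisect` returns the insertion point to the *right* of any equal element, so the printed key `grid[idx]` is the interval's right edge, and a timestamp exactly on a grid line falls into the next bin. A quick Python 3 sketch to check this (same grid as above; the index values are illustrative):

```python
import bisect
import datetime

interval = datetime.timedelta(minutes=1, seconds=30)
start = datetime.datetime(2010, 2, 26, 12, 6, 4)
grid = [start + n * interval for n in range(10)]

# 12:08:17 falls between grid[1] (12:07:34) and grid[2] (12:09:04)
idx = bisect.bisect(grid, datetime.datetime(2010, 2, 26, 12, 8, 17))
print(idx)             # insertion index: 2
print(grid[idx])       # right edge of the bin: 2010-02-26 12:09:04
print(grid[idx - 1])   # left edge of the bin:  2010-02-26 12:07:34
```

If you would rather label each bin by its start time, print `grid[idx - 1]` instead of `grid[idx]`.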

Check out itertools.groupby . You can pass a function that computes the correct bucket as the key. Then you can run your aggregations (counts, averages, what have you) on the resulting groups as you iterate.
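A minimal Python 3 sketch of that approach, using the question's sample data (the `bucket` helper, which rounds each timestamp down to its interval start, is mine, not part of any library):

```python
import datetime
import itertools

# Sample records in the question's (datetime, id) format
data = [
    (datetime.datetime(2010, 2, 26, 12, 8, 17), 5594813),
    (datetime.datetime(2010, 2, 26, 12, 7, 31), 5594810),
    (datetime.datetime(2010, 2, 26, 12, 6, 4), 5594807),
]

interval = datetime.timedelta(minutes=5)

def bucket(record):
    """Round a record's timestamp down to the start of its interval."""
    dt = record[0]
    offset = (dt - datetime.datetime.min) // interval * interval
    return datetime.datetime.min + offset

# groupby only merges *consecutive* items, so sort by bucket first
for key, group in itertools.groupby(sorted(data, key=bucket), key=bucket):
    print('{0} --- {1}'.format(key, len(list(group))))
# → 2010-02-26 12:05:00 --- 3
```

All three sample records fall in the same 5-minute window, so they collapse into one group; with 1-minute or 30-second intervals you would just change `interval`.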

