Best pythonic way to populate a list containing date type data?

Question

I have the following list data.

data = [['2009-01-20', 3000.0], ['2011-03-01', 6000.0], ['2008-12-15', 6000.0], ['2002-02-15', 6000.0], ['2009-04-20', 6000.0], ['2010-08-01', 4170.0], ['2002-07-15', 6000.0], ['2008-08-15', 6000.0], ['2010-12-01', 6000.0], ['2011-02-01', 8107.0], ['2011-04-01', 8400.0], ['2011-05-15', 9000.0], ['2010-05-01', 6960.0], ['2005-12-15', 6000.0], ['2010-10-01', 6263.0], ['2011-06-02', 3000.0], ['2010-11-01', 4170.0], ['2009-09-25', 6000.0]]

where the first element of each item is the date and the second element is a total. I want to group these totals by month and by year.

The desired result is:

 --> for month: [['JAN',tot1],['FEB',tot2],['MAR',tot3] ...]
 --> for year: [['2002',tot1],['2005',tot2],['2008',tot3] ...]

+7

python list

Avadhesh Jun 10 '11 at 5:45

5 answers

First, let's convert the data to a more convenient form. We will use the datetime module to process these dates.

 >>> import datetime
 >>> trans = lambda row: (datetime.datetime.strptime(row[0], "%Y-%m-%d"), row[1])
 >>> tdata = list(map(trans, data))

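Next, we define one function for each of the two grouping operations. Each takes the dict of running totals and one row, and returns a new dict in which the row's value has been added to the total for the corresponding group.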

 >>> def mker(left, right):
 ...     result = dict(left)
 ...     mo = right[0].strftime('%b')
 ...     result[mo] = right[1] + left.get(mo, 0)
 ...     return result
 ...
 >>> def yker(left, right):
 ...     result = dict(left)
 ...     mo = right[0].strftime('%Y')
 ...     result[mo] = right[1] + left.get(mo, 0)
 ...     return result
 ...

Finally, we apply these kernel functions to the data using reduce(), starting from an empty dict.

 >>> from functools import reduce
 >>> reduce(mker, tdata, {})
 {'Apr': 14400.0, 'Aug': 10170.0, 'Dec': 18000.0, 'Feb': 14107.0, 'Jan': 3000.0, 'Jul': 6000.0, 'Jun': 3000.0, 'Mar': 6000.0, 'May': 15960.0, 'Nov': 4170.0, 'Oct': 6263.0, 'Sep': 6000.0}
 >>> reduce(yker, tdata, {})
 {'2002': 12000.0, '2005': 6000.0, '2008': 12000.0, '2009': 15000.0, '2010': 27563.0, '2011': 34507.0}
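Since the two kernels differ only in the strftime() format, they could probably be folded into one. Here is a minimal sketch of that idea, assuming Python 3 (where reduce lives in functools); make_kernel, sample, and rows are names made up for this illustration, and sample is just a small subset of the question's data.

```python
from datetime import datetime
from functools import reduce  # reduce is not a builtin in Python 3

# A small subset of the question's data, to keep the example short.
sample = [['2009-01-20', 3000.0], ['2011-03-01', 6000.0],
          ['2002-02-15', 6000.0], ['2011-02-01', 8107.0]]


def make_kernel(fmt):
    """Return a reduce() kernel that groups rows by date.strftime(fmt).

    Hypothetical helper: it generalizes the two hand-written kernels.
    """
    def kernel(totals, row):
        date, value = row
        key = date.strftime(fmt)
        result = dict(totals)  # copy, so the input dict is not mutated
        result[key] = result.get(key, 0) + value
        return result
    return kernel


# Build a real list so the rows can be reduced more than once.
rows = [(datetime.strptime(d, "%Y-%m-%d"), v) for d, v in sample]

by_month = reduce(make_kernel('%b'), rows, {})  # keys depend on the locale
by_year = reduce(make_kernel('%Y'), rows, {})
```

Copying the dict on every step keeps each kernel free of side effects, at the cost of extra work on long inputs.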

+3

SingleNegationElimination Jun 10 '11 at 6:08

A riff on Steve's answer:

 >>> from collections import defaultdict
 >>> data = [['2009-01-20', 3000.0], ['2011-03-01', 6000.0], ['2008-12-15',
 ... 6000.0], ['2002-02-15', 6000.0], ['2009-04-20', 6000.0], ['2010-08-01',
 ... 4170.0], ['2002-07-15', 6000.0], ['2008-08-15', 6000.0], ['2010-12-01',
 ... 6000.0], ['2011-02-01', 8107.0], ['2011-04-01', 8400.0], ['2011-05-15',
 ... 9000.0], ['2010-05-01', 6960.0], ['2005-12-15', 6000.0], ['2010-10-01',
 ... 6263.0], ['2011-06-02', 3000.0], ['2010-11-01', 4170.0], ['2009-09-25',
 ... 6000.0]]
 >>> monthtotal = defaultdict(float)
 >>> months = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN', 'JUL',
 ... 'AUG', 'SEP', 'OCT', 'NOV', 'DEC']
 >>> for s in data:
 ...     monthtotal[months[int(s[0].split('-')[1]) - 1]] += s[1]
 ...
 >>> monthtotal
 defaultdict(<type 'float'>, {'MAR': 6000.0, 'FEB': 14107.0, 'AUG': 10170.0, 'SEP': 6000.0, 'APR': 14400.0, 'JUN': 3000.0, 'JUL': 6000.0, 'JAN': 3000.0, 'MAY': 15960.0, 'NOV': 4170.0, 'DEC': 18000.0, 'OCT': 6263.0})
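The question asks for a list of [name, total] pairs rather than a dict. One way to get there in calendar order might be to key the totals by month number and translate to names only at the end. This is a sketch under that assumption; MONTHS, sample, by_month_number, and month_list are names invented here, and sample is a small subset of the question's data.

```python
from collections import defaultdict

MONTHS = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN',
          'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC']

# A small subset of the question's data, to keep the example short.
sample = [['2009-01-20', 3000.0], ['2011-03-01', 6000.0],
          ['2002-02-15', 6000.0], ['2011-02-01', 8107.0]]

# Accumulate totals keyed by month number (1..12) so they sort naturally.
by_month_number = defaultdict(float)
for date_string, total in sample:
    month_number = int(date_string.split('-')[1])
    by_month_number[month_number] += total

# Translate numbers to names last, in calendar order.
month_list = [[MONTHS[n - 1], by_month_number[n]]
              for n in sorted(by_month_number)]
# → [['JAN', 3000.0], ['FEB', 14107.0], ['MAR', 6000.0]]
```

Months with no data are simply absent from month_list; iterating over range(1, 13) instead of sorted(by_month_number) would include them with a total of 0.0.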

0

jcomeau_ictx Jun 10 '11 at 6:09

Another solution without collections:

 from datetime import datetime

 getdate = lambda strd: (datetime.strptime(strd, '%Y-%m-%d').strftime('%Y-%b').split('-'))

 data = [['2009-01-20', 3000.0], ['2011-03-01', 6000.0], ['2008-12-15', 6000.0],
         ['2002-02-15', 6000.0], ['2009-04-20', 6000.0], ['2010-08-01', 4170.0],
         ['2002-07-15', 6000.0], ['2008-08-15', 6000.0], ['2010-12-01', 6000.0],
         ['2011-02-01', 8107.0], ['2011-04-01', 8400.0], ['2011-05-15', 9000.0],
         ['2010-05-01', 6960.0], ['2005-12-15', 6000.0], ['2010-10-01', 6263.0],
         ['2011-06-02', 3000.0], ['2010-11-01', 4170.0], ['2009-09-25', 6000.0]]

 yeartotal = {}
 monthtotal = {}
 for dateVal, total in map(lambda sdata: (getdate(sdata[0]), sdata[1]), data):
     if dateVal[0] not in yeartotal:
         yeartotal[dateVal[0]] = 0
     if dateVal[1] not in monthtotal:
         monthtotal[dateVal[1]] = 0
     yeartotal[dateVal[0]] += total
     monthtotal[dateVal[1]] += total
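The "if key not in dict" checks can usually be replaced by dict.get() with a default, still without importing collections. A rough sketch of that variation follows; it keys months by number rather than by the locale-dependent '%b' name, which is a change from the code above, and sample is a small subset of the question's data.

```python
from datetime import datetime

# A small subset of the question's data, to keep the example short.
sample = [['2009-01-20', 3000.0], ['2011-03-01', 6000.0],
          ['2002-02-15', 6000.0], ['2011-02-01', 8107.0]]

yeartotal = {}
monthtotal = {}
for date_string, total in sample:
    # strptime also validates the date, unlike a plain split('-').
    parsed = datetime.strptime(date_string, '%Y-%m-%d')
    year = parsed.strftime('%Y')
    month = parsed.month  # integer 1..12, an assumption of this sketch

    # get() supplies 0 the first time a key is seen.
    yeartotal[year] = yeartotal.get(year, 0) + total
    monthtotal[month] = monthtotal.get(month, 0) + total
```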

0

Artsiom Rudzenka Jun 10 '11 at 6:10

Here is another solution using numpy.

First we need to reshape the data so that it looks a bit like a matrix, starting from the tdata list built in my other answer. We will use a defaultdict with years as keys and 12-element lists of monthly totals as values.

 >>> import collections
 >>> pre_matrix = collections.defaultdict(lambda: [0]*12)
 >>> for row in tdata:
 ...     pre_matrix[row[0].year][row[0].month - 1] += row[1]
 ...

Since we don't want an array with a row for every year of the Common Era, we inspect the pre-matrix data and extract the minimum and maximum years.

 >>> r = range(min(pre_matrix.keys()),1+max(pre_matrix.keys()))

Finally, build a matrix, each row of which contains data for one year.

 >>> import numpy
 >>> matrix = numpy.array([pre_matrix[y] for y in r])

From there, just sum along each axis. Summing down the columns gives the monthly totals; we pair each one with its abbreviated month name, which strftime() produces for the current locale.

 >>> [(datetime.datetime(1970, i+1, 1).strftime("%b"), s) for i, s in enumerate(matrix.sum(0))]
 [('Jan', 3000.0), ('Feb', 14107.0), ('Mar', 6000.0), ('Apr', 14400.0), ('May', 15960.0), ('Jun', 3000.0), ('Jul', 6000.0), ('Aug', 10170.0), ('Sep', 6000.0), ('Oct', 6263.0), ('Nov', 4170.0), ('Dec', 18000.0)]

Since we do not need to localize the years, this is a little easier.

 >>> list(zip(r, matrix.sum(1)))
 [(2002, 12000.0), (2003, 0.0), (2004, 0.0), (2005, 6000.0), (2006, 0.0), (2007, 0.0), (2008, 12000.0), (2009, 15000.0), (2010, 27563.0), (2011, 34507.0)]
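For anyone without numpy at hand, the same year-by-month table can be summed in plain Python. To be clear, this swaps numpy's sum(0) and sum(1) for sum() and zip(*rows); it is a sketch of the idea, not the numpy code above. The names sample, table, years, and rows are made up for the illustration, and sample is a small subset of the question's data.

```python
from collections import defaultdict

# A small subset of the question's data, to keep the example short.
sample = [['2009-01-20', 3000.0], ['2011-03-01', 6000.0],
          ['2002-02-15', 6000.0], ['2011-02-01', 8107.0]]

# One 12-slot row of monthly totals per year.
table = defaultdict(lambda: [0.0] * 12)
for date_string, total in sample:
    year, month, _day = (int(part) for part in date_string.split('-'))
    table[year][month - 1] += total

# Cover every year from the earliest to the latest; looking up a
# missing year in the defaultdict yields an all-zero row.
years = range(min(table), max(table) + 1)
rows = [table[y] for y in years]

# Row sums are the yearly totals.
year_sums = [(y, sum(row)) for y, row in zip(years, rows)]

# zip(*rows) transposes the table, so each column is one month.
month_sums = [sum(column) for column in zip(*rows)]
```

Here year_sums includes the empty years (2003 and so on) with a total of 0.0, mirroring the numpy output above.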

0

SingleNegationElimination Jun 10 '11 at 6:36

Steve tjoa · Accepted Answer · 2011-06-10T05:53:04+0000

 from collections import defaultdict

 yeartotal = defaultdict(float)
 monthtotal = defaultdict(float)
 for s in data:
     d = s[0].split('-')
     yeartotal[d[0]] += s[1]
     monthtotal[d[1]] += s[1]

 In [37]: [item for item in yeartotal.iteritems()]
 Out[37]: [('2002', 12000.0), ('2005', 6000.0), ('2008', 12000.0), ('2009', 15000.0), ('2011', 34507.0), ('2010', 27563.0)]

 In [38]: [item for item in monthtotal.iteritems()]
 Out[38]: [('02', 14107.0), ('03', 6000.0), ('12', 18000.0), ('06', 3000.0), ('07', 6000.0), ('04', 14400.0), ('05', 15960.0), ('08', 10170.0), ('09', 6000.0), ('01', 3000.0), ('11', 4170.0), ('10', 6263.0)]
