How to aggregate tuple list items if tuples have the same first item?

I have a dictionary in which each value is a list of tuples. For example, this is the value I retrieve for one key:

    [('1998-01-20', 8), ('1998-01-22', 4), ('1998-06-18', 8), ('1999-07-15', 7), ('1999-07-21', 1)]

I also sorted the list. Now I want to aggregate the values as follows:

    [('1998-01', 12), ('1998-06', 8), ('1999-07', 8)]

In other words, I want to group my tuples by month and sum the integers for each month. I read about groupby, but I don't think it can help with my data structure, because I don't know in advance which months will appear in my list. So I'm trying to find a way to say: starting from the first tuple, if the i[0][:7] values are equal, sum the i[1] values. But I'm finding it hard to implement this idea.

    for i in List:
        if i[0][:7]:       # *problem* I don't know how to say my condition
            s = sum(i[1])  # ?

Any advice would be appreciated, since I am a new Python user!

python list aggregate
4 answers

Here is another approach, different from the ones already given. You can simply create a new dictionary whose keys are year-month combinations. Looping over the dates in your list and using dictionary.get(key, default) should do the trick: get returns the current total for the key, or the default value of 0 if the key does not exist yet, and assigning the sum back to dictionary[key] creates or updates the entry.

    data = [('1998-01-20', 8), ('1998-01-22', 4), ('1998-06-18', 8), ('1999-07-15', 7), ('1999-07-21', 1)]

    dictionary = dict()
    for (mydate, val) in data:
        # the key is only the year-month combination ('1998-01', for example)
        ym = mydate[0:7]
        # get the running total for that key (default 0), add val, and store it back
        dictionary[ym] = dictionary.get(ym, 0) + val

    # if you need it back in the old format (items() replaces Python 2's iteritems())
    data_aggregated = list(dictionary.items())

Try using itertools.groupby to aggregate values by month (the list needs to be sorted by date for this, which yours already is):

    from itertools import groupby

    a = [('1998-01-20', 8), ('1998-01-22', 4), ('1998-06-18', 8), ('1999-07-15', 7), ('1999-07-21', 1)]

    # group consecutive tuples by their 'YYYY-MM' prefix and sum the values
    for key, group in groupby(a, key=lambda x: x[0][:7]):
        print(key, sum(j for i, j in group))

    # Output
    # 1998-01 12
    # 1998-06 8
    # 1999-07 8

Here's a single line option:

    print([(key, sum(j for i, j in group)) for key, group in groupby(a, key=lambda x: x[0][:7])])

    # Output
    # [('1998-01', 12), ('1998-06', 8), ('1999-07', 8)]
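To see why the order of the input matters here: groupby only merges items that sit next to each other. Below is a rough sketch, assuming a reshuffled copy of the data (made up for illustration) and two helper names of my own, month_key and sum_by_month, showing what happens on unsorted input and how sorting first avoids it.

```python
from itertools import groupby


def month_key(item):
    # '1998-01-20' -> '1998-01'
    return item[0][:7]


def sum_by_month(pairs):
    # Sort first so that all entries of one month are adjacent.
    ordered = sorted(pairs, key=month_key)
    return [(month, sum(val for _, val in group))
            for month, group in groupby(ordered, key=month_key)]


# Hypothetical unsorted input: same values as above, reordered.
unsorted_data = [('1998-01-20', 8), ('1999-07-15', 7), ('1998-01-22', 4),
                 ('1998-06-18', 8), ('1999-07-21', 1)]

# Without sorting, '1998-01' and '1999-07' each appear as two separate groups.
fragmented = [(month, sum(val for _, val in group))
              for month, group in groupby(unsorted_data, key=month_key)]

print(fragmented)
print(sum_by_month(unsorted_data))
```

If the list is not guaranteed to be sorted, one of the dictionary-based answers may be the simpler choice, since they do not depend on order.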

Just use defaultdict:

    from collections import defaultdict
    from pprint import pprint

    DATA = [
        ('1998-01-20', 8),
        ('1998-01-22', 4),
        ('1998-06-18', 8),
        ('1999-07-15', 7),
        ('1999-07-21', 1),
    ]

    # missing keys start at int() == 0, so += works on the first occurrence
    groups = defaultdict(int)
    for date, value in DATA:
        groups[date[:7]] += value

    pprint(groups)
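The question asks for a list of tuples rather than a dictionary. A minimal sketch of one way to get there, assuming it is acceptable to order the result by the 'YYYY-MM' string (which sorts chronologically for this date format); the name result is mine:

```python
from collections import defaultdict

DATA = [
    ('1998-01-20', 8),
    ('1998-01-22', 4),
    ('1998-06-18', 8),
    ('1999-07-15', 7),
    ('1999-07-21', 1),
]

groups = defaultdict(int)
for date, value in DATA:
    groups[date[:7]] += value

# items() yields (key, total) pairs; sorted() orders them by the month string.
result = sorted(groups.items())
print(result)
```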

I like to use defaultdict to count:

    from collections import defaultdict

    lst = [('1998-01-20', 8), ('1998-01-22', 4), ('1998-06-18', 8), ('1999-07-15', 7), ('1999-07-21', 1)]

    result = defaultdict(int)
    for date, cnt in lst:
        # split 'YYYY-MM-DD' and rebuild the 'YYYY-MM' key
        year, month, day = date.split('-')
        result['-'.join([year, month])] += cnt

    print(result)
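If, as the question suggests, these lists are stored as values under keys in an outer dictionary, the same counting can be wrapped in a function and applied to each value. This is only a sketch: the function name aggregate_by_month, the outer dictionary records, and its keys 'a' and 'b' are made up for illustration.

```python
from collections import defaultdict


def aggregate_by_month(pairs):
    # Sum the counts per 'YYYY-MM' and return them as a sorted list of tuples.
    totals = defaultdict(int)
    for date, cnt in pairs:
        year, month, _day = date.split('-')
        totals['-'.join([year, month])] += cnt
    return sorted(totals.items())


# Hypothetical outer dictionary; the keys and the second list are invented.
records = {
    'a': [('1998-01-20', 8), ('1998-01-22', 4), ('1998-06-18', 8),
          ('1999-07-15', 7), ('1999-07-21', 1)],
    'b': [('2000-03-01', 2), ('2000-03-09', 5)],
}

# Apply the aggregation to every value, keeping the original keys.
monthly = {key: aggregate_by_month(pairs) for key, pairs in records.items()}
print(monthly)
```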