In a Python program I am writing, I compared two ways of counting entries in a list of dictionaries: a for loop that increments counter variables, and a list comprehension over map(itemgetter) combined with len(). Both methods take the same amount of time. Am I doing something wrong, or is there a better approach?
Here is a greatly simplified and shortened data structure:
    list = [
        {'key1': True, 'dontcare': False, 'ignoreme': False, 'key2': True, 'filenotfound': 'biscuits and gravy'},
        {'key1': False, 'dontcare': False, 'ignoreme': False, 'key2': True, 'filenotfound': 'peaches and cream'},
        {'key1': True, 'dontcare': False, 'ignoreme': False, 'key2': False, 'filenotfound': 'Abbott and Costello'},
        {'key1': False, 'dontcare': False, 'ignoreme': True, 'key2': False, 'filenotfound': 'over and under'},
        {'key1': True, 'dontcare': True, 'ignoreme': False, 'key2': True, 'filenotfound': 'Scotch and... well... neat, thanks'}
    ]
Here is the for loop version:
    #!/usr/bin/env python
    # Python 2.6
    # count the entries where key1 is True
    # keep a separate count for the subset that also have key2 True

    key1 = key2 = 0
    for dictionary in list:
        if dictionary["key1"]:
            key1 += 1
            if dictionary["key2"]:
                key2 += 1
    print "Counts: key1: " + str(key1) + ", subset key2: " + str(key2)
The output for the data above is:
Counts: key1: 3, subset key2: 2
Here is another, possibly more Pythonic, version:
    #!/usr/bin/env python
    # Python 2.6
    # count the entries where key1 is True
    # keep a separate count for the subset that also have key2 True

    from operator import itemgetter

    KEY1 = 0
    KEY2 = 1
    getentries = itemgetter("key1", "key2")
    entries = map(getentries, list)
    key1 = len([x for x in entries if x[KEY1]])
    key2 = len([x for x in entries if x[KEY1] and x[KEY2]])
    print "Counts: key1: " + str(key1) + ", subset key2: " + str(key2)
The output for the data above (same as before):
Counts: key1: 3, subset key2: 2
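One thing to watch for if this second version is ever run on Python 3: map() returns a one-shot iterator there, so the second comprehension would find nothing left to scan and key2 would come out as 0. Below is a minimal sketch of the same itemgetter approach that behaves the same on either version. The names records and count_with_itemgetter are hypothetical, and the data is a trimmed copy of the sample above with only the two keys of interest.

```python
from operator import itemgetter

# Hypothetical sample data: same key1/key2 values as the list in the question,
# with the unrelated keys left out.
records = [
    {'key1': True, 'key2': True},
    {'key1': False, 'key2': True},
    {'key1': True, 'key2': False},
    {'key1': False, 'key2': False},
    {'key1': True, 'key2': True},
]


def count_with_itemgetter(rows):
    """Return (rows with key1 true, rows with both key1 and key2 true)."""
    get_entries = itemgetter("key1", "key2")
    # list() materializes the (key1, key2) pairs so they can be scanned twice;
    # on Python 3 a bare map() would be exhausted after the first scan.
    entries = list(map(get_entries, rows))
    key1 = sum(1 for k1, _ in entries if k1)
    key2 = sum(1 for k1, k2 in entries if k1 and k2)
    return key1, key2


print("Counts: key1: %d, subset key2: %d" % count_with_itemgetter(records))
```

Using sum() over a generator instead of len() over a list avoids building two throwaway lists; whether that makes a measurable difference is something to time rather than assume.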
I am a little surprised that they take the same amount of time. I wonder if there is anything faster. I'm sure I'm missing something simple.
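For reference, here is a rough sketch of how the two versions could be timed side by side with the standard timeit module. The synthetic rows list, the function names count_loop and count_itemgetter, and the repeat counts are all assumptions made for illustration; real timings depend on the actual data and interpreter.

```python
import timeit
from operator import itemgetter

# Hypothetical synthetic data: 10,000 dictionaries carrying the two keys of
# interest plus one unrelated key.
rows = [{'key1': i % 2 == 0, 'key2': i % 3 == 0, 'other': i}
        for i in range(10000)]


def count_loop(data):
    """For-loop version: one pass, two counters."""
    key1 = key2 = 0
    for d in data:
        if d["key1"]:
            key1 += 1
            if d["key2"]:
                key2 += 1
    return key1, key2


def count_itemgetter(data):
    """itemgetter version: extract the pairs once, then scan them twice."""
    entries = list(map(itemgetter("key1", "key2"), data))
    key1 = len([x for x in entries if x[0]])
    key2 = len([x for x in entries if x[0] and x[1]])
    return key1, key2


# Both versions must agree before their timings are worth comparing.
assert count_loop(rows) == count_itemgetter(rows)

for func in (count_loop, count_itemgetter):
    # Take the best of three runs of 50 calls each to reduce timing noise.
    best = min(timeit.repeat(lambda: func(rows), number=50, repeat=3))
    print("%s: %.4f s per 50 calls" % (func.__name__, best))
```

Taking the minimum of several repeats is the usual way to reduce noise from other processes on the machine.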
One option I considered is loading the data into a database and running SQL queries. However, the data should not be saved, I would need to profile the overhead of transferring the data, and a database may not always be available.
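To make the database option concrete, here is a minimal sketch using the sqlite3 module from the standard library with an in-memory database, so nothing is written to disk. The table name entries, the function name count_with_sqlite, and the trimmed sample data are hypothetical, and the sketch says nothing about whether the load step would be cheap enough in practice; that would still need profiling.

```python
import sqlite3

# Hypothetical sample data: same key1/key2 values as the list in the question.
records = [
    {'key1': True, 'key2': True},
    {'key1': False, 'key2': True},
    {'key1': True, 'key2': False},
    {'key1': False, 'key2': False},
    {'key1': True, 'key2': True},
]


def count_with_sqlite(rows):
    """Return (rows with key1 true, rows with both key1 and key2 true)."""
    # ":memory:" keeps the database in RAM; it disappears when closed.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE entries (key1 INTEGER, key2 INTEGER)")
        # Store the booleans as 0/1 so they can be summed directly.
        conn.executemany(
            "INSERT INTO entries VALUES (?, ?)",
            [(int(r["key1"]), int(r["key2"])) for r in rows],
        )
        # COALESCE turns the NULL that SUM gives for an empty table into 0.
        key1, key2 = conn.execute(
            "SELECT COALESCE(SUM(key1), 0), "
            "COALESCE(SUM(key1 AND key2), 0) FROM entries"
        ).fetchone()
    finally:
        conn.close()
    return key1, key2


print("Counts: key1: %d, subset key2: %d" % count_with_sqlite(records))
```

Every row still has to be read in Python once to build the INSERT parameters, so the load step alone does at least as much per-entry work as the plain loop.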
I have no control over the original data form.
The code above is not meant to win any style points.