Counting entries in a list of dictionaries: for loop vs. list comprehension with map(itemgetter)

In a Python program I'm writing, I compared a for loop with increment variables against list comprehensions with map(itemgetter) and len() for counting entries in dictionaries that are in a list. Both methods take the same amount of time. Am I doing something wrong, or is there a better approach?

Here is a greatly simplified and shortened data structure:

    list = [
        {'key1': True, 'dontcare': False, 'ignoreme': False, 'key2': True, 'filenotfound': 'biscuits and gravy'},
        {'key1': False, 'dontcare': False, 'ignoreme': False, 'key2': True, 'filenotfound': 'peaches and cream'},
        {'key1': True, 'dontcare': False, 'ignoreme': False, 'key2': False, 'filenotfound': 'Abbott and Costello'},
        {'key1': False, 'dontcare': False, 'ignoreme': True, 'key2': False, 'filenotfound': 'over and under'},
        {'key1': True, 'dontcare': True, 'ignoreme': False, 'key2': True, 'filenotfound': 'Scotch and... well... neat, thanks'}
    ]

Here is the for loop version:

    #!/usr/bin/env python
    # Python 2.6
    # count the entries where key1 is True
    # keep a separate count for the subset that also have key2 True

    key1 = key2 = 0
    for dictionary in list:
        if dictionary["key1"]:
            key1 += 1
            if dictionary["key2"]:
                key2 += 1
    print "Counts: key1: " + str(key1) + ", subset key2: " + str(key2)

The output for the data above is:

 Counts: key1: 3, subset key2: 2 

Here is another, possibly more Pythonic, version:

    #!/usr/bin/env python
    # Python 2.6
    # count the entries where key1 is True
    # keep a separate count for the subset that also have key2 True

    from operator import itemgetter
    KEY1 = 0
    KEY2 = 1
    getentries = itemgetter("key1", "key2")
    entries = map(getentries, list)
    key1 = len([x for x in entries if x[KEY1]])
    key2 = len([x for x in entries if x[KEY1] and x[KEY2]])
    print "Counts: key1: " + str(key1) + ", subset key2: " + str(key2)

The output for the data above (same as before):

 Counts: key1: 3, subset key2: 2 

I am a little surprised that they take the same amount of time. I wonder if there is anything faster. I'm sure I'm missing something simple.

One alternative I considered is loading the data into a database and running SQL queries, but the data doesn't need to persist, I would have to profile the overhead of the data transfer, etc., and a database may not always be available.

I have no control over the original data form.

The code above is not going for style points.

1 answer

I think you are measuring incorrectly, swamping the code to be measured in a lot of overhead (running at the top level of the module rather than in a function, and producing output). Putting the two snippets into functions named forloop and withmap, and adding * 100 to the list definition (after the closing ]) to make the measurement a little more substantial, I see on my slow laptop:

    $ py26 -mtimeit -s'import co' 'co.forloop()'
    10000 loops, best of 3: 202 usec per loop
    $ py26 -mtimeit -s'import co' 'co.withmap()'
    10 loops, best of 3: 601 usec per loop

That is, the supposedly "more Pythonic" approach with map is three times slower than the plain for approach, which tells you that it is not really "more Pythonic" ;-).
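The measurement setup described above can be sketched as a small module. This is a reconstruction under assumptions, not the answerer's actual file: the module name co is taken from the timeit command lines, while the variable name data (used instead of list, which shadows the built-in) and the list(...) call around map are my own additions. The list(...) call matters on Python 3, where map returns a one-shot iterator that would already be exhausted when the second comprehension runs; on Python 2.6 map returns a list and the call is redundant but harmless.

```python
# co.py: hypothetical reconstruction of the benchmark module
import timeit
from operator import itemgetter

# Sample data from the question, repeated 100 times so that each call
# does enough work to be measured ("data" is an assumed name).
data = [
    {'key1': True, 'dontcare': False, 'ignoreme': False, 'key2': True,
     'filenotfound': 'biscuits and gravy'},
    {'key1': False, 'dontcare': False, 'ignoreme': False, 'key2': True,
     'filenotfound': 'peaches and cream'},
    {'key1': True, 'dontcare': False, 'ignoreme': False, 'key2': False,
     'filenotfound': 'Abbott and Costello'},
    {'key1': False, 'dontcare': False, 'ignoreme': True, 'key2': False,
     'filenotfound': 'over and under'},
    {'key1': True, 'dontcare': True, 'ignoreme': False, 'key2': True,
     'filenotfound': 'Scotch and... well... neat, thanks'},
] * 100


def forloop():
    """Count with a plain loop and two counters."""
    key1 = key2 = 0
    for d in data:
        if d["key1"]:
            key1 += 1
            if d["key2"]:
                key2 += 1
    return key1, key2


def withmap():
    """Count with map(itemgetter) and two list comprehensions."""
    getentries = itemgetter("key1", "key2")
    # list() keeps the pairs reusable for both comprehensions on Python 3.
    entries = list(map(getentries, data))
    key1 = len([x for x in entries if x[0]])
    key2 = len([x for x in entries if x[0] and x[1]])
    return key1, key2


if __name__ == "__main__":
    # Time only the function calls; printing happens outside the timed code.
    for func in (forloop, withmap):
        seconds = timeit.timeit(func, number=1000)
        print("%s: %.1f usec per call" % (func.__name__, seconds * 1000.0))
```

Both functions return the counts instead of printing them, so the timing covers only the counting work. With the file saved as co.py, the command lines shown above should work with whatever interpreter name is installed locally in place of py26.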

The mark of good Python is simplicity, which to me recommends what I hubris-ly named...:

    def thebest():
        entries = [d['key2'] for d in list if d['key1']]
        return len(entries), sum(entries)

which, when measured, saves 10% to 20% of the time compared to the forloop approach.
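Why one list is enough may be worth spelling out: the comprehension keeps the key2 flag of every entry whose key1 is true, so its length is the key1 count, and because bool is a subclass of int in Python (True adds as 1), sum() of those flags is the key2 subset count. Here is a minimal sketch of that reasoning, assuming a variable named data and sample dictionaries trimmed to the two keys that matter (both are my simplifications, not part of the answer):

```python
# Trimmed sample data: only the two keys the counts depend on.
data = [
    {'key1': True, 'key2': True},
    {'key1': False, 'key2': True},
    {'key1': True, 'key2': False},
    {'key1': False, 'key2': False},
    {'key1': True, 'key2': True},
]


def thebest():
    # One pass: collect the key2 flag of each entry whose key1 is true.
    entries = [d['key2'] for d in data if d['key1']]
    # len() counts the key1 entries; sum() counts the True flags among them.
    return len(entries), sum(entries)


print(thebest())  # (3, 2)
```

This relies on the key2 values being real booleans (or 0/1). If they could be other truthy values, such as non-empty strings, sum(entries) would fail or miscount, and something like sum(1 for e in entries if e) would be the safer choice.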


Source: https://habr.com/ru/post/1312046/

