Which has better performance in Python: checking if a key is in a dict, or try/except?

I have several dictionaries containing similar data.

Most lookups will be resolved by a single search in a single dictionary.

So, is it better to check for the key in the first dict and, if it is not there (or a KeyError is raised), try the next dict, and so on?
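For concreteness, that first approach could look something like this (a rough sketch; d1, d2, d3 and the default of 0 are placeholders matching the snippet further down):

    # Fall through the dictionaries in order, checking for the key before
    # each access, and use 0 when no dictionary contains it.
    if key in d1:
        value = d1[key]
    elif key in d2:
        value = d2[key]
    elif key in d3:
        value = d3[key]
    else:
        value = 0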

Or maybe something like

    # d1, d2, d3 = bunch of dictionaries
    value = d1.get(key, d2.get(key, d3.get(key, 0)))

?

+4

6 answers

It depends on how often you expect the keys to be missing.

If you can confidently predict that the keys will usually be missing, use get.

If you can confidently predict that the keys will usually be present, use try/except.
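A minimal illustration of the two idioms (the dict contents and the default of 0 are made up for the example):

    d = {'a': 1}

    # get: no exception machinery involved; good when the key is often missing.
    value = d.get('b', 0)

    # try/except: the try itself is cheap, but raising and catching KeyError
    # is not; good when the key is almost always present.
    try:
        value = d['a']
    except KeyError:
        value = 0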

+4

It seems that in almost all cases, using get will be faster. Here is my test run comparing try..except and get:

    >>> def foo1(n):
    ...     spam = dict(zip(range(-99, 100, n), [1]*200))
    ...     s = 0
    ...     for e in range(1, 100):
    ...         try:
    ...             s += spam[e]
    ...         except KeyError:
    ...             try:
    ...                 s += spam[-e]
    ...             except KeyError:
    ...                 s += 0
    ...     return s
    ...
    >>> def foo2(n):
    ...     spam = dict(zip(range(-99, 100, n), [1]*200))
    ...     s = 0
    ...     for e in range(1, 100):
    ...         s += spam.get(e, spam.get(-e, 0))
    ...     return s
    ...
    >>> for i in range(1, 201, 10):
    ...     res1 = timeit.timeit('foo1({})'.format(i), setup="from __main__ import foo1", number=1000)
    ...     res2 = timeit.timeit('foo2({})'.format(i), setup="from __main__ import foo2", number=1000)
    ...     print "{:^5}{:10.5}{:10.5}{:^10}{:^10}".format(i, res1, res2, foo1(i), foo2(i))
    ...
      1      0.075102   0.082862      99        99
     11       0.25096   0.054272       9         9
     21        0.2885   0.051398      10        10
     31       0.26211   0.060171       7         7
     41       0.26653   0.053595       5         5
     51        0.2609   0.052511       4         4
     61        0.2686   0.052792       4         4
     71       0.26645   0.049901       3         3
     81       0.26351   0.051275       3         3
     91       0.26939   0.051192       3         3
    101         0.264   0.049924       2         2
    111        0.2648   0.049875       2         2
    121       0.26644   0.049151       2         2
    131       0.26417   0.048806       2         2
    141       0.26418   0.050543       2         2
    151       0.26585   0.049787       2         2
    161       0.26663   0.051136       2         2
    171       0.26549   0.048601       2         2
    181       0.26425   0.050964       2         2
    191        0.2648   0.048734       2         2
+4

Since you say that most lookups will be resolved by the first dict, the quickest solution would be something like:

    try:
        item = d1[key]
    except KeyError:
        try:
            item = d2[key]
        except KeyError:
            ...

However, this is certainly not the most readable solution, and I do not recommend writing it out by hand. Instead, you can create a function:

    def get_from(item, dicts):
        for d in dicts:
            try:
                return d[item]
            except KeyError:
                pass
        else:
            raise KeyError("No item in dicts")

which you would call like this:

    get_from(key, (d1, d2, d3))

(This is a simplified, slightly less clean version of the already very simple ChainMap recipe suggested by @MartijnPieters in the comments on the original question; I would recommend using that over the code posted here, which is only meant to demonstrate the concept in a simpler form.)
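In current Python (3.3 and later) the same idea is available in the standard library as collections.ChainMap; a minimal sketch, with example dicts made up for illustration:

    from collections import ChainMap

    d1, d2, d3 = {'a': 1}, {'b': 2}, {'c': 3}

    # Lookups search d1, then d2, then d3, and raise KeyError if the key
    # is in none of them; get() works as usual for supplying a default.
    chained = ChainMap(d1, d2, d3)
    chained['b']         # 2, found in d2
    chained.get('z', 0)  # 0, not present in any of the dicts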

Finally, a hybrid solution may work best in practice. Factoring the first lookup out of the loop is a little ugly, but it avoids the loop overhead in the common case. Only if the first lookup raises a KeyError do you fall back to the loop-based solution suggested above for the rest of the dicts, e.g.:

    try:
        item = d1[key]
    except KeyError:
        item = get_from(key, (d2, d3))

Again, only do this if you can reliably demonstrate (think timeit) that it makes a noticeable difference.


It is important to know that in Python, try is cheap, but except costs a decent amount of time. If your lookup will succeed most of the time, use try/except. Even if it won't, try/except is often still the better choice for readability, but in that case you should evaluate whether performance is really a problem, and only if you can demonstrate that it is should you resort to "look before you leap".
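A quick way to see that asymmetry on your own machine (a rough sketch; the dict, function names and iteration count are made up, and absolute timings will vary):

    import timeit

    d = {'a': 1}

    def hit():   # key present: the except clause never runs
        try:
            return d['a']
        except KeyError:
            return 0

    def miss():  # key absent: a KeyError is raised and caught on every call
        try:
            return d['b']
        except KeyError:
            return 0

    print(timeit.timeit(hit, number=1000000))   # cheap
    print(timeit.timeit(miss, number=1000000))  # noticeably slower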

One last remark: if the dictionaries are relatively static, it might be worth combining them into a single dict:

    d1.update(d2)
    d1.update(d3)

Now you can just use d1; it has all the information from d2 and d3. (Of course, the order of the updates matters if the dicts share keys but have different values: with the code above, a value from d3 wins over d2, which wins over the original d1.)
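If mutating d1 in place is not acceptable, or if you want the first dict to take priority (matching the d1-then-d2-then-d3 lookup order from the question), a small variation builds a separate merged copy instead; a sketch under that assumption:

    # Update in reverse priority order so that d1's values win whenever the
    # same key appears in more than one dict; d1, d2 and d3 stay untouched.
    merged = {}
    for d in (d3, d2, d1):
        merged.update(d)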

+1

try...except usually takes longer than using get, but that depends on a few things...

Try using the timeit module to test performance in your specific situation as follows:

    import timeit

    def do_stuff():
        pass  # the code you want to time goes here

    timeit.timeit('testfunc()', 'from __main__ import do_stuff as testfunc')
+1

You could also do

    sentinel = object()
    values = (d.get(key, sentinel) for d in (d1, d2, d3))
    value = next(v for v in values if v is not sentinel)

If none of the dicts contains the key, this raises StopIteration rather than KeyError.
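If you would rather fall back to a default (such as the 0 in the question) instead of handling StopIteration, next() also accepts a default as its second argument; a small sketch reusing the names above:

    sentinel = object()
    values = (d.get(key, sentinel) for d in (d1, d2, d3))
    # next() returns its second argument when the generator is exhausted,
    # so a key missing from all three dicts yields 0 instead of raising.
    value = next((v for v in values if v is not sentinel), 0)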

0

The difference between a conditional check such as if 'key' in a_dict (or, similarly, if a_dict.get('key') is None) and handling the KeyError raised when 'key' not in a_dict is usually considered trivial, and probably depends on the Python implementation you are using.

Using the conditional form is generally considered more Pythonic and more expressive than catching the exception, and it often results in cleaner code. However, if your dictionary may contain arbitrary data and you cannot rely on None (or some other sentinel value) to mean that your key was not found, the conditional form requires two lookups: you first check whether the key is in the dictionary and then retrieve the value, i.e.:

    if 'key' in a_dict:
        val = a_dict['key']

Given the situation you describe, the code you provided is the slowest option possible, because the nested get calls evaluate their default arguments eagerly, so key is looked up in every one of the dictionaries even when the first one contains it. A faster option is to guess which dictionary the key is most likely to be in and fall back to the others in turn:

    my_val = d1.get(key, None)
    if my_val is None:
        my_val = d2.get(key, None)
        if my_val is None:
            my_val = d3.get(key, None)
            if my_val is None:
                return False  # handle "not found in any dict" here

However, your specific use case sounds interesting and strange. Why are there several dictionaries with similar data? How are these dictionaries stored? If you already have a list or some other data structure holding these dictionaries, it would be even more expressive to iterate over the dictionaries:

    dict_list = [{}, {}, {}]  # pretend you have three dicts in a list
    for d in dict_list:
        val = d.get('key', None)
        if val is not None:
            break
    # val is now either None (not found anywhere) or the value that was found.
0
