Python is a key in dict that is different / faster than a key in dict.keys ()

Intuitively, I think that key in dict faster than key in dict.keys() , since .keys() creates a list of keys. This question should confirm whether this is true.

Just wondering if key in dict internally creates / uses a list to find out if a key is present?

Also, is one method faster than another?

+7
performance python dictionary
source share
1 answer

Short answer:

  • In python 2: your guess is true: dict.keys() slows down.
  • In python 3: your guess is wrong: in dict.keys() does, like in dict

Details for py2 and py3 are given below.

Answer in Python 2.7:

Your assumption is true for the following reasons:

  • dict.keys() enables an additional function call (stack overhead).
  • dict.keys() returns a list containing all the keys in memory (as opposed to a lazy generator object). Therefore, it must allocate memory.
  • key in dict can use an inline object, which is an indexed search. key in dict.keys() is a linear search in a list

I created a small script icon to show my point:

 #! /usr/bin/python2.7 import datetime as dt import random dict_size = 1000000 num_iterations = 100 d = {i: i for i in xrange(dict_size)} def f(): k = random.randint(0,dict_size-1) return (k in d) def g(): k = random.randint(0,dict_size-1) return (k in d.keys()) def test(func): t = dt.datetime.utcnow() for i in xrange(num_iterations): func() print "%s --> %1.6fs" % (func, (dt.datetime.utcnow()-t).total_seconds()) test(f) test(g) 

Exit (python 2.7.6 Ubuntu 14.04):

 <function f at 0x7ff2e0126d70> --> 0.000598 s <function g at 0x7ff2e0126de8> --> 5.191553 s 

I also tested with nr iterations, and the dict size changed (dict only 100 elements, 1M iterations).

 <function f at 0x7f94cb5e6d70> --> 3.614162 s <function g at 0x7f94cb5e6de8> --> 7.007922 s 

Here the results are much closer to each other.

So the performance difference really grows with the dict size.

Python 3 answer:

I adapted the script for python 3:

  • xrange gone, use range . (not in the performance-critical internal circuit of the test function, therefore, a limited performance impact).
  • use curly braces with print
  • change the shabang line to #!/usr/bin/python3

And tested with python 3.4.3 on the same computer.

  • dict_size = 1000000; num_iterations = 100

    f → 0.000590 s g → 0.000565 s

  • dict_size = 100; num_iterations = 1000000

    f → 4.525487 s g → 4.837232 s

So in python 3 the performance difference is gone.

+9
source share

All Articles