A fairly common operation is to filter one list based on another. People quickly find that this:
[x for x in list_1 if x in list_2]
is slow for large inputs: it's O(n*m). Ugh. How do we speed it up? Use a set, so the membership test is O(1):
s = set(list_2)
[x for x in list_1 if x in s]
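If you do this a lot, it's handy to wrap the pattern in a small helper so the set is guaranteed to be built exactly once per call. A minimal sketch (filter_by is just an illustrative name, nothing standard):

def filter_by(items, allowed):
    # Build the lookup set once; each membership test is then O(1) on average,
    # so the whole filter is O(len(items) + len(allowed)).
    allowed_set = set(allowed)
    return [x for x in items if x in allowed_set]

filter_by(range(10), [2, 4, 6])   # -> [2, 4, 6]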
This gives nice overall O(n) behavior. Yet I often see even veteran coders fall into The Trap™:
[x for x in list_1 if x in set(list_2)]
Ack! This is O(n*m) again, because Python builds set(list_2) anew for every element of list_1, not just once.
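You can watch the trap happen by counting the calls yourself; counting_set here is just an ad-hoc instrumented wrapper I made up for the demonstration:

calls = 0

def counting_set(iterable):
    # Stand-in for set() that counts how many times it gets called.
    global calls
    calls += 1
    return set(iterable)

list_1 = list(range(5))
list_2 = [1, 3]
[x for x in list_1 if x in counting_set(list_2)]
print(calls)   # 5: one fresh set per element of list_1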
I thought this was the end of the story: Python can't optimize it to build the set only once, you just have to know about the trap and live with it. Or so I thought.
#python 3.3.2+
list_2 = list(range(20))  # small for demonstration purposes
s = set(list_2)
list_1 = list(range(100000))

def f():
    return [x for x in list_1 if x in s]

def g():
    return [x for x in list_1 if x in set(list_2)]

def h():
    return [x for x in list_1 if x in {0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
                                       10, 11, 12, 13, 14, 15, 16, 17, 18, 19}]

%timeit f()
100 loops, best of 3: 7.31 ms per loop

%timeit g()
10 loops, best of 3: 77.4 ms per loop

%timeit h()
100 loops, best of 3: 6.66 ms per loop
Yes, Python (3.3) can optimize a set literal: it builds it only once, as a frozenset constant. h() is even faster than f() here, apparently because loading that constant replaces a LOAD_GLOBAL (of s) with a LOAD_CONST.
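Assuming CPython, you can confirm the folding by digging the frozenset out of the compiled constants. The helper below is my own sketch; the exact layout differs across versions (before 3.12 the comprehension has its own nested code object, from 3.12 on it is inlined), so it searches recursively:

def h(items):
    return [x for x in items if x in {0, 1, 2, 3}]

def constants(code):
    # Recursively collect constants from a code object and any nested code objects.
    for c in code.co_consts:
        if isinstance(c, type(code)):
            yield from constants(c)
        else:
            yield c

print([c for c in constants(h.__code__) if isinstance(c, frozenset)])
# [frozenset({0, 1, 2, 3})]: the literal was folded into a single frozenset
# constant at compile time, so the loop only has to load it, never rebuild it.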
#python 2.7.5+
%timeit h()
10 loops, best of 3: 72.5 ms per loop
Python 2, notably, does not perform this optimization. I then tried to examine what Python 3 actually does, but unfortunately dis.dis cannot look inside comprehension expressions; basically, everything interesting gets hidden behind a MAKE_FUNCTION.
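That said, the MAKE_FUNCTION is not a dead end: the comprehension's own code object is sitting in co_consts, and dis.dis accepts a code object directly. A sketch for the 3.3-era interpreter discussed here (on 3.7+ dis.dis recurses into nested code objects on its own, and from 3.12 comprehensions are inlined so there is no nested code object at all):

import dis

def g():
    return [x for x in list_1 if x in set(list_2)]

# Fish the nested <listcomp> code object out of g's constants and disassemble it.
listcomp = next(c for c in g.__code__.co_consts if hasattr(c, 'co_code'))
dis.dis(listcomp)
# Inside the FOR_ITER loop you will see LOAD_GLOBAL set followed by
# CALL_FUNCTION 1: set(list_2) really is rebuilt on every iteration.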
So now I'm wondering: why does Python 3.x optimize the set literal so that it is built only once, but not set(list_2)?