To answer the original question about lookup performance (membership tests in a dict vs. a set): somewhat surprisingly, dict lookups can be slightly faster (in Python 2.5.1 on my rather slow laptop), assuming, for example, that half of the lookups fail and half succeed. Here is how to find out:
    $ python -mtimeit -s'k=dict.fromkeys(range(99))' '5 in k and 112 in k'
    1000000 loops, best of 3: 0.236 usec per loop
    $ python -mtimeit -s'k=set(range(99))' '5 in k and 112 in k'
    1000000 loops, best of 3: 0.265 usec per loop
I ran each check several times to make sure the results are repeatable. So, if those 30 nanoseconds or less on a slow laptop sit in an absolutely crucial bottleneck, it may be worth going for the obscure dict.fromkeys solution rather than the simple, obvious, readable, and clearly correct set (which is unusual: almost invariably in Python, the simple and direct solution has performance advantages too).
Of course, you need to check with your own version of Python, your machine, your data, and your ratio of successful to failing lookups, and confirm with extremely accurate profiling that shaving 30 nanoseconds (or whatever the difference turns out to be) off this lookup actually makes a difference that matters.
Fortunately, in the vast majority of cases, this will prove completely unnecessary ... but since programmers will obsess over pointless micro-optimizations no matter how many times they are told about their irrelevance, the timeit module is right there in the standard library to make these mostly pointless micro-benchmarks as easy as pie anyway! -)
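If you would rather repeat the measurement from a script than from the shell, something along these lines should work. This is only a sketch: it assumes Python 3.5 or later (for the `globals` argument of `timeit.repeat`), whereas the numbers above came from Python 2.5.1, and the helper name `time_membership` and the loop counts are made up for illustration. It prints no fixed result, because the timings depend entirely on your interpreter and machine.

```python
import timeit


def time_membership(container, hit, miss, number=200_000, repeat=5):
    """Best observed time, in seconds, for one successful plus one failing lookup.

    `time_membership` is a hypothetical helper, not part of the standard library.
    """
    stmt = "hit in container and miss in container"
    env = {"container": container, "hit": hit, "miss": miss}
    # Take the minimum of several repeats, the same idea as "best of 3" above.
    timings = timeit.repeat(stmt, globals=env, number=number, repeat=repeat)
    return min(timings) / number


keys = range(99)
results = {
    # 5 is present and 112 is absent, so half of the lookups succeed.
    "dict": time_membership(dict.fromkeys(keys), 5, 112),
    "set": time_membership(set(keys), 5, 112),
}

for name, seconds in results.items():
    print(f"{name}: {seconds * 1e9:.1f} ns per loop")
```

Swap in your own keys and your own mix of hits and misses before drawing any conclusions; a difference of a few nanoseconds can easily flip between Python versions.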
Alex Martelli