Installed against frozenset performance

I was looking for Python collection types set and frozenset .

At first, I suggested that frozenset provide better search performance than set , as it is immutable and, thus, can use the structure of stored elements.

However, this is not like the following experiment:

 import random import time import sys def main(n): numbers = [] for _ in xrange(n): numbers.append(random.randint(0, sys.maxint)) set_ = set(numbers) frozenset_ = frozenset(set_) start = time.time() for number in numbers: number in set_ set_duration = time.time() - start start = time.time() for number in numbers: number in frozenset_ frozenset_duration = time.time() - start print "set : %.3f" % set_duration print "frozenset: %.3f" % frozenset_duration if __name__ == "__main__": n = int(sys.argv[1]) main(n) 

I executed this code using both CPython and PyPy, which gave the following results:

 > pypy set.py 100000000 set : 6.156 frozenset: 6.166 > python set.py 100000000 set : 16.824 frozenset: 17.248 

It seems that frozenset is actually slower in terms of search performance, both in CPython and PyPy. Does anyone have an idea why this is so? I did not consider implementations.

+7
performance python set
source share
1 answer

The frozenset and set implementations frozenset mostly separate; a set is simply a frozenset with added mutation methods with the same hash table implementation. See Source Objects/setobject.c ; The top-level PyFrozenSet_Type definition uses the PySet_Type functions.

There is no optimization for frozenset, since there is no need to calculate hashes for elements in frozenset when you test membership. The element that you are using to test against the set still needs to calculate its hash to find the correct slot in the hashtable set so that you can run the equality test.

Thus, your synchronization results are probably disabled due to other processes running on your system; you measured the wall clock time and did not turn off the Python garbage collection, and you have repeatedly tested the same thing.

Try the test using the timeit module with one value from numbers and one of them is not installed:

 import random import sys import timeit numbers = [random.randrange(sys.maxsize) for _ in range(10000)] set_ = set(numbers) fset = frozenset(numbers) present = random.choice(numbers) notpresent = -1 test = 'present in s; notpresent in s' settime = timeit.timeit( test, 'from __main__ import set_ as s, present, notpresent') fsettime = timeit.timeit( test, 'from __main__ import fset as s, present, notpresent') print('set : {:.3f} seconds'.format(settime)) print('frozenset: {:.3f} seconds'.format(fsettime)) 

This repeats each test 1 million times and produces:

 set : 0.050 seconds frozenset: 0.050 seconds 
+27
source share

All Articles