Performance comparison: insert vs build Python set operations

In python, faster a) Create a set from a list of n elements b) Insert n elements into a set?

I found this page (http://wiki.python.org/moin/TimeComplexity), but it did not have enough information to conclude that it was faster.

It seems, inserting the elements one at a time, in the worst case, you can take O (n * n) time (given that it uses dicts) and O (n * 1) in the middle case. Does list initialization offer any performance boost?

+8
python set time-complexity
source share
4 answers

From the point of view of complexity, O() is definitely the same, because both approaches do the same - insert elements n into the set.

The difference comes from the implementation: one obvious advantage of initializing from iterable is that you save a lot of function calls at the Python level - initialization from iteration is done entirely at C (**) level.

Indeed, some tests on a list of 5,000,000 random numbers show that adding one at a time is slower:

 lst = [random.random() for i in xrange(5000000)] set1 = set(lst) # takes 2.4 seconds set2 = set() # takes 3.37 seconds for item in lst: set2.add(item) 

(**) Objects/setobject.c inside the sets code ( Objects/setobject.c ), in the end, inserting an element is reduced to calling set_add_key . When initializing with iteration, this function is called in a narrow C loop:

 while ((key = PyIter_Next(it)) != NULL) { if (set_add_key(so, key) == -1) { Py_DECREF(it); Py_DECREF(key); return -1; } Py_DECREF(key); } 

On the other hand, each call to set.add calls for an attribute that resolves the C function to set_add , which in turn calls set_add_key . Since adding the element itself is relatively fast (the Python hash table implementation is very efficient), these additional calls all accumulate.

+16
source share
 $ python -V Python 2.5.2 $ python -m timeit -s "l = range(1000)" "set(l)" 10000 loops, best of 3: 64.6 usec per loop $ python -m timeit -s "l = range(1000)" "s = set()" "for i in l:s.add(i)" 1000 loops, best of 3: 307 usec per loop 
+2
source share

The following are the results of a comparison using timeit . It seems that initializing a set using a list will be faster, curious to know why this is so:

 from timeit import timeit timeit("set(a)","a=range(10)") # 0.9944498532640864 timeit("for i in a:x.add(i)","a=range(10);x=set()") # 1.6878826778265648 

Python Version: 2.7

0
source share

On my Thinkpad:

 In [37]: timeit.timeit('for a in x: y.add(a)', 'y=set(); x=range(10000)', number=10000) Out[37]: 12.18006706237793 In [38]: timeit.timeit('y=set(x)', 'y=set(); x=range(10000)', number=10000) Out[38]: 3.8137960433959961 
0
source share

All Articles