From a complexity standpoint, the big-O is definitely the same, because both approaches do the same thing: insert n elements into the set.
The difference comes from the implementation: one obvious advantage of initializing from an iterable is that you save a lot of function calls at the Python level, since initialization from an iterable is done entirely at the C level (**).
Indeed, some tests on a list of 5,000,000 random numbers show that adding the elements one at a time is slower:
    import random

    lst = [random.random() for i in xrange(5000000)]

    set1 = set(lst)    # takes 2.4 seconds

    set2 = set()       # takes 3.37 seconds
    for item in lst:
        set2.add(item)
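To reproduce a comparison like this yourself, a minimal sketch using the standard timeit module (the timings above are from the original test, not from this snippet) could look like:

    import timeit

    setup = "import random; lst = [random.random() for i in range(5000000)]"

    # build the set directly from the list (single C-level loop)
    t_init = timeit.timeit("set(lst)", setup=setup, number=1)

    # build the set by calling add() from a Python-level loop
    t_loop = timeit.timeit(
        "s = set()\nfor item in lst:\n    s.add(item)",
        setup=setup,
        number=1,
    )

    print("set(lst):          %.2f s" % t_init)
    print("set.add() in loop: %.2f s" % t_loop)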
(**) In the set implementation ( Objects/setobject.c ), inserting an element ultimately comes down to a call to set_add_key . When initializing from an iterable, this function is called in a tight C loop:
    while ((key = PyIter_Next(it)) != NULL) {
        if (set_add_key(so, key) == -1) {
            Py_DECREF(it);
            Py_DECREF(key);
            return -1;
        }
        Py_DECREF(key);
    }
On the other hand, each call to set.add goes through an attribute lookup, which resolves to the C function set_add , which in turn calls set_add_key . Since adding an element itself is relatively fast (Python's hash table implementation is very efficient), these extra calls all add up.
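A common micro-optimization (not part of the original measurement, just an illustration of where the overhead lives) is to hoist the attribute lookup out of the loop by binding the method once; each call still crosses from Python into set_add and set_add_key, but the repeated resolution of set2.add disappears:

    # Sketch: bind the method once instead of resolving set3.add per iteration.
    set3 = set()
    add = set3.add            # one attribute lookup instead of one per element
    for item in lst:          # lst is the same 5,000,000-element list as above
        add(item)             # still one Python-to-C call per element

This is still slower than set(lst), because every element still costs a separate Python-to-C call; set3.update(lst) (or the constructor) takes the single C-level loop shown above.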
Eli Bendersky