Preselect None List

Suppose you want to write a function that gives a list of objects, and you know in advance the length n such a list.

In python, a list supports indexed access in O (1), so it's nice to pre-select a list and access it using indexes instead of highlighting an empty list and using the append() method. This is because we avoid the burden of expanding the entire list if this is not enough.

If I use python, maybe performance doesn't matter anyway, but what is the best way to pre-select a list?

I know these possible candidates:

  • [None] * n β†’ selection of two lists
  • [None for x in range(n)] - or xrange in python2 β†’ creating another object

Is it significantly better than the other?

What if we are in the case of n = len(input) ? Since input already exists, will [None for x in input] have better wrt characteristics [None] * len(input) ?

+7
performance python list design-patterns
source share
3 answers

In the gap between these two parameters, the first one is clearly better, since the Python for loop is not involved.

 >>> %timeit [None] * 100 1000000 loops, best of 3: 469 ns per loop >>> %timeit [None for x in range(100)] 100000 loops, best of 3: 4.8 us per loop 

Update:

And list.append has O(1) complexity , this might be a better choice than a tentative list if you assign the list.append method list.append variable.

 >>> n = 10**3 >>> %%timeit lis = [None]*n for _ in range(n): lis[_] = _ ... 10000 loops, best of 3: 73.2 us per loop >>> %%timeit lis = [] for _ in range(n): lis.append(_) ... 10000 loops, best of 3: 92.2 us per loop >>> %%timeit lis = [];app = lis.append for _ in range(n): app(_) ... 10000 loops, best of 3: 59.4 us per loop >>> n = 10**6 >>> %%timeit lis = [None]*n for _ in range(n): lis[_] = _ ... 10 loops, best of 3: 106 ms per loop >>> %%timeit lis = [] for _ in range(n): lis.append(_) ... 10 loops, best of 3: 122 ms per loop >>> %%timeit lis = [];app = lis.append for _ in range(n): app(_) ... 10 loops, best of 3: 91.8 ms per loop 
+12
source share

When you add an item to the list, Python is 'over-allocates', see the source code of the list object. This means that, for example, when adding 1 element to a list of 8 elements, it actually frees up space for 8 new elements and uses only the first of them. The next 7 applications are then "free."

In many languages ​​(for example, Matlab) you are always told that you need to pre-select your vectors, since adding during the cycle is very expensive. In the worst case, adding one item to a list of length n may cost O(n) time, as you may need to create a larger list and copy all existing items. You need to do this at each iteration, so the total cost of adding n elements is O(n^2) , ouch. The Python pre-distribution scheme extends the costs of array expansion across many separate applications (see amortized costs ), effectively making the cost of one append O(1) and the total cost of adding n elements O(n) .

In Python, the overhead for the rest of your code is usually so high that the slight speedup that can be obtained by pre-allocating is negligible. Therefore, in most cases, just forget about pre-allocation, unless your profiler tells you that adding to the list is a bottleneck.

Other answers show some profiling of the preallocation list itself, but this is useless. The only thing that matters is the profiling of your complete code, with all your calculations inside your loop, with and without pre-assignment. If my prediction is correct, the difference is so small that the computation time you win is overshadowed by the time spent thinking, writing and maintaining additional lines to pre-distribute your list.

+13
source share

Obviously the first version. Let me explain why.

  • When you execute [None] * n , Python internally creates a list object of size n and copies the same object (here None ) (this is the reason you should use this method only when dealing with immutable objects) to all memory locations. Thus, memory allocation is performed only once. After that, you can list one iteration into the list in order to copy the object to all elements. list_repeat is a function that matches the type of list creation.

     # Creates the list of specified size np = (PyListObject *) PyList_New(size); .... ... items = np->ob_item; if (Py_SIZE(a) == 1) { elem = a->ob_item[0]; for (i = 0; i < n; i++) { items[i] = elem; // Copies the same item Py_INCREF(elem); } return (PyObject *) np; } 
  • When you use a list view to create a list, Python cannot know the actual size of the list being created, so it first allocates a chunk of memory and a new copy . the object is saved in the list. When the list exceeds the allocated length, it must again allocate memory and continue creating and saving a new object in the list.

+2
source share

All Articles