Efficient iteration over a slice in Python

How efficient is iterating over a slice in Python? And if a copy is inevitable with slicing, is there an alternative?

I know that the slice operation on a list is O(k), where k is the size of the slice.

x[5 : 5+k] # O(k) copy operation 

However, when iterating over part of the list, I believe the cleanest (and most Pythonic?) way to do it (without resorting to indices) is:

 for elem in x[5 : 5+k]: print elem 

However, my intuition is that this still produces an expensive copy of the sublist, rather than simply iterating over the existing list.
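For what it's worth, the copy is easy to observe directly. A quick sketch (in Python 3 syntax, whereas the snippets above are Python 2; the variable names are illustrative):

```python
import sys

x = list(range(100))
k = 50
s = x[5:5 + k]  # allocates a new list holding k element references

print(s is x)        # False: a distinct list object was created
print(s[0] is x[5])  # True: but the elements themselves are shared
print(sys.getsizeof(x[5:10]) < sys.getsizeof(x[5:90]))  # True: size grows with k
```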

+7
performance python iteration slice
5 answers

You can use itertools.islice to get a sliced iterator over the list:

Example:

 >>> from itertools import islice
 >>> lis = range(20)
 >>> for x in islice(lis, 10, None, 1):
 ...     print x
 ...
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19

Update:

As noted by @user2357112, the performance of islice depends on the starting point of the slice and the size of the iterable; a normal slice will be fast in almost all cases and should be preferred. Here are some more timing comparisons:

For large lists, islice is slightly faster than or equal to a normal slice when the starting point of the slice is less than half the size of the list; for larger indices, the normal slice is the clear winner.

 >>> def func(lis, n):
 ...     it = iter(lis)
 ...     for x in islice(it, n, None, 1): pass
 ...
 >>> def func1(lis, n):
 ...     #it = iter(lis)
 ...     for x in islice(lis, n, None, 1): pass
 ...
 >>> def func2(lis, n):
 ...     for x in lis[n:]: pass
 ...
 >>> lis = range(10**6)
 >>> n = 100
 >>> %timeit func(lis, n)
 10 loops, best of 3: 62.1 ms per loop
 >>> %timeit func1(lis, n)
 1 loops, best of 3: 60.8 ms per loop
 >>> %timeit func2(lis, n)
 1 loops, best of 3: 82.8 ms per loop
 >>> n = 1000
 >>> %timeit func(lis, n)
 10 loops, best of 3: 64.4 ms per loop
 >>> %timeit func1(lis, n)
 1 loops, best of 3: 60.3 ms per loop
 >>> %timeit func2(lis, n)
 1 loops, best of 3: 85.8 ms per loop
 >>> n = 10**4
 >>> %timeit func(lis, n)
 10 loops, best of 3: 61.4 ms per loop
 >>> %timeit func1(lis, n)
 10 loops, best of 3: 61 ms per loop
 >>> %timeit func2(lis, n)
 1 loops, best of 3: 80.8 ms per loop
 >>> n = (10**6)/2
 >>> %timeit func(lis, n)
 10 loops, best of 3: 39.2 ms per loop
 >>> %timeit func1(lis, n)
 10 loops, best of 3: 39.6 ms per loop
 >>> %timeit func2(lis, n)
 10 loops, best of 3: 41.5 ms per loop
 >>> n = (10**6)-1000
 >>> %timeit func(lis, n)
 100 loops, best of 3: 18.9 ms per loop
 >>> %timeit func1(lis, n)
 100 loops, best of 3: 18.8 ms per loop
 >>> %timeit func2(lis, n)
 10000 loops, best of 3: 50.9 us per loop  # clear winner for large index

For small lists, a normal slice is faster than islice in almost all cases.

 >>> lis = range(1000)
 >>> n = 100
 >>> %timeit func(lis, n)
 10000 loops, best of 3: 60.7 us per loop
 >>> %timeit func1(lis, n)
 10000 loops, best of 3: 59.6 us per loop
 >>> %timeit func2(lis, n)
 10000 loops, best of 3: 59.9 us per loop
 >>> n = 500
 >>> %timeit func(lis, n)
 10000 loops, best of 3: 38.4 us per loop
 >>> %timeit func1(lis, n)
 10000 loops, best of 3: 33.9 us per loop
 >>> %timeit func2(lis, n)
 10000 loops, best of 3: 26.6 us per loop
 >>> n = 900
 >>> %timeit func(lis, n)
 10000 loops, best of 3: 20.1 us per loop
 >>> %timeit func1(lis, n)
 10000 loops, best of 3: 17.2 us per loop
 >>> %timeit func2(lis, n)
 10000 loops, best of 3: 11.3 us per loop

Conclusion:

Go for normal slices.

+6

Using:

 for elem in x[5 : 5+k]: 

This is Pythonic! Do not change it until you have profiled your code and determined that it is a bottleneck, although I doubt you will ever find it to be the main source of one.
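If you do want to measure it in your own code, a minimal sketch (Python 3 syntax; the list size and slice length here are arbitrary assumptions) could look like:

```python
import timeit

setup = "from itertools import islice; x = list(range(10**5)); k = 1000"

t_slice = timeit.timeit("for elem in x[5:5 + k]: pass", setup=setup, number=1000)
t_islice = timeit.timeit("for elem in islice(x, 5, 5 + k): pass", setup=setup, number=1000)

# Both numbers are wall-clock seconds for 1000 runs; compare them on your machine.
print(t_slice, t_islice)
```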


In terms of speed, slicing is likely to be your best bet (IT below is itertools, imported as import itertools as IT):

 In [30]: x = range(100)
 In [31]: k = 90
 In [32]: %timeit x[5:5+k]
 1000000 loops, best of 3: 357 ns per loop
 In [35]: %timeit list(IT.islice(x, 5, 5+k))
 100000 loops, best of 3: 2.42 us per loop
 In [36]: %timeit [x[i] for i in xrange(5, 5+k)]
 100000 loops, best of 3: 5.71 us per loop

In terms of memory, it is not as bad as you might think. x[5:5+k] is a shallow copy of part of x. So even if the objects in x are large, x[5:5+k] creates a new list of k elements that reference the same objects as x. You only need extra memory for a list of k references to pre-existing objects. This is unlikely to be a source of memory problems.
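To make the shallow-copy point concrete, here is a small sketch (Python 3 syntax; the dict contents are an arbitrary assumption): mutating an object through the slice is visible through the original list, because no element was duplicated.

```python
x = [{"value": i} for i in range(100)]

s = x[5:10]           # a new list of 5 references, not 5 new dicts
s[0]["value"] = 999   # mutating through the slice...
print(x[5]["value"])  # ...is visible in the original list: prints 999
```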

+6

Just iterate over the needed indices; there is no need to create a new slice for this:

 for i in xrange(5, 5+k):
     print x[i]

Granted, it looks unpythonic, but it is more efficient than creating a new slice in the sense that no extra memory is wasted. An alternative is to use an iterator, as shown in @AshwiniChaudhary's answer.
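The same index-based idea can be wrapped in a small generator so the call sites stay clean (a sketch in Python 3 syntax; iter_slice is a name of my choosing, not a standard function):

```python
def iter_slice(lst, start, stop):
    """Lazily yield lst[start:stop] without building an intermediate list."""
    for i in range(start, stop):
        yield lst[i]

x = list(range(20))
for elem in iter_slice(x, 5, 10):
    print(elem)  # prints 5, 6, 7, 8, 9
```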

+4

You are already doing an O(n) iteration over the slice. In most cases this will be much more of a concern than the actual creation of the slice, which happens entirely in optimized C. Looping over the slice after you have made it takes more than twice as long as making it, even if you do nothing with it:

 >>> timeit.timeit('l[50:100]', 'import collections; l=range(150)')
 0.46978958638010226
 >>> timeit.timeit('for x in slice: pass', 'import collections; l=range(150); slice=l[50:100]')
 1.2332711270150867

You could try iterating over the indices with xrange, but given the time it takes to retrieve each list element, it is slower than slicing. Even if you skip that part, it still doesn't beat slicing:

 >>> timeit.timeit('for i in xrange(50, 100): x = l[i]', 'l = range(150)')
 4.3081963062022055
 >>> timeit.timeit('for i in xrange(50, 100): pass', 'l = range(150)')
 1.675838213385532

Do not use itertools.islice for this! It iterates over your list from the start, rather than skipping ahead to the values you want with __getitem__. Here is some timing data showing how its performance depends on where the slice starts:

 >>> timeit.timeit('next(itertools.islice(l, 9, None))', 'import itertools; l = range(1000000)')
 0.5628290558478852
 >>> timeit.timeit('next(itertools.islice(l, 999, None))', 'import itertools; l = range(1000000)')
 6.885294697594759
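You can see the linear scan directly by counting how many items islice pulls from the underlying iterator (a sketch in Python 3 syntax; counting_iter is my own helper, not part of itertools):

```python
from itertools import islice

lst = list(range(1000))
consumed = 0

def counting_iter(seq):
    # Count every item that islice pulls from us.
    global consumed
    for item in seq:
        consumed += 1
        yield item

first = next(islice(counting_iter(lst), 900, None))
print(first, consumed)  # 900 901: islice walked through 901 items to reach index 900
```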

Here is islice losing to regular slicing:

 >>> timeit.timeit('for i in itertools.islice(l, 900, None): pass', 'import itertools; l = range(1000)')
 8.979957560911316
 >>> timeit.timeit('for i in l[900:]: pass', 'import itertools; l = range(1000)')
 3.0318417204211983

This is on Python 2.7.5, in case later versions add list-specific optimizations.

+2

I think the best way is to use a C-like iteration when k is very big (a huge k, like 1,000,000,000,000, could make you wait around 10 hours for an answer from a Python loop).

Here is what I am trying to say:

 i = 5      ## the initial index
 f = 5 + k  ## the final index
 while i < f:
     print(x[i])
     i += 1

I assume this could finish in roughly 5 hours for the huge k I mentioned (where the equivalent Pythonic for loop takes about 10), because it only has to walk from 5 to 5 + k once. Any use of range() or xrange(), or even the slice itself (as mentioned above), makes the program do roughly twice as many iterations, which I think leads to a longer run time. (As I understand it, a generator returns an iterable object that the loop must then drive, so the work is done twice: once by the generator itself and once by the for loop.)

Edit:

In Python 3, a generator method/object does not need to be run separately first to produce an iterable object for the for loop.

0
