Best way to modify a list while iterating over it

I have multiple instances in a python script (v2.6) where I need to change the list in place. I need to output the values ​​from the list in response to interactive input from the user and would like to know the cleanest way to do this. Currently, I have very dirty solutions: a) setting the elements in the list that I want to delete for False and deleting them with a filter or list, or b) creating a completely new list when going through the loop, which seems unnecessary when adding variables to the space names and taking up memory.

An example of this problem:

for i, folder in enumerate(to_run_folders): if get_size(folder) < byte_threshold: ans = raw_input(('The folder {0}/ is less than {1}MB.' + \ ' Would you like to exclude it from' + \ ' compression? ').format(folder, megabyte_threshold)) if 'y' in ans.strip().lower(): to_run_folders.pop(i) 

I would like to see each folder in the list. If the current folder is smaller than a certain size, I want to ask the user if they want to exclude it. If they do, pull the folder out of the list.

The problem with this routine is that if I iterate over the list, I get unexpected behavior and early termination. If I iterate over a copy by slicing, the pop does not pull out the correct value, because the indexes are shifted and the problem occurs when more objects appear. I need to dynamically configure this kind of list in other areas of my script. Is there any clean method for this kind of function?

+8
python loops dynamic
source share
5 answers

You can move backward through the list or use a view object.

See stack overflow list back overflow Basically use reversed(yourList) (what happens creates a view object that visits the back side).

If you need indexing, you can do reversed(enumerate(yourList)) , but that would effectively create a temporary list in memory, because enumerate had to work before reversed could hit. You will need to either manipulate the indexes, or for this:

 for i in xrange(len(yourList)-1, -1, -1): item = yourList[i] ... 

Even cleaner: reversed knows range , so you can do it in python3 or in python2 if you use xrange instead:

 for i in reversed(range(len(yourList))): item = yourList[i] ... 

(proof: you can do next(reversed(range(10**10))) , but this will crash your computer when using python2)

+8
source share

You can flip it back

Backwards:

 x = range(10) l = len(x)-1 # max index for i, v in enumerate(reversed(x)): if v % 2: x.pop(li) # l-1 is the forward index 
+4
source share

I currently have very dirty solutions: a) setting the items in the list that I want to remove to False and deleting them with a filter or list, or b) creating a completely new list, going through a loop that seems like no need to add variables into the namespace and take up memory.

Actually, this is not a dirty decision. How long is this list usually? Even when creating a new list, there should not be that much memory because the list contains only links.

You can also execute a while and list for yourself by doing del lst[n] if the user decides (perhaps by separately counting the positions in the original).

+1
source share

Ok, I measured the solution. The inverse solutions are about the same. The straight while about 4 times slower. BUT! Patrik's dirty solution is about 80 times faster for a list of 100,000 random integers [Patrik2 bug fixed] :

 import timeit import random def solution_ninjagecko1(lst): for i in xrange(len(lst)-1, -1, -1): if lst[i] % 2 != 0: # simulation of the choice del lst[i] return lst def solution_jdi(lst): L = len(lst) - 1 for i, v in enumerate(reversed(lst)): if v % 2 != 0: lst.pop(Li) # L-1 is the forward index return lst def solution_Patrik(lst): for i, v in enumerate(lst): if v % 2 != 0: # simulation of the choice lst[i] = None return [v for v in lst if v is not None] def solution_Patrik2(lst): ##buggy lst = [v for v in lst if v % 2 != 0] ##buggy return [v for v in lst if v is not None] # ... corrected to return [v for v in lst if v % 2 != 0] def solution_pepr(lst): i = 0 # indexing the processed item n = 0 # enumerating the original position while i < len(lst): if lst[i] % 2 != 0: # simulation of the choice del lst[i] # i unchanged if item deleted else: i += 1 # i moved to the next n += 1 return lst def solution_pepr_reversed(lst): i = len(lst) - 1 # indexing the processed item backwards while i > 0: if lst[i] % 2 != 0: # simulation of the choice del lst[i] # i unchanged if item deleted i -= 1 # i moved to the previous return lst def solution_steveha(lst): def should_keep(x): return x % 2 == 0 return filter(should_keep, lst) orig_lst = range(30) print 'range() generated list of the length', len(orig_lst) print orig_lst[:20] + ['...'] # to have some fun :) lst = orig_lst[:] # copy of the list print solution_ninjagecko1(lst) lst = orig_lst[:] # copy of the list print solution_jdi(lst) lst = orig_lst[:] # copy of the list print solution_Patrik(lst) lst = orig_lst[:] # copy of the list print solution_pepr(lst) orig_lst = [random.randint(1, 1000000) for n in xrange(100000)] print '\nrandom list of the length', len(orig_lst) print orig_lst[:20] + ['...'] # to have some fun :) lst = orig_lst[:] # copy of the list t = timeit.timeit('solution_ninjagecko1(lst)', 'from __main__ import solution_ninjagecko1, lst', number=1) print 'solution_ninjagecko1: ', t lst = orig_lst[:] # copy of the list t = timeit.timeit('solution_jdi(lst)', 'from __main__ import solution_jdi, lst', number=1) print 'solution_jdi: ', t lst = orig_lst[:] # copy of the list t = timeit.timeit('solution_Patrik(lst)', 'from __main__ import solution_Patrik, lst', number=1) print 'solution_Patrik: ', t lst = orig_lst[:] # copy of the list t = timeit.timeit('solution_Patrik2(lst)', 'from __main__ import solution_Patrik2, lst', number=1) print 'solution_Patrik2: ', t lst = orig_lst[:] # copy of the list t = timeit.timeit('solution_pepr_reversed(lst)', 'from __main__ import solution_pepr_reversed, lst', number=1) print 'solution_pepr_reversed: ', t lst = orig_lst[:] # copy of the list t = timeit.timeit('solution_pepr(lst)', 'from __main__ import solution_pepr, lst', number=1) print 'solution_pepr: ', t lst = orig_lst[:] # copy of the list t = timeit.timeit('solution_steveha(lst)', 'from __main__ import solution_steveha, lst', number=1) print 'solution_steveha: ', t 

It prints on my console:

 c:\tmp\_Python\Patrick\so10305762>python a.py range() generated list of the length 30 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, '...'] [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28] [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28] [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28] [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28] random list of the length 100000 [915411, 954538, 794388, 847204, 846603, 454132, 866165, 640004, 930488, 609138, 333405, 986073, 318301, 728151, 996047, 117633, 455353, 581737, 55350, 485030, '...'] solution_ninjagecko1: 2.41921752625 solution_jdi: 2.45477176569 solution_Patrik: 0.0468565138865 solution_Patrik2: 0.024270403082 solution_pepr_reversed: 2.43338888043 solution_pepr: 9.11879694207 

So, I tried with a longer list. Using only twice as much is of great importance (on my old computer). Patrik dirty decision behaves very beautifully. This is about 200 times faster than the inverse solutions:

 random list of the length 200000 [384592, 170167, 598270, 832363, 123557, 81804, 319315, 445945, 178732, 726600, 516835, 392267, 552608, 40807, 349215, 208111, 880032, 520614, 384119, 350090, '...'] solution_ninjagecko1: 17.362140719 solution_jdi: 17.86837545 solution_Patrik: 0.0957998851809 solution_Patrik2: 0.0500024444448 solution_pepr_reversed: 17.5078452708 solution_pepr: 52.175648581 

[Posted after comments by ninjagecko]

The patrik2 fixed solution is about twice as fast as patrick's 2-stage solution.

To simulate the not so frequent removal of elements, a test similar to if v % 2 != 0: was changed to if v % 100 == 0: Then you need to remove about 1% of the items. Obviously, this takes less time. For 500,000 random integers in the list, the results are as follows:

 random list of the length 500000 [403512, 138135, 552313, 427971, 42358, 500926, 686944, 304889, 916659, 112636, 791585, 461948, 82622, 522768, 485408, 774048, 447505, 830220, 791421, 580706, '...'] solution_ninjagecko1: 6.79284210703 solution_jdi: 6.84066913532 solution_Patrik: 0.241242951269 solution_Patrik2: 0.162481823807 solution_pepr_reversed: 6.92106007886 solution_pepr: 7.12900522273 

Patrick's solution is about 30 times faster.

[Posted on 2012/04/25]

Another solution that works on the spot, which moves forward, is as fast as Patrick’s solution. It does not move the entire tail when an item is removed. Instead, it moves the desired items to their final position, and then cuts the unused tail of the list.

 def solution_pepr2(lst): i = 0 for v in lst: lst[i] = v # moving the element (sometimes unneccessary) if v % 100 != 0: # simulation of the choice i += 1 # here will be the next one lst[i:] = [] # cutting the tail of the length of the skipped return lst # The following one only adds the enumerate to simulate the situation when # it is needed -- ie slightly slower but with the same complexity. def solution_pepr2enum(lst): i = 0 for n, v in enumerate(lst): lst[i] = v # moving the element (sometimes unneccessary) if v % 100 != 0: # simulation of the choice i += 1 # here will be the next one lst[i:] = [] # cutting the tail of the length of the skipped return lst 

Compared to the solutions above for v % 100 != 0 :

 random list of the length 500000 [533094, 600755, 58260, 295962, 347612, 851487, 523927, 665648, 537403, 238660, 781030, 940052, 878919, 565870, 717745, 408465, 410781, 560173, 51010, 730322, '...'] solution_ninjagecko1: 1.38956896051 solution_jdi: 1.42314502685 solution_Patrik: 0.135545530079 solution_Patrik2: 0.0926935780151 solution_pepr_reversed: 1.43573239178 solution_steveha: 0.122824246805 solution_pepr2: 0.0938177241656 solution_pepr2enum: 0.11096263294 
+1
source share

The best way to handle this, the most “Putin's” way is to actually iterate over your list and create a new list containing only the folders you need. Here's how I do it:

 def want_folder(fname): if get_size(folder) >= byte_threshold: return True ans = raw_input(('The folder {0}/ is less than {1}MB.' + \ ' Would you like to exclude it from' + \ ' compression? ').format(folder, megabyte_threshold)) return 'y' not in ans.strip().lower() to_run_folders = [fname for fname in to_run_folders if want_folder(fname)] 

If your list is really large, then perhaps you need to worry about the performance of this solution and use dirty tricks. But if your list is so big, maybe it's kind of crazy for a person to answer the yes / no question about all the files that may appear.

Is performance a real issue or just some sort of thrilling concern? Because I am sure that the above code is fast enough for practical use, and it is easier to understand and change than complex code.

EDIT: @jdi suggested using itertools.ifilter() or filter() in the comments

I tested, and this should be faster than I showed above:

 to_run_folders = filter(want_folder, to_run_folders) 

I just copied the @pepr comparison code and tested the solution with filter() as shown here. This was the second fastest result, and only Patrik2 was faster. Patrik2 was twice as fast, but again, any dataset small enough to have a person who answered yes / no questions was probably small enough that factor two didn't matter much.

EDIT: Just for fun, I went ahead and wrote a version that is a pure list comprehension. It has one expression to evaluate, no Python function call.

 to_run_folders = [fname for fname in to_run_folders if get_size(fname) >= mb_threshold or 'y' not in raw_input(('The folder {0}/ is less than {1}MB.' + ' Would you like to exclude it from compression? ' ).format(fname, mb_threshold)).strip().lower() ] 

Ugh! I prefer to do the function.

0
source share

All Articles