What is the most efficient way to remove a group of indexes from a list of numbers in Python 2.7?

Therefore, I was wondering how I can, using Python 2.7, use the list of values ​​used to represent such indices most effectively: (but with a length of up to 250,000 +)

indices = [2, 4, 5] 

and remove this list of indexes from a larger list, for example: (3,000,000+ items)

 numbers = [2, 6, 12, 20, 24, 40, 42, 51] 

to get this result:

 [2, 6, 20, 42, 51] 

I am looking for an effective solution more than anything else. I know that there are many ways to do this, but that is not my problem. Efficiency is. In addition, this operation must be performed many times, and the lists will be exponentially smaller. I have no equation to imagine how much less they will be over time.

edit:

Numbers must remain sorted in the list all the time or return to sorting after index removal. A list called an index may or may not be sorted. It doesn't even have to be on the list.

+6
source share
6 answers

I suspect that whole fragments between indexes may be faster than list comprehension

 def remove_indices(numbers, indices): result = [] i=0 for j in sorted(indices): result += numbers[i:j] i = j+1 result += numbers[i:] return result 
+5
source

You might consider using the numpy library for efficiency (which, if you are dealing with lists of integers, might not be a bad idea):

 >>> import numpy as np >>> a = np.array([2, 6, 12, 20, 24, 40, 42, 51]) >>> np.delete(a, [2,4,5]) array([ 2, 6, 20, 42, 51]) 

Notes on np.delete : http://docs.scipy.org/doc/numpy/reference/generated/numpy.delete.html

It may also be worth looking at preserving the main array as it is, but preserving the masked array (do not do any speed tests on this, though ...)

+6
source

Here is my first approach.

 def remove_indices(numbers, indices): indices = set(indices) return [x for i, x in enumerate(numbers) if i not in indices] 

Here is a test module for checking it under the conditions you specified. (3 million items with a removal of 250 thousand)

 import random def create_test_set(): numbers = range(3000000) indices = random.sample(range(3000000), 250000) return numbers, indices def remove_indices(numbers, indices): indices = set(indices) return [x for i, x in enumerate(numbers) if i not in indices] if __name__ == '__main__': import time numbers, indices = create_test_set() a = time.time() numbers = remove_indices(numbers, indices) b = time.time() print b - a, len(numbers) 

It takes about 0.6 seconds on my laptop. You might consider setting indexes in advance if you will use it several times.

(The FWIW bradley.ayers solution took longer than I was prepared to wait.)

Edit : this is a little faster: (0.55 seconds)

 def remove_indices(numbers, indices): return [numbers[i] for i in xrange(len(numbers)) if i not in indices] 
+3
source

Another option:

 >>> numbers = [2, 6, 12, 20, 24, 40, 42, 51] >>> indicies = [2, 4, 5] >>> offset = 0 >>> for i in indicies: ... del numbers[i - offset] ... offset += 1 ... >>> numbers [2, 6, 20, 42, 51] 

Edit:

Therefore, being hopelessly mistaken in this answer, I compared each of the different approaches:

enter image description here

The horizontal axis is the number of elements, the vertical axis is time in seconds.

The quickest option is to use slicing to create a new list (from @gnibbler):

 def using_slices(numbers, indices): result = [] i = 0 for j in indices: result += numbers[i:j] i = j + 1 result += numbers[i:] 

It's amazing that he β€œinstalls” (@ Eric) to beat numpy.delete (@Jon Clements)

Here I used the script , maybe I missed something.

+2
source

Not that effective, but a different approach

 indices = set([2, 4, 5]) result = [x for i,x in enumerate(numbers) if i not in indices] 
+1
source

Another approach to achieve this:

 >>> numbers = [2, 6, 12, 20, 24, 40, 42, 51] >>> indices = [2, 4, 5] >>> [item for item in numbers if numbers.index(item) not in indices] [2, 6, 20, 42, 51] 
0
source

All Articles