Why is random list processing much faster than sorted list processing?

I tried to improve the performance of the func function, and I found that a simple change in how the aX list is created improves the performance quite a bit:

    import timeit
    import numpy as np

    def func(a, b):
        return [_ for _ in a if _ not in b]

    Na, Nb = 10000, 5000
    b = list(np.random.randint(1000, size=Nb))

    # Ordered list of Na integers
    a1 = [_ for _ in range(Na)]
    # Random list of Na integers
    a2 = list(np.random.randint(Na, size=Na))
    # Ordered list of Na integers generated with numpy
    a3 = list(np.arange(Na))

    start_time = timeit.default_timer()
    ab1 = func(a1, b)
    abt1 = timeit.default_timer() - start_time
    print("Time ab1", abt1)

    start_time = timeit.default_timer()
    ab2 = func(a2, b)
    abt2 = timeit.default_timer() - start_time
    print("Time ab2", abt2)

    start_time = timeit.default_timer()
    ab3 = func(a3, b)
    abt3 = timeit.default_timer() - start_time
    print("Time ab3", abt3)

    print("Ratio 1/2:", abt1 / abt2)
    print("Ratio 1/3:", abt1 / abt3)

In Python 2.7.13, this results in:

    ('Time ab1', 5.296088933944702)
    ('Time ab2', 1.5520200729370117)
    ('Time ab3', 1.5581469535827637)
    ('Ratio 1/2:', 3.412384302428827)
    ('Ratio 1/3:', 3.3989662667998095)

In Python 3.5.2, the difference is even greater:

    Time ab1 6.758207322000089
    Time ab2 1.5693355060011527
    Time ab3 1.5148192759988888
    Ratio 1/2: 4.306413317073784
    Ratio 1/3: 4.461395117608107

I need to handle ordered integers (i.e. a1 or a3), so my question is:

Why is a random list processed much faster than an ordered list that was not generated with numpy?

2 answers

The lists b, a2 and a3 are lists of NumPy scalars, while your list a1 is a list of regular Python ints. Comparing NumPy scalars with regular Python ints requires a lot of extra type checking and coercion, so the func(a1, b) test, which has to compare regular Python ints against NumPy scalars, is slower.
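
A minimal check (reusing the names a1 and a3 from your code) showing that list() on a NumPy array keeps NumPy scalar objects, while range() yields plain Python ints:

    import numpy as np

    # list(range(...)) gives plain Python ints,
    # list(np.arange(...)) gives NumPy scalar objects (e.g. numpy.int64).
    a1 = list(range(5))
    a3 = list(np.arange(5))

    print(type(a1[0]))  # <class 'int'>
    print(type(a3[0]))  # e.g. <class 'numpy.int64'>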

If you make b a list of Python ints (by calling the tolist method instead of the list function), the timing difference changes accordingly.
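
For example, a minimal sketch, assuming the same Nb and value range as in your code:

    import numpy as np

    Nb = 5000
    # tolist() converts the NumPy scalars to plain Python ints,
    # so membership tests in func(a1, b) compare int against int.
    b = np.random.randint(1000, size=Nb).tolist()
    print(type(b[0]))  # <class 'int'>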

You might also consider using a Python set or NumPy's set-like operations to accomplish your task, as sketched below.
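
A sketch of both ideas; the name func_set and the use of np.isin are mine, not from your code, and assume you want to keep the order and duplicates of a, like the original list comprehension:

    import numpy as np

    def func_set(a, b):
        # Membership tests against a set are O(1) on average,
        # versus O(len(b)) for a list.
        b_set = set(b)
        return [x for x in a if x not in b_set]

    Na, Nb = 10000, 5000
    b = np.random.randint(1000, size=Nb).tolist()
    a1 = list(range(Na))

    ab1 = func_set(a1, b)

    # NumPy alternative: boolean masking with np.isin also preserves
    # the order and duplicates of a.
    a_arr = np.arange(Na)
    ab1_np = a_arr[~np.isin(a_arr, b)].tolist()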


As discussed here, NumPy arrays are not the same as plain Python lists. The NumPy-derived lists look faster because calling the list() function on a NumPy array still leaves NumPy scalar objects as the elements.

Using the .tolist() method instead converts the NumPy array all the way down to regular Python objects (as the other answer points out), and the performance differences disappear, see:

    import timeit
    import numpy as np

    def func(a, b):
        return [_ for _ in a if _ not in b]

    Na, Nb = 10000, 5000
    b = list(np.random.randint(Na, size=Nb))  # len: 5000, max: 9999

    # Ordered list of Na integers
    a1 = [_ for _ in range(Na)]  # len: 10000, max: 9999
    # Random list of Na integers
    a2 = np.random.randint(Na, size=Na).tolist()  # len: 10000, max: 9999
    # Ordered list of Na integers generated with numpy
    a3 = np.arange(Na).tolist()

    start_time = timeit.default_timer()
    ab1 = func(a1, b)
    abt1 = timeit.default_timer() - start_time
    print("Time ab1", abt1)

    start_time = timeit.default_timer()
    ab2 = func(a2, b)
    abt2 = timeit.default_timer() - start_time
    print("Time ab2", abt2)

    start_time = timeit.default_timer()
    ab3 = func(a3, b)
    abt3 = timeit.default_timer() - start_time
    print("Time ab3", abt3)

    print("Ratio 1/2:", abt1 / abt2)
    print("Ratio 1/3:", abt1 / abt3)

    # Time ab1 4.622085004015502
    # Time ab2 4.598610720638726
    # Time ab3 4.63976530848255
    # Ratio 1/2: 1.005104646773301
    # Ratio 1/3: 0.9961893968139456

Hope this answers your first question!

