Speeding up nested loops with element gain

I am working on large code, and I am in need of speeding it up. I created the MWE shown below:

 import numpy as np import time def random_data(N): # Generate some random data. return np.random.uniform(0., 10., N).tolist() # Lists that contain all the data. list1 = [random_data(10) for _ in range(1000)] list2 = [random_data(1000), random_data(1000)] # Start taking the time. tik = time.time() list4 = [] # Loop through all elements in list1. for elem in list1: list3 = [] # Loop through elements in list2. for elem2 in zip(*list2): A = np.exp(-0.5*((elem[0]-elem2[0])/elem[3])**2) B = np.exp(-0.5*((elem[1]-elem2[1])/elem[3])**2) list3.append(A*B) # Sum elements in list3 and append result to list4. sum_list3 = sum(list3) if sum(list3)>0. else 1e-06 list4.append(sum_list3) # Print the elapsed time. print time.time()-tik 

The strange format of list1 and list2 is related to how this block of code gets them.

The obvious part, in which most of the time is spent, is to recursively compute the terms A and B

Is there a way to speed up this block of code without parallelizing it (I tried this before and it gave me a lot of problems )? I am open to using any numpy , scipy package, etc.


Add

This is the result of abarnert optimizations, and Jaime is advised to do only one exponentiation. The optimized function in my system is on average ~ 60 times.

 import numpy as np import timeit def random_data(N): return np.random.uniform(0., 10., N).tolist() # Lists that contain all the data. list1 = [random_data(10) for _ in range(1000)] list2 = [random_data(1000), random_data(1000)] array1 = np.array(list1) array2 = np.array(zip(*list2)) # Old non-optimezed function. def func1(): list4 = [] # Process all elements in list1. for elem in list1: # Process all elements in list2. list3 = [] for elem2 in zip(*list2): A = np.exp(-0.5*((elem[0]-elem2[0])/elem[3])**2) B = np.exp(-0.5*((elem[1]-elem2[1])/elem[3])**2) list3.append(A*B) # Sum elements in list3 and append result to list4. sum_list3 = sum(list3) if sum(list3)>0. else 1e-06 list4.append(sum_list3) # New optimized function. def func2(): list4 = [] # Process all elements in list1. for elem in array1: # Broadcast over elements in array2. A = -0.5*((elem[0]-array2[:,0])/elem[3])**2 B = -0.5*((elem[1]-array2[:,1])/elem[3])**2 array3 = np.exp(A+B) # Sum elements in array3 and append result to list4. sum_list3 = max(array3.sum(), 1e-10) list4.append(sum_list3) # Get time for both functions. func1_time = timeit.timeit(func1, number=10) func2_time = timeit.timeit(func2, number=10) # Print hom many times faster func2 is versus func1. print func1_time/func2_time 
+6
source share
1 answer

You want to gradually transform this from the use of lists and loops to the use of arrays and broadcasting, first capturing the simplest and / or most important parts in time until they become fast enough.

The first step is not to repeat this zip(*list2) again and again (especially if it's Python 2.x). While we are doing this, we can save it in an array and do the same with list1 - you can still list1 over them at the moment. So:

 array1 = np.array(list1) array2 = np.array(zip(*list2)) # … for elem in array1: # … for elem2 in array2: 

This will not speed up work on my machine, it will take from 14.1 seconds to 12.9, but it gives us the opportunity to get started.

You should also remove the double calculation sum(list3) :

 sum_list3 = sum(list3) sum_list3 = sum_list3 if sum_list3>0. else 1e-06 

Meanwhile, it’s a little strange that you want value <= 0 to go to 1e-6 , but leave 0 < value < 1e-6 alone. Is this really intentional? If not, you can fix this and simplify the code at the same time by simply doing this:

 sum_list3 = max(array3.sum(), 1e-06) 

Now broadcast mailings A and B :

 # Broadcast over elements in list2. A = np.exp(-0.5*((elem[0]-array2[:,0])/elem[3])**2) B = np.exp(-0.5*((elem[1]-array2[:, 1])/elem[3])**2) array3 = A*B # Sum elements in list3 and append result to list4. sum_list3 = max(array3.sum(), 1e-06) list4.append(sum_list3) 

And that forces us from 12.9 seconds to 0.12. You can take it one step further by also passing along array1 and replacing list4 with a pre-allocated array, etc., but this is probably fast enough.

+7
source

All Articles