Exclude for-loop in Python and Numpy constructs

Question

Exclude for-loop in Python and Numpy constructs

I am looking for a method in the Python and / or Numpy vector to exclude the use of the for loop for the following:

for i in list_range_values: v[list_list_values[i]] += list_comp_values[i]

Where:

list_range_values is a list of Python integer values, e.g. [1, 3, 5] taken from the range (0, R-1, 1)
list_comp_values is a list of numeric values in Python, for example. [0.7, 9.8, 1.2, 5, 10, 11.7, 6, 0.2] such that len (list_comp_values) = R
v is a zero vector of length V such that R can be <, =,> than V
list_list_values - a list of Python lists (each list contains a different number of integer values, for example [[3, 6, 7], [5, 7, 11, 25, 99], [8, 45], [4, 7, 8 ], [0, 1], [21, 31, 41], [9, 11, 22, 33, 44], [17, 19]]) taken from the range (0, V-1, 1) and with len (list_list_values) = R

Eg.

 for i in list_range_values (= [1, 3, 5]): i=1: v[[5, 7, 11, 25, 99]] += list_comp_values[1] (= 9.8) i=3: v[[4, 7, 8]] += list_comp_values[3] (= 5) i=5: v[[21, 31, 41]] += list_comp_values[5] (= 11.7)

Is there a method available to exclude a for loop?

Cython, Scipy / Weave / Blitz and the C-module are alternative solutions, but you want to make sure that there is an answer to the Numpy vector.

+7

python numpy for-loop

Henry Thornton Jan 2 '12 at 13:03

source share

3 answers

Using your input example:

 >>> list_list_values = [[3, 6, 7], [5, 7, 11, 25, 99], [8, 45], [4, 7, 8], [0, 1], [21, 31, 41], [9, 11, 22, 33, 44], [17, 19]] >>> list_comp_values = [0.7, 9.8, 1.2, 5, 10, 11.7, 6, 0.2] >>> list_range_values = [1, 3, 5]

Firstly, some shenanigans generators:

 >>> indices_weights = ((list_list_values[i], list_comp_values[i]) for i in list_range_values) >>> flat_indices_weights = ((i, weight) for indices, weight in indices_weights for i in indices)

Now we pass the data to numpy . I cannot figure out how to create rec.array from an iterator, so I had to convert the above generator to a list. Maybe there is a way to avoid this ...

 >>> i_w = numpy.rec.array(list(flat_indices_weights), dtype=[('i', int), ('weight', float)]) >>> numpy.histogram(i_w['i'], bins=range(0, 100), weights=i_w['weight']) (array([ 0. , 0. , 0. , 0. , 5. , 9.8, 0. , 14.8, 5. , 0. , 0. , 9.8, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 11.7, 0. , 0. , 0. , 9.8, 0. , 0. , 0. , 0. , 0. , 11.7, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 11.7, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 9.8]), array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]))

I had a moment to follow JoshAdel tests with a few of my own. The fastest solution so far uses the Bago setting, but replaces the sum_by_group function sum_by_group the built-in histogram function. Here are the numbers I received (updated) :

Method 1 (jterrace): 2.65

Method2 (for the loop): 2.25

Method3 (Bago): 1.14

Method 4 (bar graph): 2.82

Method5 (3/4 combo): 1.07

Please note that, as implemented here, the first method gives incorrect results according to my test. I did not have time to find out what the problem is. The code for my test is below; it only gently corrects the source code of JoshAdel, but I am posting it here completely for convenience. (Updated to include Bago comments and a few de-kludged.)

 from timeit import Timer setstr="""import numpy as np import itertools import random Nlists = 1000 total_lists = 5000 outsz = 100 maxsublistsz = 100 # create random list of lists list_range_values = random.sample(xrange(total_lists),Nlists) list_list_values = [random.sample(xrange(outsz),np.random.randint(1,maxsublistsz)) for k in xrange(total_lists)] list_comp_values = list(10*np.random.uniform(size=(total_lists,))) v = np.zeros((outsz,)) def indices(start, end): lens = end - start np.cumsum(lens, out=lens) i = np.ones(lens[-1], dtype=int) i[0] = start[0] i[lens[:-1]] += start[1:] i[lens[:-1]] -= end[:-1] np.cumsum(i, out=i) return i def sum_by_group(values, groups): order = np.argsort(groups) groups = groups[order] values = values[order] values.cumsum(out=values) index = np.ones(len(groups), 'bool') index[:-1] = groups[1:] != groups[:-1] values = values[index] groups = groups[index] values[1:] = np.diff(values) return values, groups """ setstr_test = setstr + "\nprint_v = True\n" method1=""" list_list_lens = np.array(map(len, list_list_values)) comp_vals_expanded = np.repeat(list_comp_values, list_list_lens) list_vals_flat = np.fromiter(itertools.chain.from_iterable(list_list_values),dtype=int) list_list_starts = np.concatenate(([0], np.cumsum(list_list_lens)[:-1])) toadd = indices(list_list_starts[list_range_values],(list_list_starts + list_list_lens)[list_range_values]) v[list_vals_flat[toadd]] += comp_vals_expanded[toadd] """ method2=""" for k in list_range_values: v[list_list_values[k]] += list_comp_values[k] """ method3=""" llv = [np.fromiter(list_list_values[i], 'int') for i in list_range_values] lcv = [list_comp_values[i] for i in list_range_values] counts = map(len, llv) indices = np.concatenate(llv) values = np.repeat(lcv, counts) totals, indices_unique = sum_by_group(values, indices) v[indices_unique] += totals """ method4=""" indices_weights = ((list_list_values[i], list_comp_values[i]) for i in list_range_values) flat_indices_weights = ((i, weight) for indices, weight in indices_weights for i in indices) i_w = np.rec.array(list(flat_indices_weights), dtype=[('i', 'i'), ('weight', 'd')]) v += np.histogram(i_w['i'], bins=range(0, outsz + 1), weights=i_w['weight'], new=True)[0] """ method5=""" llv = [np.fromiter(list_list_values[i], 'int') for i in list_range_values] lcv = [list_comp_values[i] for i in list_range_values] counts = map(len, llv) indices = np.concatenate(llv) values = np.repeat(lcv, counts) v += np.histogram(indices, bins=range(0, outsz + 1), weights=values, new=True)[0] """ t1 = Timer(method1,setup=setstr).timeit(100) print t1 t2 = Timer(method2,setup=setstr).timeit(100) print t2 t3 = Timer(method3,setup=setstr).timeit(100) print t3 t4 = Timer(method4,setup=setstr).timeit(100) print t4 t5 = Timer(method5,setup=setstr).timeit(100) print t5 exec(setstr_test + method1 + "\nprint v\n") exec("\nv = np.zeros((outsz,))\n" + method2 + "\nprint v\n") exec("\nv = np.zeros((outsz,))\n" + method3 + "\nprint v\n") exec("\nv = np.zeros((outsz,))\n" + method4 + "\nprint v\n") exec("\nv = np.zeros((outsz,))\n" + method5 + "\nprint v\n")

+2

senderle Jan 2 '12 at 20:00

source share

First, configure the variables you specified:

 import numpy as np list_range_values = [1, 3, 5] list_list_values = [[3, 6, 7], [5, 7, 11, 25, 99], [8, 45], [4, 7, 8], [0, 1], [21, 31, 41]] list_comp_values = [0.7, 9.8, 1.2, 5, 10, 11.7] v = np.arange(100, dtype=float)

Then list_list_values and list_comp_values need to be smoothed so that they are contiguous:

 list_list_lens = np.array(map(len, list_list_values)) comp_vals_expanded = np.repeat(list_comp_values, list_list_lens) import itertools list_vals_flat = np.fromiter(itertools.chain.from_iterable(list_list_values), dtype=int)

Then starting indices are required for each subarray:

 list_list_starts = np.concatenate(([0], np.cumsum(list_list_lens)[:-1]))

Now that we have the start and end values, we can use the index function from this question to get an array of selector indices:

 def indices(start, end): lens = end - start np.cumsum(lens, out=lens) i = np.ones(lens[-1], dtype=int) i[0] = start[0] i[lens[:-1]] += start[1:] i[lens[:-1]] -= end[:-1] np.cumsum(i, out=i) return i toadd = indices(list_list_starts[list_range_values], (list_list_starts + list_list_lens)[list_range_values])

Now that we have done all this magic, the array can be added like this:

 v[list_vals_flat[toadd]] += comp_vals_expanded[toadd]

+1

jterrace Jan 2 '12 at 16:45

source share

Joshdel · Accepted Answer · 2012-01-02T19:02:05+0000

While this often leads to massive acceleration for exceptions for loops and the use of numpy's built-in input / vectorization functions. I would just say that this is not always the case. Timing is easy to cycle against a much more attractive vectorization, does not give you much acceleration and much more detailed. Just think:

 from timeit import Timer setstr="""import numpy as np import itertools import random Nlists = 1000 total_lists = 5000 outsz = 100 maxsublistsz = 100 # create random list of lists list_range_values = random.sample(xrange(total_lists),Nlists) list_list_values = [random.sample(xrange(outsz),np.random.randint(1,maxsublistsz)) for k in xrange(total_lists)] list_comp_values = 10*np.random.uniform(size=(total_lists,)) v = np.zeros((outsz,)) def indices(start, end): lens = end - start np.cumsum(lens, out=lens) i = np.ones(lens[-1], dtype=int) i[0] = start[0] i[lens[:-1]] += start[1:] i[lens[:-1]] -= end[:-1] np.cumsum(i, out=i) return i def sum_by_group(values, groups): order = np.argsort(groups) groups = groups[order] values = values[order] values.cumsum(out=values) index = np.ones(len(groups), 'bool') index[:-1] = groups[1:] != groups[:-1] values = values[index] groups = groups[index] values[1:] = np.diff(values) return values, groups """ method1=""" list_list_lens = np.array(map(len, list_list_values)) comp_vals_expanded = np.repeat(list_comp_values, list_list_lens) list_vals_flat = np.fromiter(itertools.chain.from_iterable(list_list_values),dtype=int) list_list_starts = np.concatenate(([0], np.cumsum(list_list_lens)[:-1])) toadd = indices(list_list_starts[list_range_values],(list_list_starts + list_list_lens)[list_range_values]) v[list_vals_flat[toadd]] += comp_vals_expanded[toadd] """ method2=""" for k in list_range_values: v[list_list_values[k]] += list_comp_values[k] """ method3=""" llv = [list_list_values[i] for i in list_range_values] lcv = [list_comp_values[i] for i in list_range_values] counts = map(len, llv) indices = np.concatenate(llv) values = np.repeat(lcv, counts) totals, indices_unique = sum_by_group(values, indices) v[indices_unique] += totals """ t1 = Timer(method1,setup=setstr).timeit(100) print t1 t2 = Timer(method2,setup=setstr).timeit(100) print t2 t3 = Timer(method3,setup=setstr).timeit(100) print t3

For a fairly large number of items in the list:

Method1: (no for -jterrace loop) 1.43 seconds

Method2: (for the loop) 4.62 seconds

Method 3: (no for cycle - bago) 2.99 seconds

For a small number of lists (changing Nlists to 10), the for loop is significantly faster than the jterrace solution:

Method1: (no for -jterrace loop) 1.05 seconds

Method2: (for the loop) 0.045 seconds

Method 3: (no for loop - bago) 0.041 seconds

This is not to bring down the @jterrace or @bago solutions, which are pretty elegant. Rather, it indicates that often, when a simple cycle loop does not perform it badly.

Exclude for-loop in Python and Numpy constructs

More articles: