NumPy: Selecting and Summing Data into an Array

I have a (large) array of data and a (large) list of lists of (multiple) indexes, e.g.

data = [1.0, 10.0, 100.0]
contribs = [[1, 2], [0], [0, 1]]

For each entry in contribs, I would like to sum the corresponding values of data and put them in an array. In the above example, the expected result would be

out = [110.0, 1.0, 11.0]

Doing this in a loop works,

import numpy

c = numpy.zeros(len(contribs))
for k, indices in enumerate(contribs):
    for idx in indices:
        c[k] += data[idx]

but since data and contribs are large, it takes too much time.

I have a feeling that this can be improved using numpy fancy indexing.

Any clues?

3 answers

One of the possibilities is

import numpy as np

data = np.array(data)
out = [np.sum(data[c]) for c in contribs]

Should be faster than a double loop, at least.
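
To check that the comprehension really gives the same numbers as the double loop, here is a small sketch using the example values from the question. The name expected is introduced only for this comparison, and wrapping the result in np.array is my addition so that the output is an array as requested.

```python
import numpy as np

data = np.array([1.0, 10.0, 100.0])
contribs = [[1, 2], [0], [0, 1]]

# Reference result: the explicit double loop from the question.
expected = np.zeros(len(contribs))
for k, indices in enumerate(contribs):
    for idx in indices:
        expected[k] += data[idx]

# One fancy-indexing lookup and one sum per sublist.
out = np.array([data[c].sum() for c in contribs])

assert np.allclose(out, expected)
```

There is still one Python-level iteration per entry of contribs, so the gain should be largest when the individual sublists are long.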


Another approach uses np.bincount:

import numpy as np

# Cumulative lengths of the sublists in contribs, used below to mark
# where each sublist starts in the flattened index array.
clens = np.cumsum([len(item) for item in contribs])

# Build an ID array that assigns the same ID to every element of the same
# sublist in contribs. The IDs are used to accumulate the values obtained
# by indexing data with the flattened contribs.
id_arr = np.zeros(clens[-1], dtype=int)
id_arr[clens[:-1]] = 1
out = np.bincount(id_arr.cumsum(), np.take(data, np.concatenate(contribs)))
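
As far as I can tell, the ID array above assumes that no sublist in contribs is empty: an empty sublist repeats a start position, so the labels after it end up shifted. A possible variant, sketched here with np.repeat to build the labels directly, avoids that. The function name segment_sums is hypothetical, and this is only one way to do it, not necessarily the fastest.

```python
import numpy as np

def segment_sums(data, contribs):
    """Sum data[indices] for each index list in contribs (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    lens = [len(item) for item in contribs]
    # Flattened indices; the cast matters because an empty sublist
    # makes np.concatenate fall back to a float result.
    flat = np.concatenate(contribs).astype(np.intp)
    # Group label for each flattened index, e.g. [0, 0, 1, 2, 2].
    ids = np.repeat(np.arange(len(contribs)), lens)
    # Weighted bincount adds up data[flat] per label; minlength keeps
    # a slot for trailing empty sublists.
    return np.bincount(ids, weights=data[flat], minlength=len(contribs))

out = segment_sums([1.0, 10.0, 100.0], [[1, 2], [0], [0, 1]])
with_empty = segment_sums([1.0, 10.0, 100.0], [[1, 2], [], [0, 1]])

assert out.tolist() == [110.0, 1.0, 11.0]
assert with_empty.tolist() == [110.0, 0.0, 11.0]
```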



Assuming data is a numpy.array:

import numpy

# Flatten "contribs"
f = [j for sub in contribs for j in sub]

# Start offsets of the "ranges" of data[f] that will be summed in the next step
starts = [0] + numpy.cumsum([len(sub) for sub in contribs]).tolist()[:-1]

# Take the required sums
out = numpy.add.reduceat(data[f], starts)
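
To make the reduceat step easier to follow, here is the same idea traced on the example from the question. The names flat, lens and starts are mine, chosen only for this sketch.

```python
import numpy as np

data = np.array([1.0, 10.0, 100.0])
contribs = [[1, 2], [0], [0, 1]]

flat = [j for sub in contribs for j in sub]   # [1, 2, 0, 0, 1]
lens = [len(sub) for sub in contribs]         # [2, 1, 2]

# Offset in flat at which each sublist begins: [0, 2, 3]
starts = np.concatenate(([0], np.cumsum(lens)[:-1]))

# data[flat] is [10, 100, 1, 1, 10]; reduceat sums the slices
# [0:2], [2:3] and [3:] of it.
out = np.add.reduceat(data[flat], starts)

assert out.tolist() == [110.0, 1.0, 11.0]
```

One caveat, assuming contribs may contain empty sublists: reduceat does not return 0 for an empty range but the single element at that offset, so those cases would need separate handling.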