Reductions (sum, mean, max) over consecutive non-contiguous slices

Let's say I have two numpy arrays, A of shape (d, f) and I of shape (d,) containing indices in 0..n, for example:

    I = np.array([0, 0, 1, 0, 2, 1])
    A = np.arange(12).reshape(6, 2)

I am looking for a fast way to compute reductions, in particular sum, mean and max, over all slices A[I == i, :]; a slow version would be

    results = np.zeros((I.max() + 1, A.shape[1]))
    for i in np.unique(I):
        results[i, :] = np.mean(A[I == i, :], axis=0)

which gives in this case

    results = [[ 2.66666667, 3.66666667],
               [ 7.        , 8.        ],
               [ 8.        , 9.        ]]

EDIT: I did some timings based on Divakar's answer and a previous pandas-based answer (now deleted).

Timing code:

    from __future__ import division, print_function
    import numpy as np, pandas as pd
    from time import time

    np.random.seed(0)
    d = 500000
    f = 500
    n = 500
    I = np.hstack((np.arange(n), np.random.randint(n, size=(d - n,))))
    np.random.shuffle(I)
    A = np.random.rand(d, f)

    def reduce_naive(A, I, op="avg"):
        target_dtype = (np.float64 if op == "avg" else A.dtype)
        results = np.zeros((I.max() + 1, A.shape[1]), dtype=target_dtype)
        npop = {"avg": np.mean, "sum": np.sum, "max": np.max}.get(op)
        for i in np.unique(I):
            results[i, :] = npop(A[I == i, :], axis=0)
        return results

    def reduce_reduceat(A, I, op="avg"):
        sidx = I.argsort()
        sI = I[sidx]
        sortedA = A[sidx]
        idx = np.r_[0, np.flatnonzero(sI[1:] != sI[:-1]) + 1]
        if op == "max":
            return np.maximum.reduceat(sortedA, idx, axis=0)
        sums = np.add.reduceat(sortedA, idx, axis=0)
        if op == "sum":
            return sums
        if op == "avg":
            count = np.r_[idx[1:] - idx[:-1], A.shape[0] - idx[-1]]
            return sums / count.astype(float)[:, None]

    def reduce_bincount(A, I, op="avg"):
        ids = (I[:, None] + (I.max() + 1) * np.arange(A.shape[1])).ravel()
        sums = np.bincount(ids, A.ravel()).reshape(A.shape[1], -1).T
        if op == "sum":
            return sums
        if op == "avg":
            return sums / np.bincount(ids).reshape(A.shape[1], -1).T

    def reduce_pandas(A, I, op="avg"):
        group = pd.concat([pd.DataFrame(A), pd.DataFrame(I, columns=("i",))],
                          axis=1).groupby('i')
        if op == "sum":
            return group.sum().values
        if op == "avg":
            return group.mean().values
        if op == "max":
            return group.max().values

    def reduce_hybrid(A, I, op="avg"):
        sidx = I.argsort()
        sI = I[sidx]
        sortedA = A[sidx]
        idx = np.r_[0, np.flatnonzero(sI[1:] != sI[:-1]) + 1]
        unq_sI = sI[idx]
        m = I.max() + 1
        N = A.shape[1]
        target_dtype = (np.float64 if op == "avg" else A.dtype)
        out = np.zeros((m, N), dtype=target_dtype)
        ss_idx = np.r_[idx, A.shape[0]]
        npop = {"avg": np.mean, "sum": np.sum, "max": np.max}.get(op)
        for i in range(len(idx)):
            out[unq_sI[i]] = npop(sortedA[ss_idx[i]:ss_idx[i+1]], axis=0)
        return out

    for op in ("sum", "avg", "max"):
        for name, method in (("naive ", reduce_naive),
                             ("reduceat", reduce_reduceat),
                             ("pandas ", reduce_pandas),
                             ("bincount", reduce_bincount),
                             ("hybrid ", reduce_hybrid),
                             ("numba ", reduce_numba)):   # reduce_numba is defined in my answer below
            if op == "max" and name == "bincount":
                continue
            # if name is not "naive":
            #     assert np.allclose(method(A, I, op), reduce_naive(A, I, op))
            times = []
            for tries in range(3):
                time0 = time()
                method(A, I, op)
                times.append(time() - time0)
            print(name, op, "{:.2f}".format(np.min(times)))
        print()

Timings:

    naive    sum  1.10
    reduceat sum  4.62
    pandas   sum  5.29
    bincount sum  1.54
    hybrid   sum  0.62
    numba    sum  0.31

    naive    avg  1.12
    reduceat avg  4.45
    pandas   avg  5.23
    bincount avg  2.43
    hybrid   avg  0.61
    numba    avg  0.33

    naive    max  1.19
    reduceat max  3.18
    pandas   max  5.24
    hybrid   max  0.72
    numba    max  0.34

(I chose d and n as typical values for my use case; I added the code for the numba versions in my answer below.)

+7
python arrays max vectorization numpy
3 answers

Approach #1: Using NumPy ufunc reduceat

We have ufuncs for all three of these reduction operations, and fortunately we also have ufunc.reduceat to perform those reductions limited to specific intervals along an axis. Using these, we can compute the three operations as follows:

    # Gives us sorted array based on input indices I and indices at which the
    # sorted array should be interval-limited for reduceat operations to be
    # applied later on using those results
    def sorted_array_intervals(A, I):
        # Compute sort indices for I. To be later used for sorting A based on it.
        sidx = I.argsort()
        sI = I[sidx]
        sortedA = A[sidx]
        # Get indices at which intervals change. Also, get count in each interval
        idx = np.r_[0, np.flatnonzero(sI[1:] != sI[:-1]) + 1]
        return sortedA, idx

    # Groupby sum reduction using the interval indices
    # to perform interval-limited ufunc reductions
    def groupby_sum(A, I):
        sortedA, idx = sorted_array_intervals(A, I)
        return np.add.reduceat(sortedA, idx, axis=0)

    # Groupby mean reduction
    def groupby_mean(A, I):
        sortedA, idx = sorted_array_intervals(A, I)
        sums = np.add.reduceat(sortedA, idx, axis=0)
        count = np.r_[idx[1:] - idx[:-1], A.shape[0] - idx[-1]]
        return sums / count.astype(float)[:, None]

    # Groupby max reduction
    def groupby_max(A, I):
        sortedA, idx = sorted_array_intervals(A, I)
        return np.maximum.reduceat(sortedA, idx, axis=0)
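
As a quick sanity check (a minimal sketch of mine, not part of the original answer), running groupby_mean on the small arrays from the question reproduces the expected results:

    import numpy as np

    I = np.array([0, 0, 1, 0, 2, 1])
    A = np.arange(12).reshape(6, 2)

    sortedA, idx = sorted_array_intervals(A, I)
    print(idx)                 # [0 3 5] -> interval starts after sorting by I
    print(groupby_mean(A, I))  # [[2.667 3.667], [7. 8.], [8. 9.]], as in the question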

Thus, if we need all of these operations, we can reuse the output of a single sorted_array_intervals call, like so:

    def groupby_sum_mean_max(A, I):
        sortedA, idx = sorted_array_intervals(A, I)
        sums = np.add.reduceat(sortedA, idx, axis=0)
        count = np.r_[idx[1:] - idx[:-1], A.shape[0] - idx[-1]]
        avgs = sums / count.astype(float)[:, None]
        maxs = np.maximum.reduceat(sortedA, idx, axis=0)
        return sums, avgs, maxs

Approach #1-B: Hybrid version (sort + slice + reduce)

Here is a hybrid version that takes help from sorted_array_intervals to get the sorted array and the indices at which the intervals change over to the next group, but at the last stage uses slicing to reduce each interval, iterating over the groups. Slicing helps here because we work with views.

The implementation will look something like this:

    def reduce_hybrid(A, I, op="avg"):
        sidx = I.argsort()
        sI = I[sidx]
        sortedA = A[sidx]

        # Get indices at which intervals change. Also, get count in each interval
        idx = np.r_[0, np.flatnonzero(sI[1:] != sI[:-1]) + 1]

        unq_sI = sI[idx]
        m = I.max() + 1
        N = A.shape[1]
        target_dtype = (np.float64 if op == "avg" else A.dtype)
        out = np.zeros((m, N), dtype=target_dtype)
        ss_idx = np.r_[idx, A.shape[0]]
        npop = {"avg": np.mean, "sum": np.sum, "max": np.max}.get(op)
        for i in range(len(idx)):
            out[unq_sI[i]] = npop(sortedA[ss_idx[i]:ss_idx[i+1]], axis=0)
        return out

Runtime test (using the setup from the benchmarks posted in the question) -

    In [432]: d = 500000
         ...: f = 500
         ...: n = 500
         ...: I = np.hstack((np.arange(n), np.random.randint(n, size=(d - n,))))
         ...: np.random.shuffle(I)
         ...: A = np.random.rand(d, f)
         ...:

    In [433]: %timeit reduce_naive(A, I, op="sum")
         ...: %timeit reduce_hybrid(A, I, op="sum")
         ...:
    1 loops, best of 3: 1.03 s per loop
    1 loops, best of 3: 549 ms per loop

    In [434]: %timeit reduce_naive(A, I, op="avg")
         ...: %timeit reduce_hybrid(A, I, op="avg")
         ...:
    1 loops, best of 3: 1.04 s per loop
    1 loops, best of 3: 550 ms per loop

    In [435]: %timeit reduce_naive(A, I, op="max")
         ...: %timeit reduce_hybrid(A, I, op="max")
         ...:
    1 loops, best of 3: 1.14 s per loop
    1 loops, best of 3: 631 ms per loop

Approach #2: Using NumPy bincount

Here's another approach that uses np.bincount, which does bin-based summation. With it we can compute the sums and the means while also avoiding the sorting step, like so:

    ids = (I[:, None] + (I.max() + 1) * np.arange(A.shape[1])).ravel()
    sums = np.bincount(ids, A.ravel()).reshape(A.shape[1], -1).T
    avgs = sums / np.bincount(ids).reshape(A.shape[1], -1).T
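
As a rough sanity check (my sketch, not part of the original answer), the same bincount trick on the small example from the question reproduces the expected group means:

    import numpy as np

    I = np.array([0, 0, 1, 0, 2, 1])
    A = np.arange(12).reshape(6, 2).astype(float)

    ids = (I[:, None] + (I.max() + 1) * np.arange(A.shape[1])).ravel()
    sums = np.bincount(ids, A.ravel()).reshape(A.shape[1], -1).T
    avgs = sums / np.bincount(ids).reshape(A.shape[1], -1).T
    print(avgs)  # [[2.6667 3.6667], [7. 8.], [8. 9.]], matching the question's results
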
+6

Using Numba's just-in-time python/numpy compiler, I was able to get shorter timings by JIT-compiling an intuitive linear algorithm:

    import numpy as np
    from numba import jit

    @jit
    def reducenb_avg(A, I):
        d, f = A.shape
        n = I.max() + 1
        result = np.zeros((n, f), np.float64)
        count = np.zeros((n, 1), int)
        for i in range(d):
            result[I[i], :] += A[i]
            count[I[i], 0] += 1
        return result / count

    @jit
    def reducenb_sum(A, I):
        d, f = A.shape
        n = I.max() + 1
        result = np.zeros((n, f), A.dtype)
        for i in range(d):
            result[I[i], :] += A[i]
        return result

    @jit
    def reducenb_max(A, I):
        d, f = A.shape
        n = I.max() + 1
        result = -np.inf * np.ones((n, f))
        count = np.zeros((n, f))
        for i in range(d):
            result[I[i], :] = np.maximum(A[i], result[I[i], :])
        return result

    def reduce_numba(A, I, op="avg"):
        return {"sum": reducenb_sum, "avg": reducenb_avg, "max": reducenb_max}.get(op)(A, I)
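
As a quick check (a sketch of mine, not from the original answer, and assuming the functions above compile with your numba version), the numba path reproduces the question's small-example means; the first call also triggers the JIT compilation:

    I = np.array([0, 0, 1, 0, 2, 1])
    A = np.arange(12).reshape(6, 2).astype(float)

    print(reduce_numba(A, I, op="avg"))
    # [[2.66666667 3.66666667]
    #  [7.         8.        ]
    #  [8.         9.        ]]  -- same as the naive version in the question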

On the benchmark problem, these finish in ~0.32 s, about half the time of the fastest pure numpy sorting-based methods.

+2

Another tool that can be used for this is the unbuffered np.add.at:

    def add_at(I, A):
        n = I.max() + 1
        res = np.zeros((n, A.shape[1]))
        cnt = np.zeros((n, 1))
        np.add.at(res, I, A)
        np.add.at(cnt, I, 1)
        return res / cnt

(It is pretty close in structure to the numba reducenb_avg.)

    In [438]: add_at(I, A)
    Out[438]:
    array([[ 2.66666667,  3.66666667],
           [ 7.        ,  8.        ],
           [ 8.        ,  9.        ]])
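
For context (my illustration, not part of the original answer), the "unbuffered" part matters because plain fancy-index += does not accumulate repeated indices, while np.add.at does:

    import numpy as np

    I = np.array([0, 0, 1, 0, 2, 1])
    ones = np.ones(len(I))

    cnt_buffered = np.zeros(3)
    cnt_buffered[I] += ones            # buffered: each repeated index counted only once
    print(cnt_buffered)                # [1. 1. 1.]

    cnt_unbuffered = np.zeros(3)
    np.add.at(cnt_unbuffered, I, ones) # unbuffered: true group counts
    print(cnt_unbuffered)              # [3. 2. 1.]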

On this small problem it benchmarks well compared to the others, but it does not scale well (going from 3x faster to 12x slower).

0
