After some searching, I can't find an SO question that directly addresses this. I looked into masked arrays, and although they seem cool, I'm not sure they are what I need.
Consider 2 numpy arrays:
zone_data is a 2D array containing patches of elements with the same value. These are my "zones".
value_data is a 2D array (exactly the same shape as zone_data) with arbitrary values.
I am looking for a numpy array of the same shape as zone_data / value_data that holds the average value of each zone in place of the zone number.
An example, in ASCII art form:
zone_data (4 distinct zones):

    1, 1, 2, 2
    1, 1, 2, 2
    3, 3, 4, 4
    3, 3, 4, 4
value_data:

    1, 2, 3, 6
    3, 0, 2, 5
    1, 1, 1, 0
    2, 4, 2, 1
The result I want, call it result_data:

    1.5, 1.5, 4.0, 4.0
    1.5, 1.5, 4.0, 4.0
    2.0, 2.0, 1.0, 1.0
    2.0, 2.0, 1.0, 1.0
Here is the code I have. It works and gives me the correct result.

    result_data = np.zeros(zone_data.shape)
    for i in np.unique(zone_data):
        result_data[zone_data == i] = np.mean(value_data[zone_data == i])

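For anyone who wants to run the toy example, here is a self-contained sketch of it. It assumes the last row of zone_data is 3, 3, 4, 4 (which is what the expected result implies), and it computes each boolean mask once instead of twice; otherwise it is the same loop as above.

```python
import numpy as np

# Toy inputs from the example; the last zone row is assumed to be 3, 3, 4, 4.
zone_data = np.array([[1, 1, 2, 2],
                      [1, 1, 2, 2],
                      [3, 3, 4, 4],
                      [3, 3, 4, 4]])
value_data = np.array([[1, 2, 3, 6],
                       [3, 0, 2, 5],
                       [1, 1, 1, 0],
                       [2, 4, 2, 1]])

# Loop over the distinct zone labels and fill each zone with its mean value.
result_data = np.zeros(zone_data.shape)
for i in np.unique(zone_data):
    mask = zone_data == i          # boolean mask for this zone, built once
    result_data[mask] = value_data[mask].mean()

print(result_data)
```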
My arrays are large, and this snippet takes several seconds to run. I think I have a knowledge gap here, and I haven't found anything helpful. The looping should be delegated to the library or something ... aarg!
I'm asking for help to make this FAST! Python gods, I seek your wisdom!
EDIT - adding a benchmark script

    import numpy as np
    import time

    zones = np.random.randint(1000, size=(2000, 1000))
    values = np.random.rand(2000, 1000)

    print('start method 1:')
    start_time = time.time()
    result_data = np.zeros(zones.shape)
    for i in np.unique(zones):
        result_data[zones == i] = np.mean(values[zones == i])
    print('done method 1 in %.2f seconds' % (time.time() - start_time))
    print()

    print('start method 2:')
    start_time = time.time()
    # method 2 (the faster approach I am looking for) goes here
    print('done method 2 in %.2f seconds' % (time.time() - start_time))

My output:

    start method 1:
    done method 1 in 4.34 seconds

    start method 2:
    done method 2 in 0.00 seconds
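One possible candidate for the empty "method 2" slot, offered as a sketch rather than a definitive answer: convert the zone labels to compact indices with np.unique(..., return_inverse=True), then get per-zone sums and counts with np.bincount, so the per-zone loop is handled inside NumPy. The function name zone_means is hypothetical, and no timing is claimed here; it would need to be measured with the script above.

```python
import numpy as np

def zone_means(zones, values):
    """Return an array shaped like `zones` holding each zone's mean value.

    `zone_means` is an illustrative name, not part of NumPy.
    """
    # Map arbitrary zone labels to compact indices 0..k-1.
    labels, inverse = np.unique(zones, return_inverse=True)
    # Flatten explicitly: the shape of `inverse` differs between NumPy versions.
    inverse = inverse.ravel()
    # Per-zone sum of values and per-zone element count.
    sums = np.bincount(inverse, weights=values.ravel(), minlength=labels.size)
    counts = np.bincount(inverse, minlength=labels.size)
    means = sums / counts  # every label occurs at least once, so counts > 0
    # Broadcast each zone's mean back to the original layout.
    return means[inverse].reshape(zones.shape)
```

In the benchmark script this would be called as result_data = zone_means(zones, values) between the two method 2 print calls.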