Python numpy mask

Question


Well, after some searching, I cannot find an SO question that directly answers this. I looked into masked arrays, and although they seem cool, I'm not sure I need them.

Consider 2 numpy arrays:

zone_data is a 2-D numpy array in which groups of elements share the same value. These are my "zones".

value_data is a 2-D numpy array (exactly the same shape as zone_data) with arbitrary values.

I am looking for a numpy array of the same shape as zone_data / value_data, which holds the average value of each zone in place of the zone numbers.

Example, in ascii art:

zone_data (4 different zones):

    1, 1, 2, 2
    1, 1, 2, 2
    3, 3, 4, 4
    3, 3, 4, 4

value_data :

    1, 2, 3, 6
    3, 0, 2, 5
    1, 1, 1, 0
    2, 4, 2, 1

my result, name it result_data :

    1.5, 1.5, 4.0, 4.0
    1.5, 1.5, 4.0, 4.0
    2.0, 2.0, 1.0, 1.0
    2.0, 2.0, 1.0, 1.0

Here is the code I have. It works and gives me the correct result.

    result_data = np.zeros(zone_data.shape)
    for i in np.unique(zone_data):
        result_data[zone_data == i] = np.mean(value_data[zone_data == i])
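As a quick check, the loop can be run against the small example. This is only a sketch: the function name `zone_means_loop` is hypothetical, and the last zone row is taken as 3, 3, 4, 4, which is the layout the stated result_data corresponds to.

```python
import numpy as np

# Example arrays from the question. The last zone row is assumed to be
# 3, 3, 4, 4, since that is what the expected result implies.
zone_data = np.array([[1, 1, 2, 2],
                      [1, 1, 2, 2],
                      [3, 3, 4, 4],
                      [3, 3, 4, 4]])
value_data = np.array([[1, 2, 3, 6],
                       [3, 0, 2, 5],
                       [1, 1, 1, 0],
                       [2, 4, 2, 1]], dtype=float)


def zone_means_loop(zones, values):
    """Replace each zone label with the mean of the values in that zone."""
    result = np.zeros(zones.shape)
    for label in np.unique(zones):
        # One boolean mask per zone; this is the per-zone pass over the
        # whole array that makes the loop slow when there are many zones.
        result[zones == label] = np.mean(values[zones == label])
    return result


result_data = zone_means_loop(zone_data, value_data)
print(result_data)
```

With these inputs the printed array matches the result_data shown in the question. The cost grows with the number of zones times the array size, because every zone triggers a full-array comparison.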

My arrays are large, and this snippet takes several seconds. I think I have a knowledge gap, and I have not found anything useful. The looping should be delegated to a library or something ... aarg!

I'm asking for help to make this FAST! Python gods, I seek your wisdom!

EDIT: adding a benchmark script for reference

    import numpy as np
    import time

    zones = np.random.randint(1000, size=(2000, 1000))
    values = np.random.rand(2000, 1000)

    print('start method 1:')
    start_time = time.time()
    result_data = np.zeros(zones.shape)
    for i in np.unique(zones):
        result_data[zones == i] = np.mean(values[zones == i])
    print('done method 1 in %.2f seconds' % (time.time() - start_time))

    print()
    print('start method 2:')
    start_time = time.time()
    # your method here!
    print('done method 2 in %.2f seconds' % (time.time() - start_time))

My output:

    start method 1:
    done method 1 in 4.34 seconds

    start method 2:
    done method 2 in 0.00 seconds

+5

performance python arrays numpy masking

user1269942 Jan 16 '15 at 23:45


2 answers

I thought I had seen this in SciPy somewhere, but I can no longer find it. Have you looked there?

In any case, you can get the first improvement by changing your loop:

    result_data = np.empty(zones.shape)  # minor speed gain
    for label in np.unique(zones):
        mask = zones==label
        result_data[mask] = np.mean(values[mask])

This way the boolean comparison is done only once per zone instead of twice, which shaves a little off the execution time.

+1

Oliver W. Jan 17 '15 at 15:37


DSM · Accepted Answer · 2015-01-17T16:09:51+0000

You can use np.bincount:

    count = np.bincount(zones.flat)
    tot = np.bincount(zones.flat, weights=values.flat)
    avg = tot/count
    result_data2 = avg[zones]

which gives me

    start method 1:
    done method 1 in 3.13 seconds

    start method 2:
    done method 2 in 0.01 seconds
    >>>
    >>> np.allclose(result_data, result_data2)
    True
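One caveat worth noting: np.bincount only accepts non-negative integers, and indexing with the labels directly allocates one bin for every integer from 0 up to the largest label. If the zone labels may be negative or very sparse, a possible variant (a sketch, not part of the answer above; the name `zone_means_bincount` is hypothetical) first remaps the labels to compact indices with np.unique:

```python
import numpy as np


def zone_means_bincount(zones, values):
    """Per-zone means via np.bincount, for arbitrary integer labels."""
    # Remap labels to compact indices 0..k-1 so that bincount receives
    # small non-negative integers and every bin is non-empty.
    labels, inverse = np.unique(zones.ravel(), return_inverse=True)
    counts = np.bincount(inverse)                          # elements per zone
    totals = np.bincount(inverse, weights=values.ravel())  # sum per zone
    means = totals / counts
    # Look up each element's zone mean and restore the original shape.
    return means[inverse].reshape(zones.shape)


# Illustrative data (not from the question): negative and sparse labels.
zones = np.array([[-5, -5, 7],
                  [7, 7, 100000]])
values = np.array([[1.0, 3.0, 2.0],
                   [4.0, 6.0, 10.0]])
result = zone_means_bincount(zones, values)
print(result)
```

The np.unique call sorts the labels, so this variant is probably somewhat slower than calling np.bincount on the labels directly. When the labels are already small non-negative integers, as in the benchmark script, the direct form above is the simpler choice.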
