Quantile / Median / 2D binning in Python

Do you know a quick / elegant Python / Scipy / Numpy solution for the following problem? You have a set of x, y coordinates with associated w values (all 1D arrays). Now bin x and y onto a two-dimensional grid (of size BINSxBINS) and calculate a quantile (such as the median) of the w values in each bin, which should ultimately give a 2D BINSxBINS array containing the required quantile.

This is easy to do with a nested loop, but I'm sure there is a more elegant solution.

Thanks Mark

+7
4 answers

This is what I came up with; I hope it is useful. It is not necessarily cleaner or better than using a loop, but it may help you get started.

    import numpy as np

    bins_x, bins_y = 1., 1.
    x = np.array([1,1,2,2,3,3,3])
    y = np.array([1,1,2,2,3,3,3])
    w = np.array([1,2,3,4,5,6,7], 'float')

    # You can get a bin number for each point like this
    x = (x // bins_x).astype('int')
    y = (y // bins_y).astype('int')
    shape = [x.max()+1, y.max()+1]
    bin = np.ravel_multi_index([x, y], shape)

    # You could get the mean by doing something like this
    # (empty bins come out as NaN, with a divide warning):
    mean = np.bincount(bin, w) / np.bincount(bin)

    # Median is a bit harder
    order = bin.argsort()
    bin = bin[order]
    w = w[order]
    edges = (bin[1:] != bin[:-1]).nonzero()[0] + 1
    med_index = (np.r_[0, edges] + np.r_[edges, len(w)]) // 2
    median = w[med_index]

    # But that's not quite right (w is not sorted within each bin, and
    # even-sized bins need the average of the two middle values), so maybe
    median2 = [np.median(i) for i in np.split(w, edges)]

Also take a look at numpy.histogram2d
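The sort-based idea above can be made exact without a Python loop over bins by sorting on the bin index and on w at the same time. What follows is only a minimal sketch under a few assumptions: the function name `binned_median` and its `extent` argument are hypothetical, the same (lo, hi) range is used for both axes, and points outside that range are clipped into the edge cells.

```python
import numpy as np

def binned_median(x, y, w, bins, extent):
    """Median of w in each cell of a bins x bins grid; NaN for empty cells.

    `extent` is a (lo, hi) pair applied to both axes (an assumption for brevity).
    """
    lo, hi = extent
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))

    # Integer cell index along each axis; clipping puts out-of-range
    # points (and points exactly at `hi`) into the edge cells.
    ix = np.clip(((x - lo) / (hi - lo) * bins).astype(int), 0, bins - 1)
    iy = np.clip(((y - lo) / (hi - lo) * bins).astype(int), 0, bins - 1)
    flat = np.ravel_multi_index((ix, iy), (bins, bins))

    # Sort by cell, then by w inside each cell
    # (np.lexsort treats the LAST key as the primary one).
    order = np.lexsort((w, flat))
    flat, w = flat[order], w[order]

    # Start offset and length of each occupied cell's run in the sorted data.
    cells, start, count = np.unique(flat, return_index=True, return_counts=True)

    # Average the two middle elements; they coincide when count is odd.
    lower = start + (count - 1) // 2
    upper = start + count // 2
    med = 0.5 * (w[lower] + w[upper])

    out = np.full(bins * bins, np.nan)
    out[cells] = med
    return out.reshape(bins, bins)

# Usage with small made-up data: four occupied cells on the diagonal.
xs = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3], dtype=float)
ws = np.array([0, 1, 2] * 4, dtype=float)
medians = binned_median(xs, xs, ws, bins=10, extent=(0.0, 10.0))
```

Empty cells are left as NaN here rather than 0, so that an empty cell cannot be confused with a cell whose median happens to be zero.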

+5

I was just trying to do this myself, and it looks like what you need is the function scipy.stats.binned_statistic_2d. With it you can compute the mean, median, standard deviation, or any user-defined function of the third parameter (the values) within the given bins.

I understand that this question has already been answered, but I think this is a good built-in solution.
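A minimal sketch of how that built-in might be applied to the original question is shown below. The random data, the fixed seed, the 0 to 200 range and the 90th-percentile example are all assumptions chosen for illustration; only `scipy.stats.binned_statistic_2d` itself comes from the answer.

```python
import numpy as np
from scipy.stats import binned_statistic_2d

BINS = 10
rng = np.random.default_rng(0)  # fixed seed, only so the sketch is repeatable
x = rng.uniform(0, 200, 1000)
y = rng.uniform(0, 200, 1000)
w = rng.uniform(1, 10, 1000)

grid_range = [[0, 200], [0, 200]]

# Median of w in every cell of a BINS x BINS grid.
median = binned_statistic_2d(
    x, y, w, statistic='median', bins=BINS, range=grid_range
).statistic

# Any other quantile: pass a callable that reduces a 1-D array to a scalar.
q90 = binned_statistic_2d(
    x, y, w, statistic=lambda v: np.percentile(v, 90), bins=BINS, range=grid_range
).statistic

print(median.shape)
```

Both results are BINSxBINS arrays indexed as [x bin, y bin]. With statistic='median', cells that contain no points come out as NaN.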

+3

Thanks so much for your code. Based on this, I found the following solution to my problem (only a slight modification to your code):

    import numpy as np

    BINS = 10
    boxsize = 10.0
    bins_x, bins_y = boxsize/BINS, boxsize/BINS
    x = np.array([0,0,0,1,1,1,2,2,2,3,3,3])
    y = np.array([0,0,0,1,1,1,2,2,2,3,3,3])
    w = np.array([0,1,2,0,1,2,0,1,2,0,1,2], 'float')

    # You can get a bin number for each point like this
    x = (x // bins_x).astype('int')
    y = (y // bins_y).astype('int')
    shape = [BINS, BINS]
    bin = np.ravel_multi_index([x, y], shape)

    # Median
    order = bin.argsort()
    bin = bin[order]
    w = w[order]
    edges = (bin[1:] != bin[:-1]).nonzero()[0] + 1
    median = [np.median(i) for i in np.split(w, edges)]

    # Construct the BINSxBINS matrix of median values (empty bins stay 0)
    binvals = np.unique(bin)
    medvals = np.zeros([BINS*BINS])
    medvals[binvals] = median
    medvals = medvals.reshape([BINS,BINS])
    print(medvals)

+1

With numpy / scipy it looks like this:

    import numpy as np
    import scipy.stats as stats

    x = np.random.uniform(0,200,100)
    y = np.random.uniform(0,200,100)
    w = np.random.uniform(1,10,100)

    # With weights=w, hist[i, j] is the sum of w over the points in bin (i, j)
    h = np.histogram2d(x,y,bins=[10,10], weights=w,range=[[0,200],[0,200]])
    hist, bins_x, bins_y = h

    # Quantiles of those per-bin sums, taken over all bins
    q = stats.mstats.mquantiles(hist,prob=[0.25, 0.5, 0.75])

    >>> q.round(2)
    array([ 512.8 , 555.41, 592.73])

    # Label every bin with the quartile class (1 to 4) of its sum
    q1 = np.where(hist<q[0],1,0)
    q2 = np.where(np.logical_and(q[0]<=hist,hist<q[1]),2,0)
    q3 = np.where(np.logical_and(q[1]<=hist,hist<=q[2]),3,0)
    q4 = np.where(q[2]<hist,4,0)

    >>> q1 + q2 + q3 + q4
    array([[4, 3, 4, 3, 1, 1, 4, 3, 1, 2],
           [1, 1, 4, 4, 2, 3, 1, 3, 3, 3],
           [2, 3, 3, 2, 2, 2, 3, 2, 4, 2],
           [2, 2, 3, 3, 3, 1, 2, 2, 1, 4],
           [1, 3, 1, 4, 2, 1, 3, 1, 1, 3],
           [4, 2, 2, 1, 2, 1, 3, 2, 1, 1],
           [4, 1, 1, 3, 1, 3, 4, 3, 2, 1],
           [4, 3, 1, 4, 4, 4, 1, 1, 2, 4],
           [2, 4, 4, 4, 3, 4, 2, 2, 2, 4],
           [2, 2, 4, 4, 3, 3, 1, 3, 4, 4]])

prob=[0.25, 0.5, 0.75] is the default value of the quantile parameter of mquantiles, so you can change it or leave it out.
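The four np.where calls in this answer can probably be collapsed into a single np.digitize call. The sketch below is an assumption-laden illustration rather than a drop-in replacement: it swaps np.percentile in for mquantiles (their default interpolation rules differ slightly), uses a fixed seed and 1000 made-up points, and assigns a sum that is exactly equal to the upper quartile to class 4 instead of class 3.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only so the sketch is repeatable
x = rng.uniform(0, 200, 1000)
y = rng.uniform(0, 200, 1000)
w = rng.uniform(1, 10, 1000)

# hist[i, j] is the SUM of w over the points in cell (i, j),
# not a per-cell median or quantile.
hist, x_edges, y_edges = np.histogram2d(
    x, y, bins=[10, 10], weights=w, range=[[0, 200], [0, 200]]
)

# Quartile boundaries of the 100 per-cell sums.
q = np.percentile(hist, [25, 50, 75])

# np.digitize returns 0..3 for the four intervals; add 1 to get classes 1..4.
classes = np.digitize(hist, q) + 1

print(classes)
```

Note that this labels each cell by where its weighted sum falls among all cells, which is a different quantity from the per-cell quantile of w asked for in the question.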

0