Find bin averages using Python's histogram2d

How do you calculate per-bin averages with a 2D histogram in Python? I have temperature ranges on the x and y axes, and I'm trying to build the probability of lightning using bins for the corresponding temperatures. I am reading the data from a CSV file, and my code is this:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    filename = 'Random_Events_All_Sorted_85GHz.csv'
    df = pd.read_csv(filename)

    min37 = df.min37
    min85 = df.min85
    verification = df.five_min_1

    # Numbers
    x = min85
    y = min37
    H = verification

    # Estimate the 2D histogram
    nbins = 4
    H, xedges, yedges = np.histogram2d(x, y, bins=nbins)

    # Rotate and flip H
    H = np.rot90(H)
    H = np.flipud(H)

    # Mask zeros
    Hmasked = np.ma.masked_where(H == 0, H)

    # Plot 2D histogram using pcolor
    fig1 = plt.figure()
    plt.pcolormesh(xedges, yedges, Hmasked)
    plt.xlabel('min 85 GHz PCT (K)')
    plt.ylabel('min 37 GHz PCT (K)')
    cbar = plt.colorbar()
    cbar.ax.set_ylabel('Probability of Lightning (%)')

    plt.show()

This makes a nice-looking plot, but the data it shows is the number of samples that fall into each bin. The verification variable is an array of 1s and 0s, where 1 indicates lightning and 0 indicates no lightning. I want the plot to show the lightning probability for each bin, based on the data in the verification variable, so I need bin_mean * 100 to get this percentage.

I tried an approach similar to the one shown in "binning data in python with scipy/numpy", but I had trouble getting it to work for a 2D histogram.

+2
python numpy scipy matplotlib
Jul 23 '14 at 18:03
2 answers

This is doable, at least with the following method:

    # xedges, yedges as returned by 'histogram2d'

    # create an array for the output quantities
    avgarr = np.zeros((nbins, nbins))

    # determine the X and Y bins each sample coordinate belongs to
    xbins = np.digitize(x, xedges[1:-1])
    ybins = np.digitize(y, yedges[1:-1])

    # calculate the bin sums (note: if you have very many samples, it is more
    # efficient to use 'bincount', but that requires some index arithmetic)
    for xb, yb, v in zip(xbins, ybins, verification):
        avgarr[yb, xb] += v

    # replace 0s in H by NaNs (removes divide-by-zero complaints)
    # if you do not have any further use for H after plotting, the
    # copy operation is unnecessary, and this will then also take care
    # of the masking (NaNs are plotted transparent)
    divisor = H.copy()
    divisor[divisor == 0.0] = np.nan

    # calculate the average
    avgarr /= divisor

    # now 'avgarr' contains the averages (NaNs for no-sample bins)

If you know the bin edges in advance, you can compute the histogram counts in the same loop by adding one line.
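The code comment above mentions that `bincount` is faster for many samples but needs some index arithmetic. A minimal sketch of what that could look like follows; the function name `bin_means_bincount` is hypothetical and not part of the original answer. It collapses each (row, column) bin pair into one flat index, then gets the per-bin sums and counts in a single pass each. The result is indexed `[ybin, xbin]`, like `avgarr` above.

```python
import numpy as np

def bin_means_bincount(x, y, values, nbins):
    """Per-bin mean of `values` on an nbins x nbins grid (NaN for empty bins).

    Hypothetical helper for illustration; the result is indexed [ybin, xbin].
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    values = np.asarray(values, dtype=float)

    # Use the same edges histogram2d would choose for bins=nbins
    _, xedges, yedges = np.histogram2d(x, y, bins=nbins)

    # Bin index (0 .. nbins-1) of every sample along each axis
    xbins = np.digitize(x, xedges[1:-1])
    ybins = np.digitize(y, yedges[1:-1])

    # Index arithmetic: collapse (row, column) into a single flat index
    flat = ybins * nbins + xbins

    # Per-bin sum of the values and per-bin sample count
    sums = np.bincount(flat, weights=values, minlength=nbins * nbins)
    counts = np.bincount(flat, minlength=nbins * nbins)

    # Divide only where a bin has samples; empty bins stay NaN
    means = np.full(nbins * nbins, np.nan)
    np.divide(sums, counts, out=means, where=counts > 0)
    return means.reshape(nbins, nbins), xedges, yedges
```

Multiplying the returned array by 100 would give the percentage asked for in the question.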

+1
Jul 24 '14 at 8:06

There is an elegant and quick way to do it! Use the weights parameter to sum the values in each bin:

    denominator, xedges, yedges = np.histogram2d(x, y, bins=nbins)
    nominator, _, _ = np.histogram2d(x, y, bins=[xedges, yedges], weights=verification)

So all you need to do is divide the sum of the values in each bin by the number of events:

 result = nominator / denominator 

Voila!
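One caveat: bins with no samples have a count of zero, so the plain division produces NaNs along with a runtime warning. Below is a minimal sketch of the same weights idea that handles empty bins explicitly and converts the result to a percentage. The function name `lightning_probability` and the default of `nbins=4` are illustrative choices, not part of the original answer.

```python
import numpy as np

def lightning_probability(x, y, flags, nbins=4):
    """Percentage of samples with flag == 1 in each 2D bin (NaN where empty).

    Hypothetical helper; the result is indexed [xbin, ybin], as histogram2d returns it.
    """
    # Number of samples per bin
    counts, xedges, yedges = np.histogram2d(x, y, bins=nbins)

    # Sum of the 0/1 flags per bin, reusing the same edges
    hits, _, _ = np.histogram2d(x, y, bins=[xedges, yedges], weights=flags)

    # Divide only where a bin has samples; empty bins stay NaN
    prob = np.full(counts.shape, np.nan)
    np.divide(hits, counts, out=prob, where=counts > 0)
    return 100.0 * prob, xedges, yedges
```

Because `histogram2d` puts x along the first axis, plotting would need the transpose, for example `plt.pcolormesh(xedges, yedges, np.ma.masked_invalid(prob.T))`, which also masks the empty bins.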

+5
Jan 03 '15 at 0:01


