Find bin averages using python histogram2d

Question

How do you calculate bin averages with a 2D histogram in Python? I have temperature ranges for the x and y axes, and I'm trying to plot the probability of lightning using bins for the corresponding temperatures. I am reading data from a CSV file, and my code is this:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    filename = 'Random_Events_All_Sorted_85GHz.csv'
    df = pd.read_csv(filename)

    min37 = df.min37
    min85 = df.min85
    verification = df.five_min_1

    # Numbers
    x = min85
    y = min37
    H = verification

    # Estimate the 2D histogram
    nbins = 4
    H, xedges, yedges = np.histogram2d(x, y, bins=nbins)

    # Rotate and flip H
    H = np.rot90(H)
    H = np.flipud(H)

    # Mask zeros
    Hmasked = np.ma.masked_where(H == 0, H)

    # Plot 2D histogram using pcolor
    fig1 = plt.figure()
    plt.pcolormesh(xedges, yedges, Hmasked)
    plt.xlabel('min 85 GHz PCT (K)')
    plt.ylabel('min 37 GHz PCT (K)')
    cbar = plt.colorbar()
    cbar.ax.set_ylabel('Probability of Lightning (%)')
    plt.show()

This makes a nice-looking plot, but the data that is plotted is the count, i.e. the number of samples that fall into each bin. The verification variable is an array containing 1s and 0s, where 1 indicates lightning and 0 indicates the absence of lightning. I want the data on the chart to be the lightning probability for a given bin based on the data from the verification variable, so I need bin_mean * 100 to get this percentage.

I tried using an approach similar to what is shown here (binning data in python with scipy/numpy), but I had difficulty getting it to work for a 2D histogram.

+2

python numpy scipy matplotlib

mbreezy Jul 23 '14 at 18:03

2 answers

There is an elegant and quick way to do it! Use the weights parameter to sum the values in each bin:

    denominator, xedges, yedges = np.histogram2d(x, y, bins=nbins)
    nominator, _, _ = np.histogram2d(x, y, bins=[xedges, yedges], weights=verification)

So all you need to do is divide the sum of values in each bin by the number of events in that bin:

    result = nominator / denominator

Voila!
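A minimal self-contained sketch of the approach above, for readers who want to try it without the CSV file. The arrays `x`, `y` and `verification` are synthetic stand-ins, and the seed, sample size and temperature ranges are made up for illustration. Empty bins give 0/0, so the sketch silences that warning and leaves those bins as NaN.

```python
import numpy as np

# Synthetic stand-ins for the CSV columns (assumed values, not real data)
rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(150.0, 300.0, n)        # plays the role of min85
y = rng.uniform(200.0, 300.0, n)        # plays the role of min37
verification = rng.integers(0, 2, n)    # 1 = lightning, 0 = no lightning

nbins = 4

# Number of samples per bin
counts, xedges, yedges = np.histogram2d(x, y, bins=nbins)

# Sum of the 0/1 flags per bin, using the same edges
sums, _, _ = np.histogram2d(x, y, bins=[xedges, yedges], weights=verification)

# Bin mean as a percentage; empty bins (0/0) become NaN
with np.errstate(divide="ignore", invalid="ignore"):
    probability = 100.0 * sums / counts
```

Note that `np.histogram2d` returns arrays whose first axis follows `x` and whose second axis follows `y`, so `probability.T` is the orientation to pass to `plt.pcolormesh(xedges, yedges, ...)`.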

+5

Alleo Jan 03 '15 at 0:01

DrV · Accepted Answer · 2014-07-24 08:06

This is doable, at least with the following method:

    # xedges, yedges as returned by 'histogram2d'
    # (H is the rotated and flipped count array from the question, indexed [y, x])

    # create an array for the output quantities
    avgarr = np.zeros((nbins, nbins))

    # determine the X and Y bins each sample coordinate belongs to
    xbins = np.digitize(x, xedges[1:-1])
    ybins = np.digitize(y, yedges[1:-1])

    # calculate the bin sums (note: if you have very many samples, it is more
    # efficient to use 'bincount', but that requires some index arithmetic)
    for xb, yb, v in zip(xbins, ybins, verification):
        avgarr[yb, xb] += v

    # replace 0s in H by NaNs (avoids divide-by-zero complaints)
    # if you do not have any further use for H after plotting, the
    # copy operation is unnecessary, and this will then also take care
    # of the masking (NaNs are plotted transparent)
    divisor = H.copy()
    divisor[divisor == 0.0] = np.nan

    # calculate the average
    avgarr /= divisor

    # now 'avgarr' contains the averages (NaNs for no-sample bins)

If you know the bin edges beforehand, you can compute the histogram counts in the same loop by adding just one extra line.
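The code comment above mentions that `bincount` is faster for many samples but needs some index arithmetic. The following is one possible sketch of that variant, which also produces the per-bin counts in the same pass from known edges. The helper name `bin_means_2d` and the tiny sample arrays are hypothetical and only for illustration.

```python
import numpy as np

def bin_means_2d(x, y, values, xedges, yedges):
    """Return (means, counts) per 2D bin; rows follow y, columns follow x.

    Hypothetical helper: bins with no samples get NaN as their mean.
    """
    nx, ny = len(xedges) - 1, len(yedges) - 1
    # Use interior edges only, so the indices run from 0 to n-1
    xb = np.digitize(x, xedges[1:-1])
    yb = np.digitize(y, yedges[1:-1])
    # Index arithmetic: map each (row, column) pair to one flat bin index
    flat = yb * nx + xb
    counts = np.bincount(flat, minlength=nx * ny).reshape(ny, nx)
    sums = np.bincount(flat, weights=values, minlength=nx * ny).reshape(ny, nx)
    # NaN divisor for empty bins avoids dividing by zero
    divisor = np.where(counts == 0, np.nan, counts)
    return sums / divisor, counts

# Tiny made-up example with 2 x 2 bins
x = np.array([0.5, 0.5, 1.5, 1.5, 1.5])
y = np.array([0.5, 0.5, 0.5, 1.5, 1.5])
v = np.array([1, 0, 1, 1, 1])
edges = np.array([0.0, 1.0, 2.0])

means, counts = bin_means_2d(x, y, v, edges, edges)
```

Because the rows follow `y` and the columns follow `x`, `means` is already in the orientation that `pcolormesh(xedges, yedges, ...)` expects, and the NaN entries are left unpainted.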
