Matplotlib normalized histograms

I am trying to draw part of a histogram using matplotlib.

Instead of drawing the entire histogram with a lot of outliers and big values, I want to focus on only a small part. The original histogram looks like this:

    hist(data, bins=arange(data.min(), data.max(), 1000), normed=1, cumulative=False)
    plt.ylabel("PDF")


And after focusing it looks like this:

    hist(data, bins=arange(0, 121, 1), normed=1, cumulative=False)
    plt.ylabel("PDF")


Note that the last bin is stretched and, worst of all, the Y ticks are scaled so that the sum is 1, so points outside the current range are not taken into account at all.
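The scaling problem can be checked numerically without plotting. This is a minimal sketch, assuming synthetic data (the `data` array below is made up for illustration) and using `numpy.histogram` with `density=True`, which normalizes by the points inside the bins in the same way as the `normed=1` call above:

```python
import numpy as np

# Synthetic stand-in for the real data: most values are small, a few are huge.
rng = np.random.default_rng(0)
data = np.concatenate([rng.exponential(50, 9000), rng.uniform(1000, 100000, 1000)])

bins = np.arange(0, 121, 1)

# density=True normalizes by the number of points that fall inside the bins only.
density, edges = np.histogram(data, bins=bins, density=True)
area = np.sum(density * np.diff(edges))

# Fraction of all points that actually lie inside the plotted range.
in_range = np.mean((data >= bins[0]) & (data <= bins[-1]))

print(area)      # area under the focused histogram
print(in_range)  # share of the data that the focused range really covers
```

The area comes out as 1 even though a noticeable share of the points lies outside the range, which is why the Y ticks of the focused plot are too large.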

I know that I can achieve what I want by drawing the histogram over the entire possible range and then restricting the axes to the part I am interested in, but that wastes a lot of time computing bins that I will not use or see anyway.

    hist(btsd-40, bins=arange(btsd.min(), btsd.max(), 1), normed=1, cumulative=False)
    axis([0,120,0,0.0025])


Is there a quick and easy way to draw only a focused area, but still get the right Y scale?

2 answers

To plot a subset of the histogram, I don't think you can get around calculating the entire histogram.

Have you tried calculating the histogram with numpy.histogram and then plotting the region of interest with pylab.plot or something similar? I.e.

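    import numpy as np
    import pylab as plt

    data = np.random.normal(size=10000) * 10000

    # Figure 0: histogram drawn directly by matplotlib
    plt.figure(0)
    plt.hist(data, bins=np.arange(data.min(), data.max(), 1000))

    # Figure 1: histogram computed with numpy, drawn as bars from the left bin edges
    plt.figure(1)
    hist1 = np.histogram(data, bins=np.arange(data.min(), data.max(), 1000))
    plt.bar(hist1[1][:-1], hist1[0], width=1000, align="edge")

    # Figure 2: narrower bins, only the bars between 0 and 20000 are drawn
    plt.figure(2)
    hist2 = np.histogram(data, bins=np.arange(data.min(), data.max(), 200))
    left_edges = hist2[1][:-1]
    mask = (left_edges < 20000) & (left_edges > 0)
    plt.bar(left_edges[mask], hist2[0][mask], width=200, align="edge")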

[Figure: original histogram]

[Figure: histogram calculated manually]

[Figure: histogram calculated manually, cropped (NB: the values are smaller because the bins are narrower)]
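Building on this answer, the counts for the focused range can be rescaled by the total number of points, so that the Y axis matches the full-range PDF while only the bins of interest are counted. This is a minimal sketch, assuming synthetic data and a bin width of 1; `focused_density` is a hypothetical helper name, not part of NumPy or matplotlib:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; the figure is written to a file
import matplotlib.pyplot as plt


def focused_density(data, bins):
    """Density over `bins`, normalized by ALL points, not only those in range."""
    counts, edges = np.histogram(data, bins=bins)
    widths = np.diff(edges)
    return counts / (len(data) * widths), edges


rng = np.random.default_rng(0)
data = rng.normal(size=10000) * 10000  # same kind of test data as in the answer

bins = np.arange(0, 121, 1)
density, edges = focused_density(data, bins)

# Only the 120 bins of interest are counted and drawn.
plt.bar(edges[:-1], density, width=np.diff(edges), align="edge")
plt.ylabel("PDF")
plt.savefig("focused_hist.png")
```

Because the division uses `len(data)` rather than the number of in-range points, the area under the bars equals the fraction of the data inside the range instead of 1.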


I think you can normalize your data by passing explicit weights (repeat is a numpy function).

hist(data, bins=arange(0, 121, 1), weights=repeat(1.0/len(data), len(data)))
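A quick way to check what these weights do is `numpy.histogram`, which accepts a `weights` argument just like `hist`. This is a minimal sketch, assuming synthetic data; `np.full` is used here in place of `repeat` and builds the same array. Each point contributes `1/len(data)`, so the bar heights are fractions of the whole data set, and dividing by the bin width (1 here) turns them into a density.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=10000) * 10000  # made-up test data

bins = np.arange(0, 121, 1)

# Each point weighs 1/N, so every bar is the fraction of ALL points in that bin.
weights = np.full(len(data), 1.0 / len(data))
fractions, edges = np.histogram(data, bins=bins, weights=weights)

# For bins that are not 1 unit wide, divide by the width to get a PDF.
pdf = fractions / np.diff(edges)

print(fractions.sum())  # fraction of the data inside [0, 120]
```

With a bin width other than 1, the weighted heights alone are per-bin fractions rather than a PDF, so the division by the width matters.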


Source: https://habr.com/ru/post/924644/

