Matplotlib normalized histograms

Question

I am trying to draw part of a histogram using matplotlib.

Instead of drawing the entire histogram with a lot of outliers and big values, I want to focus on only a small part. The original histogram looks like this:

    hist(data, bins=arange(data.min(), data.max(), 1000), normed=1, cumulative=False)
    plt.ylabel("PDF")

And after focusing it looks like this:

    hist(data, bins=arange(0, 121, 1), normed=1, cumulative=False)
    plt.ylabel("PDF")

Note that the last bin is stretched and, worst of all, the Y ticks are scaled so that the sum is 1 (so points outside the current range are not taken into account at all).
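The rescaling can be reproduced with numpy.histogram, which matplotlib's hist relies on for the binning. This is a minimal sketch on synthetic data (the data and variable names are assumptions, not taken from the post), and it uses density=True, the current name for the older normed option:

```python
import numpy as np

# Synthetic stand-in for `data` (assumption): half of the points fall in
# [0, 120), the other half are large outliers.
rng = np.random.default_rng(0)
data = np.concatenate([rng.uniform(0, 120, 5000),
                       rng.uniform(1000, 100000, 5000)])

bins = np.arange(0, 121, 1)  # edges 0, 1, ..., 120
density, edges = np.histogram(data, bins=bins, density=True)

# density=True normalizes over the points that fall inside the bins only,
# so the area under the bars is 1 even though half of the data lies outside.
area = np.sum(density * np.diff(edges))
in_range_fraction = np.mean((data >= 0) & (data <= 120))
print(area, in_range_fraction)
```

With a correct scale, the area over the focused range would equal the in-range fraction instead of 1.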

I know that I can achieve what I want by drawing the histogram over the entire possible range and then restricting the axis to the part I am interested in, but that wastes a lot of time calculating bins that I won't use anyway.

    hist(btsd-40, bins=arange(btsd.min(), btsd.max(), 1), normed=1, cumulative=False)
    axis([0,120,0,0.0025])

Is there a quick and easy way to draw only a focused area, but still get the right Y scale?

+6

python matplotlib histogram

cdecker Sep 05 '12 at 14:37

2 answers

I think you can normalize your data by passing explicit weights (repeat is a NumPy function).

hist(data, bins=arange(0, 121, 1), weights=repeat(1.0/len(data), len(data)))
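To see what this weighting does to the Y scale, here is a small sketch using numpy.histogram on synthetic data (the data is an assumption for illustration). np.full builds the same array as the repeat call above:

```python
import numpy as np

# Synthetic stand-in for `data` (assumption): half of the points in [0, 120),
# half far outside that range.
rng = np.random.default_rng(0)
data = np.concatenate([rng.uniform(0, 120, 5000),
                       rng.uniform(1000, 100000, 5000)])

bins = np.arange(0, 121, 1)
# One weight of 1/N per point, same values as repeat(1.0/len(data), len(data)).
weights = np.full(len(data), 1.0 / len(data))

fractions, edges = np.histogram(data, bins=bins, weights=weights)

# Each bar is the fraction of *all* points in that bin, so the bars sum to
# the share of the data inside the focused range, not to 1.
total = fractions.sum()
print(total)
```

The bar heights are fractions of the sample per bin. They coincide with a density only because the bins here have width 1; for other widths, divide by the bin width as well.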

+4

Sunhwan jo Sep 05 '12 at 15:34

Tim · Accepted Answer · 2012-09-05T14:50:24+0000

To plot a subset of the histogram, I don't think you can get around calculating the entire histogram.

Have you tried computing the histogram with numpy.histogram and then plotting a region of it with pylab.plot or something similar? I.e.

    import numpy as np
    import pylab as plt

    data = np.random.normal(size=10000)*10000

    plt.figure(0)
    plt.hist(data, bins=np.arange(data.min(), data.max(), 1000))

    plt.figure(1)
    hist1 = np.histogram(data, bins=np.arange(data.min(), data.max(), 1000))
    plt.bar(hist1[1][:-1], hist1[0], width=1000)

    plt.figure(2)
    hist2 = np.histogram(data, bins=np.arange(data.min(), data.max(), 200))
    mask = (hist2[1][:-1] < 20000) * (hist2[1][:-1] > 0)
    plt.bar(hist2[1][mask], hist2[0][mask], width=200)

The original histogram:

The histogram calculated manually:

The histogram calculated manually, cropped (NB: the values are smaller because the bins are narrower):
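A possible variation on this approach, offered as a sketch rather than as part of the original answer: numpy.histogram ignores points that fall outside the given bin edges, so the counts can be computed for the focused region only and then normalized by the full sample size. The region 0 to 20000 and the bin width of 200 mirror the cropped example above; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=10000) * 10000  # same kind of data as in the answer

# Bin only the region of interest; points outside the edges are ignored.
edges = np.arange(0, 20001, 200)
counts, edges = np.histogram(data, bins=edges)

# Normalize by the size of the *whole* sample and by the bin widths, so the
# heights match a density computed over the full range.
widths = np.diff(edges)
density = counts / (len(data) * widths)

# The area over the focused region equals the share of points inside it.
area = np.sum(density * widths)
print(area)
```

The result can then be drawn in the same way as in the answer, for example with plt.bar(edges[:-1], density, width=widths, align='edge'), without ever binning the unused part of the range.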
