Normality test of a distribution in Python

I have some data that I selected from a radar satellite image, and I wanted to perform some statistical tests on it. Before that, I wanted to run a normality test to be sure that my data was normally distributed. My data appears to be normally distributed, but when I run the test I get a p-value of 0, which says that my data is not normally distributed.

I have attached my code along with the output and the histogram of the distribution (I'm relatively new to Python, so I apologize if my code is clumsy). Can someone tell me if I am doing something wrong? Judging from my histogram, it is hard for me to believe that my data is not normally distributed.

    import h5py
    import numpy as np
    import scipy.stats
    import matplotlib.pyplot as plt

    values = 'inputfile.h5'
    with h5py.File(values, 'r') as f:
        dset = f['/DATA/DATA']
        array = dset[..., 0]  # index 0 on the last axis, read into a NumPy array
    print('normality =', scipy.stats.normaltest(array))

    # 100 equal-width bins spanning the data (vmin/vmax avoid shadowing the built-ins min/max)
    vmin, vmax = np.amin(array), np.amax(array)
    freqs, edges = np.histogram(array, bins=100, range=(vmin, vmax))
    # One bar per bin, drawn from its left edge with the bin's full width
    plt.bar(edges[:-1], freqs, width=np.diff(edges), align='edge', color='gray')
    plt.show()

This prints (41099.095955202931, 0.0). The first element is the test statistic (a chi-square value) and the second is the p-value.
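A p-value printed as 0.0 is not literally zero. scipy.stats.normaltest converts its K² statistic into a p-value with a chi-square survival function with 2 degrees of freedom, and for 2 degrees of freedom that survival function is exactly exp(-x/2). A minimal sketch, reusing only the statistic from the output above, shows that the true value is far smaller than the smallest representable double, so it underflows to 0.0:

```python
import numpy as np
from scipy import stats

# K^2 statistic taken from the normaltest output in the question
k2 = 41099.095955202931

# normaltest turns K^2 into a p-value via the chi-square survival function (2 dof)
p = stats.chi2.sf(k2, 2)
print(p)  # underflows: the true value is below the smallest positive float64

# With 2 dof the survival function is exp(-x/2), so log10(p) = -x / (2 ln 10)
log10_p = -k2 / (2 * np.log(10))
print(f"p is roughly 10^{log10_p:.0f}")
```

In other words, the test is not malfunctioning; the evidence against exact normality is simply overwhelming at this sample size.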

I made a graph of the data, which is attached. I thought that the negative values might be causing a problem, so I normalized the values, but the problem persists.

[Figure: histogram of the values in array]
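Negative values and normalization cannot change the result of this particular test: the D'Agostino-Pearson K² statistic behind scipy.stats.normaltest is built from the sample skewness and kurtosis, and both are unchanged by shifting the data or rescaling it by a positive factor. A minimal sketch on hypothetical synthetic data (a normal sample centred at -20, standing in for the radar values) compares the raw, min-max-normalized and z-scored versions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical stand-in for the radar values: mostly negative numbers
data = rng.normal(loc=-20.0, scale=3.0, size=5000)

raw = stats.normaltest(data)
# Min-max normalization to [0, 1]
minmax = stats.normaltest((data - data.min()) / (data.max() - data.min()))
# z-score standardization (mean 0, standard deviation 1)
zscore = stats.normaltest((data - data.mean()) / data.std())

for name, res in [("raw", raw), ("min-max", minmax), ("z-score", zscore)]:
    print(f"{name:8s} K^2 = {res.statistic:.6f}  p = {res.pvalue:.6f}")
```

All three lines should agree up to floating-point rounding.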

2 answers

In general, you should be careful using normality tests when the number of samples is less than 50. These tests need enough evidence to reject the null hypothesis, which is "the data distribution is normal", and when the number of samples is small they often cannot find that evidence.

Keep in mind that failing to reject the null hypothesis does not mean that the null hypothesis is true.
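The small-sample point can be checked with a quick simulation. This is a sketch under assumptions: uniform data serves as a hypothetical, clearly non-normal population, and the helper name rejection_rate is made up for illustration. It estimates how often scipy.stats.normaltest rejects normality at the 5% level for small and larger samples:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def rejection_rate(n, trials=1000, alpha=0.05):
    """Hypothetical helper: fraction of uniform samples of size n
    that normaltest flags as non-normal at level alpha."""
    rejections = 0
    for _ in range(trials):
        sample = rng.uniform(size=n)  # clearly non-normal population
        if stats.normaltest(sample).pvalue < alpha:
            rejections += 1
    return rejections / trials

small = rejection_rate(20)
large = rejection_rate(500)
print(f"n=20:  rejected {small:.0%} of the time")
print(f"n=500: rejected {large:.0%} of the time")
```

With only 20 points the test usually fails to reject even though the data are certainly not normal, which is exactly why a non-rejection should not be read as proof of normality.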

There is another possibility: some implementations of normality tests compare the distribution of your data with the standard normal distribution (mean 0, standard deviation 1). To avoid this, I suggest you standardize the data first and then apply the normality test.
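One concrete example of this behaviour is scipy.stats.kstest(x, "norm"), which compares x against the standard normal N(0, 1). The sketch below uses hypothetical data that is exactly normal but has mean 100 and standard deviation 5. One caveat: estimating the mean and standard deviation from the same data makes the plain Kolmogorov-Smirnov p-value biased toward not rejecting (the Lilliefors variant corrects for this), so treat the standardized result as approximate.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical data: truly normal, but with mean 100 and standard deviation 5
x = rng.normal(loc=100.0, scale=5.0, size=1000)

# kstest(x, "norm") compares against the *standard* normal N(0, 1)
p_raw = stats.kstest(x, "norm").pvalue

# Standardize first so the reference distribution has a matching mean and scale
z = (x - x.mean()) / x.std(ddof=1)
p_std = stats.kstest(z, "norm").pvalue

print(f"raw data:     p = {p_raw:.3g}")
print(f"standardized: p = {p_std:.3g}")
```

The raw comparison rejects normality only because the location and scale do not match; after standardizing, the test no longer rejects.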


This question explains why you get such a small p-value. In practice, normality tests almost always reject the null hypothesis for very large sample sizes. In your histogram, for example, you can see only some skew on the left side, but with your huge sample size that is more than enough for the test to reject normality.
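The sample-size effect can be sketched with a hypothetical population that has only a mild left skew (a skew-normal with shape a=-1, an assumption chosen for illustration). The same small departure from normality goes unnoticed or nearly so at n=50 but produces an essentially zero p-value at n=1,000,000:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
results = {}

for n in (50, 1_000_000):
    # Hypothetical population: skew-normal with a mild left skew
    sample = stats.skewnorm.rvs(a=-1, size=n, random_state=rng)
    res = stats.normaltest(sample)
    results[n] = (stats.skew(sample), res.pvalue)
    print(f"n={n:>9,}  skewness={results[n][0]:+.3f}  p={res.pvalue:.3g}")
```

The skewness of the large sample is small in practical terms, yet the p-value is tiny; a p-value measures how confidently a departure is detected, not how large that departure is.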

What would be much more practical in your case is to fit a normal curve to your data and plot it over the histogram. Then you can see where your data actually differ from the normal curve (for example, whether the tail on the left side is too long). For instance:

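    import numpy as np
    import scipy.stats
    from matplotlib import pyplot as plt

    # Histogram scaled so that its total area is 1
    n, bins, patches = plt.hist(array, 50, density=True)
    mu = np.mean(array)
    sigma = np.std(array)
    # Overlay the normal density with the sample mean and standard deviation
    plt.plot(bins, scipy.stats.norm.pdf(bins, mu, sigma))
    plt.show()

(Note the argument density=True: it normalizes the histogram to a total area of 1, which makes it comparable to a probability density such as the normal curve. Older matplotlib versions spelled this normed=1; newer versions no longer accept normed and no longer provide mlab.normpdf, which is why scipy.stats.norm.pdf is used here.)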
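The overlay answers "is the left tail too long?" by eye. To put a number on it, a hypothetical helper (tail_fractions, not a library function) can compare the fraction of values more than k standard deviations from the mean with what the fitted normal curve predicts. The cut-off k=3 and the folded-normal test data are assumptions chosen for illustration:

```python
import numpy as np
from scipy import stats

def tail_fractions(values, k=3.0):
    """Hypothetical helper: observed fraction of values more than k standard
    deviations below / above the mean, plus the fraction a normal curve with
    the same mean and standard deviation predicts on each side."""
    values = np.asarray(values, dtype=float)
    mu, sigma = values.mean(), values.std()
    observed_left = np.mean(values < mu - k * sigma)
    observed_right = np.mean(values > mu + k * sigma)
    expected = stats.norm.cdf(-k)  # identical on both sides for a normal curve
    return observed_left, observed_right, expected

rng = np.random.default_rng(3)
normal_data = rng.normal(size=1_000_000)
# Hypothetical long-left-tailed data: a folded normal, all values <= 0
long_left_tail = -np.abs(rng.normal(size=1_000_000))

print("normal:         left=%.5f right=%.5f expected=%.5f" % tail_fractions(normal_data))
print("long left tail: left=%.5f right=%.5f expected=%.5f" % tail_fractions(long_left_tail))
```

For the normal sample both sides sit close to the expected value of about 0.00135; for the long-left-tailed sample the left fraction is several times larger and the right tail is empty.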

