Why does numpy std() give a different result from matlab std()?

I am trying to convert MATLAB code to NumPy and found out that NumPy gives a different result for the std function.

In MATLAB:

    std([1,3,4,6])
    ans = 2.0817

In NumPy:

    np.std([1,3,4,6])
    1.8027756377319946

Is this normal? And how do I get the MATLAB result in NumPy?

+64
python numpy matlab standard-deviation
Dec 22 '14 at 9:52
3 answers

NumPy's np.std function accepts an optional ddof parameter: Delta Degrees of Freedom. The default is 0. Set it to 1 to get the MATLAB result:

    >>> np.std([1,3,4,6], ddof=1)
    2.0816659994661326

To add a little more context: when calculating the variance (of which the standard deviation is the square root), we usually divide by the number of values we have.

But if we select a random sample of N elements from a larger distribution and calculate the variance, dividing by N can underestimate the actual variance. To fix this, we can lower the number we divide by (the degrees of freedom) to a number less than N (usually N-1). The ddof parameter allows us to change the divisor by the amount we specify.

Unless otherwise specified, NumPy calculates the biased estimator for the variance (ddof=0, dividing by N). This is what you want if you are working with the entire distribution (and not a subset of values randomly drawn from a larger distribution). If the ddof parameter is given, NumPy divides by N - ddof instead.
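To make the divisor concrete, here is a minimal sketch comparing np.std against a hand-rolled calculation (np.isclose is only used to compare floats safely):

    import numpy as np

    x = np.array([1, 3, 4, 6])
    n = x.size
    mean = x.mean()

    # ddof=0: divide by N (NumPy's default, biased estimator)
    var0 = ((x - mean) ** 2).sum() / n
    # ddof=1: divide by N - 1 (Bessel-corrected, MATLAB's default)
    var1 = ((x - mean) ** 2).sum() / (n - 1)

    print(np.isclose(np.std(x), np.sqrt(var0)))          # True
    print(np.isclose(np.std(x, ddof=1), np.sqrt(var1)))  # True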

The default behavior of MATLAB's std is to correct the bias of the sample variance by dividing by N-1. This removes some (but probably not all) of the bias in the standard deviation. This is likely what you want if you are using the function on a random sample from a larger distribution.

The excellent answer by @hbaderts gives further mathematical details.

+108
Dec 22 '14 at 9:54

The standard deviation is the square root of the variance. The variance of a random variable X is defined as

    \operatorname{Var}(X) = \sigma^2 = \mathrm{E}\!\left[(X - \mathrm{E}[X])^2\right]

An estimator for the variance is therefore

    s_n^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2

where \bar{x} denotes the sample mean. For randomly selected x_i, it can be shown that this estimator does not converge to the real variance \sigma^2, but to

    \mathrm{E}\!\left[s_n^2\right] = \frac{n-1}{n}\,\sigma^2

If you randomly draw samples and estimate the sample mean and the variance, you therefore have to use the corrected (unbiased) estimator

    s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2

which does converge to \sigma^2. The correction term n-1 is also called Bessel's correction.
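A quick numerical sketch makes the convergence statement concrete (the distribution, sample size, and seed here are arbitrary choices for illustration): averaging the biased estimator over many samples approaches (n-1)/n * sigma^2, while the corrected estimator approaches sigma^2.

    import numpy as np

    rng = np.random.default_rng(0)
    sigma2 = 4.0        # true variance of the underlying distribution
    n = 5               # small sample size makes the bias clearly visible
    trials = 100_000

    samples = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))
    biased = samples.var(axis=1, ddof=0).mean()     # divides by n
    corrected = samples.var(axis=1, ddof=1).mean()  # divides by n - 1

    print(biased)     # close to (n-1)/n * sigma2 = 3.2
    print(corrected)  # close to sigma2 = 4.0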

Now, by default, MATLAB's std computes the unbiased estimator with the correction term n-1. NumPy, however (as @ajcr explained), computes the biased estimator with no correction term by default. The ddof parameter allows you to set any correction term n - ddof. By setting it to 1, you get the same result as in MATLAB.

Similarly, MATLAB allows you to pass a second parameter w, which defines the "weighting scheme". The default, w=0, results in the correction term n-1 (unbiased estimator), while for w=1 only n is used as the correction term (biased estimator).
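For reference, a small sketch mapping the two conventions (the equivalent MATLAB calls are shown in comments):

    import numpy as np

    x = [1, 3, 4, 6]

    # MATLAB: std(x) or std(x, 0) -> divides by n-1 (unbiased)
    print(np.std(x, ddof=1))  # 2.0816659994661326

    # MATLAB: std(x, 1)           -> divides by n (biased)
    print(np.std(x, ddof=0))  # 1.8027756377319946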

+49
Dec 22 '14 at 10:55

For people who are not great with statistics, here is a simplified guide:

  • Use ddof=1 if you are calculating np.std() for a sample taken from a larger dataset.

  • Use ddof=0 (the default) if you are calculating np.std() for the entire population.

The ddof=1 correction is applied for samples to compensate for the bias that arises when the spread is measured around the sample mean instead of the true mean, as the sketch below illustrates.
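A minimal illustration of the rule of thumb (the population here is synthetic, and the sizes and seed are arbitrary):

    import numpy as np

    rng = np.random.default_rng(42)
    population = rng.normal(10.0, 2.0, size=1_000_000)

    # You have the entire population: ddof=0 (the default)
    print(np.std(population))        # ~2.0

    # You only have a sample: ddof=1 estimates the population std
    sample = rng.choice(population, size=30)
    print(np.std(sample, ddof=1))    # rough estimate of 2.0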

+1
Jun 14 '17 at 9:42


