How can I calculate the variance of a list in Python?

If I have a list like this:

results=[-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439, 0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097] 

I want to calculate the variance of this list in Python, which is the mean square of the differences from the mean.

How can I do this? I find it confusing how to access the items in the list in order to compute the squared differences.

+14
python list statistics variance
8 answers

You can use the numpy var built-in function:

    import numpy as np

    results = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
               0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]

    print(np.var(results))

This gives you 28.822364260579157.

If, for some reason, you cannot use numpy or do not want to use its built-in function, you can also calculate the variance "manually", for example with a generator expression:

    # calculate the mean
    m = sum(results) / len(results)

    # calculate the variance (n) using a generator expression
    var_res = sum((xi - m) ** 2 for xi in results) / len(results)

which gives you an identical result.

If you are interested in the standard deviation, you can use numpy.std:

    print(np.std(results))
    # 5.36864640860051

@Serge Ballesta explained the difference between variance n and variance n-1 very well. In numpy, you can choose between them with the ddof option; its default is 0, so for the n-1 case you can simply do:

 np.var(results, ddof=1) 

A hand-written version is given in @Serge Ballesta's answer.

Both approaches give 32.024849178421285.

You can also set this parameter for std:

    np.std(results, ddof=1)
    # 5.659050201086865
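The ddof idea can also be sketched without numpy. The helper below is hypothetical (its name and signature are made up for illustration and do not come from any library); it divides by len(data) - ddof, so ddof=0 gives variance n and ddof=1 gives variance n-1:

```python
import math


def variance_ddof(data, ddof=0):
    """Hypothetical helper: variance with divisor len(data) - ddof."""
    n = len(data)
    if n - ddof <= 0:
        raise ValueError("need more than ddof data points")
    m = sum(data) / n
    return sum((x - m) ** 2 for x in data) / (n - ddof)


results = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
           0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]

print(variance_ddof(results))                     # variance n
print(variance_ddof(results, ddof=1))             # variance n-1
print(math.sqrt(variance_ddof(results, ddof=1)))  # standard deviation, n-1
```

The printed values should agree with the numpy results above up to floating-point rounding.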
+39

Well, there are two ways to define variance. There is variance n, which you use when you have the full set (the whole population), and variance n-1, which you use when you only have a sample.

The difference between the two is whether the value m = sum(xi) / n is the real average or only an approximation of it.

Example 1: you want to know the average height of students in the class and its variance: ok, the value m = sum(xi) / n is the real average, and the formulas given by Cleb are ok (variance n).

Example 2: you want to know the average time at which the bus passes the bus stop, and its variance. You note the time every day for a month and get 30 values. Here the value m = sum(xi) / n is only an approximation of the real average, and that approximation becomes more accurate as the number of values grows. In this case, the best approximation of the actual variance is variance n-1:

 varRes = sum([(xi - m)**2 for xi in results]) / (len(results) -1) 

Admittedly, this has nothing to do with Python itself, but it matters for statistical analysis, and the question is tagged statistics and variance.

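Note: defaults differ between libraries. NumPy uses variance n by default (ddof=0) for both var and std, while Python's statistics module uses variance n-1 for variance and stdev, and variance n for pvariance and pstdev. Check the documentation of the library you use.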
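The two definitions differ only by the factor n / (n - 1). A quick check with the standard library's statistics module (Python 3.4+) illustrates this; it is a sketch using the question's data, not part of the reasoning above:

```python
import math
from statistics import pvariance, variance

results = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
           0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]
n = len(results)

var_n = pvariance(results)   # divides by n
var_n1 = variance(results)   # divides by n - 1

# rescaling variance n by n / (n - 1) recovers variance n-1
print(math.isclose(var_n * n / (n - 1), var_n1))  # True
```

The factor approaches 1 as n grows, so the distinction matters most for small samples.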

+7
source share

Starting with Python 3.4, the standard library comes with the variance function (sample variance, or variance n-1) as part of the statistics module:

    from statistics import variance

    # data = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
    #         0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]
    variance(data)
    # 32.024849178421285

Population variance (or variance n) can be obtained using the pvariance function:

    from statistics import pvariance

    # data = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
    #         0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]
    pvariance(data)
    # 28.822364260579157

Also note that if you already know the mean of your list, variance and pvariance take an optional second argument (xbar and mu, respectively), which saves recomputing the mean (otherwise done as part of the variance calculation).
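A minimal sketch of passing a precomputed mean, using the question's data (the variable names are chosen here for illustration):

```python
from statistics import mean, pvariance, variance

data = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
        0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]

m = mean(data)  # computed once, reused below

sample_var = variance(data, m)       # second argument is xbar
population_var = pvariance(data, m)  # second argument is mu

print(sample_var, population_var)
```

The functions do not verify that the value you pass really is the mean, so passing anything else gives a meaningless result.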

+2

Numpy is really the most elegant and fastest way to do this.

I think the actual question was about how to access individual elements of the list in order to do this calculation yourself, so below is an example:

    import numpy as np

    results = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
               0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]

    # all values are printed with 12 significant digits
    print(f"numpy variance: {np.var(results):.12g}")

    # without numpy, by hand
    # there are two ways of calculating the variance:
    # 1. directly, as the 2nd-order central moment
    #    (https://en.wikipedia.org/wiki/Moment_(mathematics)):
    #    the sum of squared deviations divided by the length of the vector
    # 2. "mean of square minus square of mean"
    #    (see https://en.wikipedia.org/wiki/Variance)

    # calculate the mean
    n = len(results)
    total = 0
    for i in range(n):
        total = total + results[i]
    mean = total / n
    print(f"mean: {mean:.12g}")

    # 1. calculate the central moment
    sum2 = 0
    for i in range(n):
        sum2 = sum2 + (results[i] - mean) ** 2
    myvar1 = sum2 / n
    print(f"my variance1: {myvar1:.12g}")

    # 2. calculate the mean of square minus square of mean
    sum3 = 0
    for i in range(n):
        sum3 = sum3 + results[i] ** 2
    myvar2 = sum3 / n - mean ** 2
    print(f"my variance2: {myvar2:.12g}")

gives you:

    numpy variance: 28.8223642606
    mean: -3.731599805
    my variance1: 28.8223642606
    my variance2: 28.8223642606
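One caveat with the second formula: "mean of square minus square of mean" subtracts two nearly equal numbers when the mean is large compared to the spread, and that can destroy most of the significant digits. The data below is made up for illustration; the variance n of 4, 7, 13, 16 is 22.5, and shifting every value by 1e9 does not change it. The function names are hypothetical:

```python
def var_two_pass(data):
    """Variance n via the sum of squared deviations from the mean."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n


def var_shortcut(data):
    """Variance n via mean of squares minus square of mean."""
    n = len(data)
    mean = sum(data) / n
    return sum(x ** 2 for x in data) / n - mean ** 2


shifted = [1e9 + x for x in (4, 7, 13, 16)]

print(var_two_pass(shifted))  # 22.5
print(var_shortcut(shifted))  # not 22.5: the squares (~1e18) exceed float precision
```

For data like the list in the question, where the mean and the spread are of similar size, both formulas agree to many digits, as the output above shows.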
+1

NumPy has a function that does this for you, and that is the easiest way. Or you can write your own function.

    import numpy as np

    np.var(a)

OR

    def find_variance(a):
        n = len(a)
        mean = sum(a) / n
        diff_sq = [None] * n
        for i in range(n):
            diff_sq[i] = (a[i] - mean) ** 2
        return sum(diff_sq) / n
0

Using Python, here are a few ways to do this:

    import statistics as st

    n = int(input())
    data = list(map(int, input().split()))

Approach 1: using the function

    variance = st.pvariance(data)

Approach 2: using basic math

    mean = sum(data) / n
    variance = sum([(x - mean) ** 2 for x in data]) / n
    print("{0:0.1f}".format(variance))

Remarks:

  • variance calculates the sample variance (n-1)
  • pvariance calculates the variance of the entire population (n)
0

The correct answer is to use one of the packages, such as NumPy, but if you want to roll your own and compute the variance incrementally, in a single pass over the data, there is a good algorithm that is also more accurate. See https://www.johndcook.com/blog/standard_deviation/

I ported my Perl implementation to Python. Please point out any problems in the comments.

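    Mklast = 0  # running mean before the current value
    Mk = 0      # running mean including the current value
    Sk = 0      # running sum of squared deviations
    k = 0       # number of values seen so far
    for xi in results:
        k = k + 1
        Mk = Mklast + (xi - Mklast) / k
        Sk = Sk + (xi - Mklast) * (xi - Mk)
        Mklast = Mk
    var = Sk / (k - 1)  # variance n-1

Answer

    >>> print(round(var, 10))
    32.0248491784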
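The same recurrence (commonly attributed to Welford) can be wrapped in a small class so that values are fed in one at a time, for example from a stream that does not fit in memory. The class and method names below are hypothetical, chosen only for illustration:

```python
class RunningVariance:
    """Hypothetical helper: single-pass running mean and variance."""

    def __init__(self):
        self.k = 0       # number of values seen so far
        self.mean = 0.0  # running mean (Mk)
        self.s = 0.0     # running sum of squared deviations (Sk)

    def push(self, x):
        self.k += 1
        delta = x - self.mean              # xi - Mklast
        self.mean += delta / self.k        # Mk
        self.s += delta * (x - self.mean)  # (xi - Mklast) * (xi - Mk)

    def variance(self, ddof=1):
        if self.k - ddof <= 0:
            raise ValueError("not enough values")
        return self.s / (self.k - ddof)


results = [-14.82381293, -0.29423447, -13.56067979, -1.6288903, -0.31632439,
           0.53459687, -1.34069996, -1.61042692, -4.03220519, -0.24332097]

rv = RunningVariance()
for x in results:
    rv.push(x)

print(rv.variance())        # variance n-1
print(rv.variance(ddof=0))  # variance n
```

Because only k, the mean, and the running sum are stored, the variance can be queried at any point without keeping the values themselves.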
0
    import numpy as np

    def get_variance(xs):
        mean = np.mean(xs)
        summed = 0
        for x in xs:
            summed += (x - mean) ** 2
        return summed / len(xs)

    print(get_variance([1, 2, 3, 4, 5]))

Output: 2.0

For the variance n-1, pass ddof=1 to np.var:

    a = [1, 2, 3, 4, 5]
    variance = np.var(a, ddof=1)
    print(variance)
0