How do I determine the standard deviation (stddev) of a set of values?

I need to know whether a number, compared to a set of numbers, lies outside 1 stddev from the mean, etc.
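
A minimal sketch of that check follows. It assumes "the middle" means the arithmetic mean and uses the sample standard deviation (so it needs at least two values). It computes a z-score and compares its magnitude with the threshold. The example is in Java, and the class ZScoreCheck and its methods are hypothetical names, not from any library:

```java
// Hypothetical helper, not taken from any of the answers below.
final class ZScoreCheck {
    /** Arithmetic mean of the values. */
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) sum += v;
        return sum / values.length;
    }

    /** Sample standard deviation (divides by n - 1), computed in two passes. */
    static double sampleStdDev(double[] values) {
        double m = mean(values);
        double sumSq = 0.0;
        for (double v : values) {
            double d = v - m;
            sumSq += d * d;
        }
        return Math.sqrt(sumSq / (values.length - 1));
    }

    /** True if x lies more than k standard deviations away from the mean. */
    static boolean isOutside(double x, double[] values, double k) {
        double z = (x - mean(values)) / sampleStdDev(values);
        return Math.abs(z) > k;
    }
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5 and the sample standard deviation is about 2.14. So isOutside(8.0, values, 1.0) is true and isOutside(6.0, values, 1.0) is false.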

+45
math c# numerical statistics
May 22 '09 at
12 answers

The sum-of-squares algorithm works fine in most cases, but it can cause big problems when you are dealing with very large numbers: because of catastrophic cancellation, you can basically end up with a negative variance...
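
To see why, note that the sum-of-squares shortcut rests on the identity below. When the values are large compared to their spread, both terms on the right are huge and nearly equal, so their difference loses most of its significant digits:

```latex
\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2 \;=\; \sum_{i=1}^{n}x_i^2 \;-\; \frac{1}{n}\left(\sum_{i=1}^{n}x_i\right)^2
```

For example, take x = 10^9 + 4, 10^9 + 7, 10^9 + 13, 10^9 + 16. The left side is 36 + 9 + 9 + 36 = 90. Each term on the right, however, is about 4·10^18, where adjacent IEEE-754 doubles are 512 apart, which is more than the 90 you are trying to recover. Subtracting the mean before squaring avoids this.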

Also, never, never, never compute a^2 as pow(a, 2); a * a is almost certainly faster.

By far the best way to calculate the standard deviation is Welford's method. My C# is very rusty, but it might look something like this:

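    public static double StandardDeviation(List<double> valueList)
    {
        double M = 0.0;  // running mean
        double S = 0.0;  // running sum of squared deviations from the mean
        int k = 1;
        foreach (double value in valueList)
        {
            double tmpM = M;
            M += (value - tmpM) / k;
            S += (value - tmpM) * (value - M);
            k++;
        }
        // After the loop k == n + 1, so k - 2 == n - 1 (sample standard deviation).
        return Math.Sqrt(S / (k - 2));
    }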

If you have the entire population (as opposed to a sample), use return Math.Sqrt(S / (k-1)); instead.
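
Here is a minimal Java sketch of the same Welford update for readers outside .NET. It exposes both the sample (divide by n - 1) and population (divide by n) versions. The class name Welford and its methods are hypothetical. The update is the one in the C# code above, except that the counter is incremented first:

```java
// Sketch of Welford's online algorithm; names are illustrative, not from a library.
final class Welford {
    private long count = 0;     // number of values seen so far
    private double mean = 0.0;  // running mean (M above)
    private double m2 = 0.0;    // running sum of squared deviations (S above)

    void add(double value) {
        count++;
        double delta = value - mean;   // deviation from the old mean
        mean += delta / count;         // update the mean
        m2 += delta * (value - mean);  // old deviation times new deviation
    }

    double mean() { return mean; }

    /** Sample standard deviation: divide by n - 1 (needs at least 2 values). */
    double sampleStdDev() { return Math.sqrt(m2 / (count - 1)); }

    /** Population standard deviation: divide by n. */
    double populationStdDev() { return Math.sqrt(m2 / count); }

    /** Convenience factory that adds all the given values in order. */
    static Welford of(double... values) {
        Welford w = new Welford();
        for (double v : values) w.add(v);
        return w;
    }
}
```

Each add() is O(1) and keeps no copy of the data, so values can arrive one at a time. For 10^9 + 4, 10^9 + 7, 10^9 + 13, 10^9 + 16 it returns the correct sample standard deviation √30.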

EDIT: I updated the code according to Jason's comments ...

EDIT: I also updated the code according to Alex's comments ...

+100
May 22 '09 at 11:46 a.m.

This is 10 times faster than Jaime's version, but be aware that, as Jaime noted:

"While the sum of squares algorithm works fine, it can cause big problems if you deal with very large numbers. You can basically end up with negative variance."

If you think you are dealing with very large numbers, or with a very large count of numbers, you should calculate using both methods; if the results agree, you know that you can use "my" method for your data.

    public static double StandardDeviation(double[] data)
    {
        double stdDev = 0;
        double sumAll = 0;
        double sumAllQ = 0;

        // Sum of x and sum of x²
        for (int i = 0; i < data.Length; i++)
        {
            double x = data[i];
            sumAll += x;
            sumAllQ += x * x;
        }

        // Mean (not used here)
        // double mean = 0;
        // mean = sumAll / (double)data.Length;

        // Standard deviation
        stdDev = System.Math.Sqrt(
            (sumAllQ - (sumAll * sumAll) / data.Length) *
            (1.0d / (data.Length - 1)));
        return stdDev;
    }
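
Below is a minimal Java sketch of the "calculate using both methods" advice. It assumes a relative tolerance is an acceptable definition of "equal", since exact equality is too strict for floating point. The class CrossCheck and its methods are hypothetical. The reference is a plain two-pass computation (subtract the mean, then square) rather than Welford's method:

```java
// Hypothetical helper illustrating the "compute it both ways" advice.
final class CrossCheck {
    /** One-pass sum-of-squares formula (the fast method in this answer). */
    static double sumOfSquaresStdDev(double[] data) {
        double sum = 0.0, sumSq = 0.0;
        for (double x : data) {
            sum += x;
            sumSq += x * x;
        }
        int n = data.length;
        return Math.sqrt((sumSq - sum * sum / n) / (n - 1));
    }

    /** Two-pass formula: compute the mean first, then sum squared deviations. */
    static double twoPassStdDev(double[] data) {
        double mean = 0.0;
        for (double x : data) mean += x;
        mean /= data.length;
        double ss = 0.0;
        for (double x : data) {
            double d = x - mean;
            ss += d * d;
        }
        return Math.sqrt(ss / (data.length - 1));
    }

    /** True if both methods agree within relTol; a NaN result never agrees. */
    static boolean agree(double[] data, double relTol) {
        double a = sumOfSquaresStdDev(data);
        double b = twoPassStdDev(data);
        return Math.abs(a - b) <= relTol * Math.abs(b);
    }
}
```

For 2, 4, 4, 4, 5, 5, 7, 9 the two methods agree. For 10^9 + 4, 10^9 + 7, 10^9 + 13, 10^9 + 16 they do not, because the sum-of-squares variance comes out wrong. It can even come out negative, which Math.sqrt turns into NaN.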
+7
Oct 30 '14 at 20:18

Jaime's accepted answer is great, except that you need to divide by k-2 in the last line (you need to divide by "number_of_elements - 1"). Better still, start k at 0:

    public static double StandardDeviation(List<double> valueList)
    {
        double M = 0.0;
        double S = 0.0;
        int k = 0;
        foreach (double value in valueList)
        {
            k++;
            double tmpM = M;
            M += (value - tmpM) / k;
            S += (value - tmpM) * (value - M);
        }
        return Math.Sqrt(S / (k - 1));
    }
+5
Dec 25 '13 at 12:44

The Math.NET library provides this for you out of the box.

PM> Install-Package MathNet.Numerics

    using System.Collections.Generic;
    using MathNet.Numerics.Statistics;

    var populationStdDev = new List<double> { 1d, 2d, 3d, 4d, 5d }.PopulationStandardDeviation();
    var sampleStdDev = new List<double> { 2d, 3d, 4d }.StandardDeviation();

See PopulationStandardDeviation for more information.

+4
Feb 19 '16 at 0:42

You can avoid two passes over the data by accumulating the mean and the mean of the squares:

    cnt = 0
    mean = 0
    meansqr = 0
    loop over array
        cnt++
        mean += value
        meansqr += value*value
    mean /= cnt
    meansqr /= cnt

and then form

 sigma = sqrt(meansqr - mean^2) 

Often the variance should also be multiplied by a cnt/(cnt-1) factor (inside the square root) to get the sample estimate.
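
Writing m₁ for the final mean, m₂ for the final meansqr and n for cnt, the population variance and the sample variance (with the cnt/(cnt-1) factor) work out as:

```latex
\sigma^2 = m_2 - m_1^2, \qquad
s^2 = \frac{n}{n-1}\left(m_2 - m_1^2\right) = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - m_1\right)^2
```

For 2, 4, 4, 4, 5, 5, 7, 9 this gives m₁ = 5 and m₂ = 232/8 = 29, so σ² = 4 (σ = 2) and s² = (8/7)·4 = 32/7 (s ≈ 2.14). Like any sum-of-squares formula, this form is exposed to the cancellation problem when the mean is large relative to the spread.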

BTW: the first pass over the data in Demi's and McWafflestix's answers is hidden in the calls to Average. That kind of thing is trivial for a small list, but if the list exceeds the size of the cache, or even the working set, it becomes a big deal.

+2
May 22 '09 at

Code snippet:

    public static double StandardDeviation(List<double> valueList)
    {
        if (valueList.Count < 2) return 0.0;
        double sumOfSquares = 0.0;
        double average = valueList.Average(); // LINQ: needs System.Linq (.NET 3.5+)
        foreach (double value in valueList)
        {
            sumOfSquares += Math.Pow((value - average), 2);
        }
        return Math.Sqrt(sumOfSquares / (valueList.Count - 1));
    }
+2
May 22, '09 at 1:44

I found that Rob's helpful answer did not quite match what I saw in Excel. To match Excel, I passed the average of valueList into the StandardDeviation calculation.

Here are my two cents. Clearly you could calculate the moving average (ma) from valueList inside the function, but I happen to have it already before I need the standard deviation.

    public double StandardDeviation(List<double> valueList, double ma)
    {
        double xMinusMovAvg = 0.0;
        double Sigma = 0.0;
        int k = valueList.Count;
        foreach (double value in valueList)
        {
            xMinusMovAvg = value - ma;
            Sigma = Sigma + (xMinusMovAvg * xMinusMovAvg);
        }
        return Math.Sqrt(Sigma / (k - 1));
    }
+1
Jan 25 '12 at 1:12

Using extension methods.

    using System;
    using System.Collections.Generic;

    namespace SampleApp
    {
        internal class Program
        {
            private static void Main()
            {
                List<double> data = new List<double> {1, 2, 3, 4, 5, 6};
                double mean = data.Mean();
                double variance = data.Variance();
                double sd = data.StandardDeviation();
                Console.WriteLine("Mean: {0}, Variance: {1}, SD: {2}", mean, variance, sd);
                Console.WriteLine("Press any key to continue...");
                Console.ReadKey();
            }
        }

        public static class MyListExtensions
        {
            public static double Mean(this List<double> values)
            {
                return values.Count == 0 ? 0 : values.Mean(0, values.Count);
            }

            public static double Mean(this List<double> values, int start, int end)
            {
                double s = 0;
                for (int i = start; i < end; i++)
                {
                    s += values[i];
                }
                return s / (end - start);
            }

            public static double Variance(this List<double> values)
            {
                return values.Variance(values.Mean(), 0, values.Count);
            }

            public static double Variance(this List<double> values, double mean)
            {
                return values.Variance(mean, 0, values.Count);
            }

            public static double Variance(this List<double> values, double mean, int start, int end)
            {
                double variance = 0;
                for (int i = start; i < end; i++)
                {
                    variance += Math.Pow((values[i] - mean), 2);
                }
                // Population variance over [start, end): divide by n.
                // Divide by (n - 1) instead for the sample variance.
                int n = end - start;
                return variance / n;
            }

            public static double StandardDeviation(this List<double> values)
            {
                return values.Count == 0 ? 0 : values.StandardDeviation(0, values.Count);
            }

            public static double StandardDeviation(this List<double> values, int start, int end)
            {
                double mean = values.Mean(start, end);
                double variance = values.Variance(mean, start, end);
                return Math.Sqrt(variance);
            }
        }
    }
+1
Sep 05 '14 at 5:40
    /// <summary>
    /// Calculates standard deviation, same as MATLAB std(X,0) function
    /// <seealso href="http://www.mathworks.co.uk/help/techdoc/ref/std.html"/>
    /// </summary>
    /// <param name="values">enumerable data</param>
    /// <returns>Standard deviation</returns>
    public static double GetStandardDeviation(this IEnumerable<double> values)
    {
        // validation
        if (values == null) throw new ArgumentNullException("values");

        int length = values.Count(); // needs System.Linq

        // avoids division by 0 (returns 0 for zero or one value)
        if (length == 0 || length == 1) return 0;

        double sum = 0.0, sum2 = 0.0;
        // enumerate once; calling ElementAt(i) in a loop can be O(n²)
        foreach (double item in values)
        {
            sum += item;
            sum2 += item * item;
        }

        return Math.Sqrt((sum2 - sum * sum / length) / (length - 1));
    }
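
The doc comment above refers to MATLAB's std(X,0), where the second argument picks the normalization: 0 divides by n - 1 and 1 divides by n. Below is a minimal Java sketch of that switch. It uses a two-pass computation rather than the sum-of-squares formula above. The class MatlabStyleStd and its signature are hypothetical. The handling of empty and single-element input is this sketch's own choice, not a claim about MATLAB:

```java
// Hypothetical sketch of a MATLAB-like normalization flag; not a MATLAB API.
final class MatlabStyleStd {
    /**
     * Standard deviation with a normalization flag:
     * w == 0 divides by n - 1 (sample), w == 1 divides by n (population).
     */
    static double std(double[] x, int w) {
        if (w != 0 && w != 1) {
            throw new IllegalArgumentException("w must be 0 or 1");
        }
        int n = x.length;
        if (n == 0) {
            return Double.NaN; // no data: undefined (a choice made for this sketch)
        }
        // First pass: mean.
        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= n;
        // Second pass: sum of squared deviations from the mean.
        double ss = 0.0;
        for (double v : x) {
            double d = v - mean;
            ss += d * d;
        }
        // With a single element, divide by n to avoid 0/0; the result is 0.
        int divisor = (w == 0 && n > 1) ? n - 1 : n;
        return Math.sqrt(ss / divisor);
    }
}
```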
0
Dec 09 '11 at 12:50

The problem with all the other answers is that they assume your data is in one big array. If your data arrives on the fly, this is a better approach: this class works no matter how and when you store your data. It also gives you a choice of Welford's method or the sum-of-squares method. Both methods work in a single pass.

    public final class StatMeasure {
        private StatMeasure() {}

        public interface Stats1D {

            /** Add a value to the population */
            void addValue(double value);

            /** Get the mean of all the added values */
            double getMean();

            /** Get the standard deviation from a sample of the population. */
            double getStDevSample();

            /** Gets the standard deviation for the entire population. */
            double getStDevPopulation();
        }

        private static class WaldorfPopulation implements Stats1D {
            private double mean = 0.0;
            private double sSum = 0.0;
            private int count = 0;

            @Override
            public void addValue(double value) {
                double tmpMean = mean;
                double delta = value - tmpMean;
                mean += delta / ++count;
                sSum += delta * (value - mean);
            }

            @Override
            public double getMean() {
                return mean;
            }

            @Override
            public double getStDevSample() {
                return Math.sqrt(sSum / (count - 1));
            }

            @Override
            public double getStDevPopulation() {
                return Math.sqrt(sSum / count);
            }
        }

        private static class StandardPopulation implements Stats1D {
            private double sum = 0.0;
            private double sumOfSquares = 0.0;
            private int count = 0;

            @Override
            public void addValue(double value) {
                sum += value;
                sumOfSquares += value * value;
                count++;
            }

            @Override
            public double getMean() {
                return sum / count;
            }

            @Override
            public double getStDevSample() {
                return Math.sqrt((sumOfSquares - ((sum * sum) / count)) / (count - 1));
            }

            @Override
            public double getStDevPopulation() {
                return Math.sqrt((sumOfSquares - ((sum * sum) / count)) / count);
            }
        }

        /**
         * Returns a way to measure a population of data using Welford's method.
         * This method is better if your population or values are so large that
         * the sum of x-squared may overflow or lose precision. It is also probably
         * faster if you need to recalculate the mean and standard deviation
         * continuously, for example, if you are continually updating a graphic
         * of the data as it flows in.
         *
         * @return A Stats1D object that uses Welford's method.
         */
        public static Stats1D getWaldorfStats() {
            return new WaldorfPopulation();
        }

        /**
         * Return a way to measure the population of data using the sum-of-squares
         * method. This is probably faster than Welford's method, but runs the
         * risk of overflow and loss of precision.
         *
         * @return A Stats1D object that uses the sum-of-squares method
         */
        public static Stats1D getSumOfSquaresStats() {
            return new StandardPopulation();
        }
    }
0
Apr 13 '16 at 6:57

You can also use the statistics module in Python. It has the stdev() and pstdev() functions to calculate the sample and population standard deviation, respectively.

Details here: https://www.geeksforgeeks.org/python-statistics-stdev/

    import statistics as st
    print(st.pstdev(dataframe['column name']))

0
Oct 25 '18 at 13:12

This is the standard deviation of the population.

    private double calculateStdDev(List<double> values)
    {
        double average = values.Average();
        return Math.Sqrt((values.Select(val => (val - average) * (val - average)).Sum()) / values.Count);
    }

For the sample standard deviation, simply change values.Count to (values.Count - 1) in the code above. The parentheses matter, because division binds more tightly than subtraction.

In that case, make sure your dataset has more than one data point; otherwise you divide by zero.
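
The same LINQ-style computation can be sketched with Java streams. This is a translation, not the answer's code, and the class StreamStdDev and its methods are hypothetical. Following the advice above, the sample version rejects lists with fewer than two values:

```java
import java.util.List;

// Hypothetical stream-based translation; names are illustrative only.
final class StreamStdDev {
    // Sum of squared deviations from the mean (two passes over the list).
    private static double sumSquaredDeviations(List<Double> values) {
        double average = values.stream()
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(Double.NaN);
        return values.stream()
                .mapToDouble(v -> (v - average) * (v - average))
                .sum();
    }

    /** Population standard deviation: divide by n. */
    static double populationStdDev(List<Double> values) {
        return Math.sqrt(sumSquaredDeviations(values) / values.size());
    }

    /** Sample standard deviation: divide by (n - 1); needs two or more values. */
    static double sampleStdDev(List<Double> values) {
        if (values.size() < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        return Math.sqrt(sumSquaredDeviations(values) / (values.size() - 1));
    }
}
```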

0
Feb 28 '19 at 13:08