From Variance: “the variance of the general group is equal to the average of the variances of the subgroups plus the variance of the means of the subgroups.” I had to read this several times and then run it. The square root of this sum gives 464, the same as the standard deviation of all the data, and that is the only number you want. The average of the row spreads, 375, is noticeably smaller.
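In symbols, for k subgroups of equal size with means μ_i, variances σ_i² and overall mean μ̄ (notation chosen here for illustration), the quoted rule reads as follows. This is the equal-size special case of the law of total variance. It holds exactly when the variances divide by n (NumPy's default `ddof=0`). With unequal group sizes, both averages become weighted by n_i / n.

```latex
\sigma^2_{\text{all}}
  = \frac{1}{k}\sum_{i=1}^{k} \sigma_i^2
  + \frac{1}{k}\sum_{i=1}^{k} \left(\mu_i - \bar{\mu}\right)^2,
\qquad
\bar{\mu} = \frac{1}{k}\sum_{i=1}^{k} \mu_i
```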
    #!/usr/bin/env python3
    import sys
    import numpy as np

    N = 10
    exec("\n".join(sys.argv[1:]))  # this.py N= ...  (override N from the command line)

    np.set_printoptions(precision=1, threshold=100, suppress=True)  # .1f
    np.random.seed(1)

    data = np.random.exponential(size=(N, 60)) ** 5  # N rows, 60 cols
    row_avs = np.mean(data, axis=-1)   # av of each row
    row_devs = np.std(data, axis=-1)   # spread, stddev, of each row about its av

    print("row averages:", row_avs)
    print("row devs:", row_devs)
    print("average row dev: %.3g" % np.mean(row_devs))
    # http:
    row averages: [ 49.6 151.4 58.1 35.7 59.7 48. 115.6 69.4 148.1 25. ]
    row devs: [ 244.7 932.1 251.5 76.9 201.1 280. 513.7 295.9 798.9 159.3]
    average row dev: 375
    sqrt total variance: 464 = sqrt( av var 2.13e+05 + var avs 1.88e+03 )
    sqrt variance all: 464
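The script stops before the lines that print the last two rows of output. The following is a minimal Python 3 sketch of that check, written here as an illustration rather than the author's missing code. It assumes the same data (seed 1, N = 10 rows of 60 values). The names `av_var`, `var_avs`, `total` and `direct` are made up for this sketch. It relies on `np.std` / `np.var` dividing by n (`ddof=0`); with `ddof=1` the two numbers would only agree approximately.

```python
import numpy as np

N = 10  # number of rows (equal-size subgroups), as in the script above
np.random.seed(1)
data = np.random.exponential(size=(N, 60)) ** 5  # N rows of 60 values

row_avs = data.mean(axis=1)   # mean of each row
row_vars = data.var(axis=1)   # population variance (ddof=0) of each row

av_var = row_vars.mean()      # average of the subgroup variances
var_avs = row_avs.var()       # variance of the subgroup means
total = np.sqrt(av_var + var_avs)

direct = data.std()           # std of all N * 60 values at once

print("sqrt total variance: %.3g = sqrt( av var %.3g + var avs %.3g )"
      % (total, av_var, var_avs))
print("sqrt variance all: %.3g" % direct)
```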
To see how the variance of a combined group grows, work through the example on Wikipedia. Say we have
60 men with heights 180 ± 10: exactly 30 of height 170 and 30 of height 190;
60 women with heights 160 ± 7: 30 of height 153 and 30 of height 167.
The average standard deviation is (10 + 7) / 2 = 8.5. However, the heights

    -------|||----------|||-|||-----------------|||---
           153          167 170                 190

spread as 170 ± 13.2, much more than 170 ± 8.5.
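Computed directly from all 120 heights: the mean is 170, and the deviations from it are 0, 20, 17 and 3 for the four blocks of 30 people. The variance here divides by n.

```latex
\bar{x} = \frac{30\,(170 + 190 + 153 + 167)}{120} = 170,
\qquad
\sigma^2 = \frac{30\,(0^2 + 20^2 + 17^2 + 3^2)}{120} = \frac{698}{4} = 174.5,
\qquad
\sigma = \sqrt{174.5} \approx 13.2
```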
Why? Because besides the spreads of the men (± 10) and of the women (± 7), there is also the spread of the two group means, 160 and 180, about the overall average of 170.
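Put into the subgroup formula quoted at the top, for these two equal-size groups:

```latex
\underbrace{\frac{10^2 + 7^2}{2}}_{\text{average variance}}
+ \underbrace{\frac{(180 - 170)^2 + (160 - 170)^2}{2}}_{\text{variance of the means}}
= 74.5 + 100 = 174.5,
\qquad
\sqrt{174.5} \approx 13.2
```

Note that the formula averages the variances, not the standard deviations: √74.5 ≈ 8.63, slightly more than the 8.5 from averaging 10 and 7.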
Exercise: compute the spread of 13.2 in two ways: from the formula above and directly.
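One way to check the exercise numerically is the minimal NumPy sketch below. It builds the two groups exactly as described and uses population statistics (`ddof=0`, NumPy's default). The variable names are chosen for this illustration only.

```python
import numpy as np

# Heights from the example: 30 men each at 170 and 190, 30 women each at 153 and 167.
men = np.repeat([170.0, 190.0], 30)
women = np.repeat([153.0, 167.0], 30)
groups = [men, women]

# Way 1: the formula (equal-size groups, population variances).
av_var = np.mean([g.var() for g in groups])     # (100 + 49) / 2 = 74.5
var_means = np.var([g.mean() for g in groups])  # variance of 180 and 160 = 100
via_formula = np.sqrt(av_var + var_means)

# Way 2: directly, pooling all 120 heights.
everyone = np.concatenate(groups)
direct = everyone.std()

print("formula: sqrt(%.1f + %.1f) = %.2f" % (av_var, var_means, via_formula))  # 13.21
print("direct:  %.2f" % direct)                                                # 13.21
```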