Your American History Reference Guide!
- Algorithms for calculating variance

HistoryMania Information Site on Algorithms for calculating variance American History American History Search        American History Browse welcome to our free resource site for all enthusiasts!

Algorithms for calculating variance

In statistics, a formula for calculating the variance of a population of size n is:

\mathrm{Variance} = \frac {n\sum_{i=1}^{n} x_i^2 - (\sum_{i=1}^{n} x_i)^2}{n^2}.

A formula for calculating the unbiased estimation of the population variance from n finite samples is:

\mathrm{Variance} = \frac {n\sum_{i=1}^{n} x_i^2 - (\sum_{i=1}^{n} x_i)^2}{n(n-1)}.

The method of calculation may be more easily understood from the table below where the mean is 8.

i xi xi − mean (xi − mean)2
(index) (datum) (deviation) (squared deviation)
1 5 −3 9
2 7 −1 1
3 8 0 0
4 10 2 4
5 10 2 4
n = 5 sum = 40 0 18
  • mean = 40/5 = 8
  • variance = (5 · 338 − 402)/(5 · 4) = 4.5
  • standard deviation = \sqrt{\mathrm{Variance}} = 2.12

Note: Details of the variance calculation:

338 = [52 + 72 + 82 + 102 + 102]
40 = [5 + 7 + 8 + 10 + 10]

Algorithm

Therefore a simple algorithm to calculate variance can be described by the following pseudocode:

long n = 0;
double sum = 0;
double sum_sqr = 0;
double variance;

foreach x in data:
  n += 1;
  sum += x;
  sum_sqr += x*x;
end for

variance = (sum_sqr - sum*sum/n)/(n-1);

Algorithm

Another algorithm which avoids large numbers in sum_sqr while summing up

double avg = 0;
double var = 0;
long n = data.length; // number of elements

for i = 1 to n
 avg = (avg*i + data[i]) / (i + 1);
 var = (var * (i - 1) + (data[i] - avg)*(data[i] - avg)) / i;
end for

return var; // resulting variance
The contents of this article are licensed from Wikipedia.org under the
GNU Free Documentation License. How to see transparent copy
Search | Browse | Contact | Legal info