# Online estimation of variance with limited memory

I am creating a component that aims to calculate the average and variance of a metric associated with events happening over time, but with limited internal memory.

Imagine that the events are visitors entering a shop and the metric is their age.

Over time, my component receives events with the age of each visitor.
I don't want my component to memorize the history of all the ages.
Ideally, I would like a light component storing only:
the average `A`, the variance `V` and the number of events `N`.

After each event with age `E`, I want to update those three values:

```
A <= (A*N + E)/(N+1)
V <= ???
N <= N+1
```
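The `N` and `A` updates are already exact, as long as `A` is computed with the value of `N` from before the event. A minimal C++ sketch of just this part; the class name `RunningMean` and its methods are hypothetical, used only for illustration:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper that stores only N and A and applies the mean
// update from the question after each event.
class RunningMean {
public:
    void add(double e) {
        // A <= (A*N + E)/(N+1), computed with the old N ...
        a_ = (a_ * n_ + e) / (n_ + 1);
        // ... and only then N <= N+1.
        ++n_;
    }

    std::size_t count() const { return n_; }

    double mean() const {
        assert(n_ > 0 && "the mean of zero events is undefined");
        return a_;
    }

private:
    std::size_t n_ = 0;  // N: number of events seen so far
    double a_ = 0.0;     // A: their average
};
```

Feeding it the ages 20, 30 and 40 leaves `count() == 3` and `mean() == 30`.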

What about `V`? I am thinking of something like:

```
V <= (V*N + (E-A)^2)/(N+1)
```

I know it is not exact, because my previous `V` was computed around the old `A`, which is no longer the average.
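For comparison, the exact update can be derived under the same convention as the proposal, where $V$ is the population variance (dividing by $N$). Write $E_1, \dots, E_N$ for the ages seen so far, $A_N$ and $V_N$ for their mean and variance, and $M_N = N V_N$ for the sum of squared deviations from $A_N$. When a new age $E$ arrives:

$$
\begin{aligned}
A_{N+1} &= A_N + \frac{E - A_N}{N+1}, \qquad E - A_{N+1} = \frac{N}{N+1}\,(E - A_N) \\
M_{N+1} &= \sum_{i=1}^{N} (E_i - A_{N+1})^2 + (E - A_{N+1})^2 \\
        &= M_N + N\,(A_{N+1} - A_N)^2 + (E - A_{N+1})^2 \\
        &= M_N + \frac{N}{(N+1)^2}\,(E - A_N)^2 + \frac{N^2}{(N+1)^2}\,(E - A_N)^2 \\
        &= M_N + \frac{N}{N+1}\,(E - A_N)^2 \\
V_{N+1} &= \frac{M_{N+1}}{N+1} = \frac{N V_N + \frac{N}{N+1}\,(E - A_N)^2}{N+1}
\end{aligned}
$$

Compared with the proposed update for `V`, with `A` taken as the old mean, the only change is the factor $N/(N+1)$ in front of $(E - A_N)^2$. Since $\frac{N}{N+1}(E - A_N)^2 = (E - A_N)(E - A_{N+1})$, this is the same recurrence for the sum of squared deviations that Welford's algorithm uses.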

Q1 – Is there an exact formula?

Q2 – If not, is my proposal a good estimate? Is it biased? Will it converge correctly as `N` increases?

Q3 – Is there a better formula?
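To get a feel for Q2, the proposal can be run next to the exact rule on the same stream of ages. A minimal C++ sketch; `compare_updates` is a hypothetical helper, and both rules start from `N = 0`, `A = 0`, `V = 0`, an initialization the question does not specify:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Runs the proposed V update and the exact V update side by side over
// the same ages. Both share the exact running mean A and divide by N
// (population variance). Returns {proposed V, exact V}.
std::pair<double, double> compare_updates(const std::vector<double>& ages) {
    assert(!ages.empty() && "need at least one event");
    double n = 0.0;        // N, kept as double to avoid integer division
    double a = 0.0;        // A: running mean
    double v_prop = 0.0;   // V from the proposed rule
    double v_exact = 0.0;  // V from the exact rule
    for (double e : ages) {
        double d = e - a;  // deviation from the old mean
        v_prop = (v_prop * n + d * d) / (n + 1);
        // Exact rule: same as the proposal, but scaled by N/(N+1).
        v_exact = (v_exact * n + n / (n + 1) * d * d) / (n + 1);
        a = (a * n + e) / (n + 1);  // exact mean update
        n += 1;
    }
    return {v_prop, v_exact};
}
```

With this zero start, the proposal's first step already sets `V` to `E*E`, because the missing factor `N/(N+1)` is 0 for the first event, while the exact rule gives 0. On the ages 20, 30 and 40 the proposal ends near 241.7 against an exact 200/3 (about 66.7); when the same three ages are repeated many times, the gap in this sketch narrows to well under 2.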

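Yes, there is an exact formula: Welford's online algorithm. Here is an implementation of it in Rcpp (C++ called from R):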

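```cpp
#include <Rcpp.h>
#include <cmath>  // NAN
using namespace Rcpp;

// Welford's online algorithm: one pass over x, updating the running
// mean and msq, the sum of squared deviations from the current mean.
// [[Rcpp::export]]
List welford_cpp(NumericVector x) {
  int n = x.length();
  if (n == 0) {
    // No data: mean and variance are both undefined.
    return Rcpp::List::create(Rcpp::Named("mean") = NAN,
                              Rcpp::Named("variance") = NAN);
  }

  double delta;
  double msq = 0;
  double mean = x[0];  // start from the first observation

  if (n > 1) {
    for (int i = 1; i < n; i++) {
      delta = x[i] - mean;           // deviation from the old mean
      mean += delta / (i + 1);       // mean of the first i+1 values
      msq += delta * (x[i] - mean);  // old deviation times new deviation
    }
    // Sample variance: divide by n - 1.
    return Rcpp::List::create(Rcpp::Named("mean") = mean,
                              Rcpp::Named("variance") = msq / (n - 1));
  }

  // A single observation has no sample variance.
  return Rcpp::List::create(Rcpp::Named("mean") = mean,
                            Rcpp::Named("variance") = NAN);
}
```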
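As you can see, it uses only four variables (`n`, `delta`, `msq` and `mean`) and computes the mean and variance together, as you wanted. In an event-by-event setting only `n`, `mean` and `msq` need to be kept between events, since `delta` is recomputed for each new value; the variance at any point is `msq / (n-1)`, or `msq / n` for the population variance used in the question.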
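The loop in `welford_cpp` still iterates over a stored vector `x`. For the event-by-event component described in the question, the same update can live in a small class that keeps only `N`, the mean and `msq` between events. A minimal standard-C++ sketch without Rcpp; the class name `OnlineVariance` and its methods are hypothetical:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical event-by-event version of the Welford update used in
// welford_cpp. Only three numbers survive between events.
class OnlineVariance {
public:
    // Process one event (e.g. one visitor's age).
    void add(double e) {
        ++n_;                         // N now includes this event
        double delta = e - mean_;     // deviation from the old mean
        mean_ += delta / n_;          // mean over the n_ events seen
        msq_ += delta * (e - mean_);  // old deviation times new deviation
    }

    std::size_t count() const { return n_; }

    double mean() const {
        assert(n_ > 0 && "the mean needs at least one event");
        return mean_;
    }

    // Sample variance (divides by N - 1), as returned by welford_cpp.
    double sample_variance() const {
        assert(n_ > 1 && "the sample variance needs at least two events");
        return msq_ / (n_ - 1);
    }

    // Population variance (divides by N), the V of the question.
    double population_variance() const {
        assert(n_ > 0 && "the variance needs at least one event");
        return msq_ / n_;
    }

private:
    std::size_t n_ = 0;  // N: number of events so far
    double mean_ = 0.0;  // A: running mean
    double msq_ = 0.0;   // sum of squared deviations from the running mean
};
```

Feeding it the ages 20, 30 and 40 gives a mean of 30, a sample variance of 100 and a population variance of 200/3. Incrementing `n_` before computing `delta / n_` matches the `delta / (i+1)` step in `welford_cpp`, where `i+1` is the number of values seen so far; starting from zero also works, because the first event leaves `msq_` at 0.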