## Calculating the best standard deviation estimate of a normally distributed process, for small N

Frank Palazzolo

January 25, 2006

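Calculating the best standard deviation estimate for small $n$ can be tricky. First
of all, calculating the sample variance $s^{2}$ is reasonably straightforward:

$$s^{2} = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

But wait, have you seen this formula with $n$ replaced by $n-1$? Ever wonder why
that is? The expected value of $s^{2}$ is $\frac{n-1}{n}\sigma^{2}$, which is slightly too small.
To get the so-called "unbiased" estimate for the variance, you must use the
relation

$$s_{n-1}^{2} = \frac{n}{n-1}\,s^{2} = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}$$

to estimate the original variance $\sigma^{2}$.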
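As a quick numerical illustration of the two normalizations, here is a minimal Python sketch. The function names `variance_biased` and `variance_unbiased` and the sample data are made up for this example and are not part of the original note.

```python
def variance_biased(xs):
    """Sample variance with the 1/n normalization."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


def variance_unbiased(xs):
    """Sample variance rescaled by n/(n-1), i.e. the 1/(n-1) normalization."""
    n = len(xs)
    return variance_biased(xs) * n / (n - 1)


# Hypothetical data: mean is 5, sum of squared deviations is 32.
samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(variance_biased(samples))    # 32/8 = 4.0
print(variance_unbiased(samples))  # 32/7, about 4.57
```

The difference between the two shrinks as $n$ grows, which is why the choice only matters for small samples.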

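This is where the infamous $n-1$ comes from. This is all well and good, but what
about estimating the standard deviation? A sample standard deviation $s$ can be
defined by

$$s = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}}$$

Because of the non-linearity of the square root, the mean of the distribution
of $s$ is not simple. Unfortunately, it also depends on the distribution of the
original process. For normally distributed random variables, it has a mean
of

$$E[s] = \sqrt{\frac{2}{n}}\;\frac{\Gamma\!\left(\frac{n}{2}\right)}{\Gamma\!\left(\frac{n-1}{2}\right)}\;\sigma$$

where $\sigma$ is the actual standard deviation of the original process. To recover the
best estimate of $\sigma$ for small $n$, we need to multiply our standard deviation by a
compensation factor of

$$C(n) = \sqrt{\frac{n}{2}}\;\frac{\Gamma\!\left(\frac{n-1}{2}\right)}{\Gamma\!\left(\frac{n}{2}\right)}$$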
$C(n)$ is not easy to calculate in a general purpose computer
language, unless you have access to the Gamma function. Even if you do, the
Gamma function will overflow for moderately sized $n$ (in IEEE double precision,
$\Gamma(n/2)$ already overflows near $n = 344$), even while $C(n)$ itself stays
bounded.
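To make the overflow concrete, here is a small Python sketch that evaluates the factor directly with the standard library's `math.gamma`. It assumes the 1/n normalization for $s$, so that $C(n) = \sqrt{n/2}\,\Gamma((n-1)/2)/\Gamma(n/2)$. The helper name `c_gamma` is hypothetical.

```python
import math


def c_gamma(n):
    """Evaluate C(n) = sqrt(n/2) * Gamma((n-1)/2) / Gamma(n/2) directly.

    Assumes s is computed with the 1/n normalization. Valid for n >= 2.
    """
    return math.sqrt(n / 2.0) * math.gamma((n - 1) / 2.0) / math.gamma(n / 2.0)


print(c_gamma(2))   # equals sqrt(pi)
print(c_gamma(10))  # already fairly close to 1

# For large n the individual Gamma values no longer fit in a double,
# even though their ratio is close to 1.
try:
    c_gamma(1000)
except OverflowError:
    print("math.gamma overflowed for n = 1000")
```

In Python the failure shows up as an `OverflowError`. Other languages may silently return infinity and then produce NaN for the ratio.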

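I found a really simple way to generate a table of these values in an iterative
manner. The trick was to make use of the identity

$$\Gamma(x+1) = x\,\Gamma(x)$$

and calculate the relation

$$C(n)\,C(n-1) = \frac{\sqrt{n(n-1)}}{n-2}$$

In the product, the $\Gamma\!\left(\frac{n-1}{2}\right)$ factors cancel, the identity reduces
$\Gamma\!\left(\frac{n-2}{2}\right)/\Gamma\!\left(\frac{n}{2}\right)$ to $\frac{2}{n-2}$,
and the $\Gamma$'s are gone! We simply need to start with the initial value

$$C(2) = \sqrt{\pi}$$

and build a table of $C(n)$ based on the value of $C(n-1)$, using
$C(n) = \frac{\sqrt{n(n-1)}}{(n-2)\,C(n-1)}$, for some reasonable
range of $2 < n < n_{max}$.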
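The table construction can be sketched in a few lines of Python. This is a minimal illustration, not the original implementation. It assumes the 1/n normalization for $s$, so that $C(2) = \sqrt{\pi}$ and $C(n)\,C(n-1) = \sqrt{n(n-1)}/(n-2)$. The name `build_c_table` is hypothetical, and the table here includes `n_max` itself.

```python
import math


def build_c_table(n_max):
    """Return a list t with t[n] = C(n) for 2 <= n <= n_max.

    Entries t[0] and t[1] are NaN placeholders, since C(n) needs n >= 2.
    """
    if n_max < 2:
        raise ValueError("n_max must be at least 2")
    table = [float("nan")] * (n_max + 1)
    table[2] = math.sqrt(math.pi)  # initial value C(2)
    for n in range(3, n_max + 1):
        # C(n) = sqrt(n (n-1)) / ((n-2) C(n-1)); no Gamma function needed
        table[n] = math.sqrt(n * (n - 1)) / ((n - 2) * table[n - 1])
    return table


c_table = build_c_table(32767)
print(c_table[2], c_table[3], c_table[10], c_table[32767])
```

Each entry costs one square root, one multiplication and one division, and nothing in the loop grows without bound, so the overflow problem does not arise.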

In my implementation, an upper limit of 32767 was plenty. The effect is only important for small $n$,
as $\lim_{n \to \infty} C(n) = 1$.

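Now, when we need to calculate a standard deviation based on $n$ samples, we can
use

$$\hat{\sigma} = C(n)\,s = C(n)\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}}$$

and we will have the best possible (unbiased) estimate of $\sigma$, provided the original
random variable is normal.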
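Putting the pieces together, a corrected estimator might look like the following Python sketch. It assumes $s$ uses the 1/n normalization and that $C(n)$ follows the recurrence $C(n) = \sqrt{n(n-1)}/((n-2)\,C(n-1))$ starting from $C(2) = \sqrt{\pi}$. The names `c_factor` and `stddev_estimate` are hypothetical. For brevity the factor is recomputed on each call rather than read from a precomputed table.

```python
import math


def c_factor(n):
    """Compensation factor C(n) for n >= 2, computed iteratively."""
    if n < 2:
        raise ValueError("need at least 2 samples")
    c = math.sqrt(math.pi)  # C(2)
    for k in range(3, n + 1):
        c = math.sqrt(k * (k - 1)) / ((k - 2) * c)
    return c


def stddev_estimate(xs):
    """Estimate sigma as C(n) * s, with s using the 1/n normalization.

    The correction is only meaningful if the samples are normally distributed.
    """
    n = len(xs)
    if n < 2:
        raise ValueError("need at least 2 samples")
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return c_factor(n) * s


print(stddev_estimate([1.0, 2.0]))       # 0.5 * sqrt(pi)
print(stddev_estimate([1.0, 2.0, 3.0]))  # 2 / sqrt(pi)
```

In a setting where the estimate is computed many times, looking up $C(n)$ in a table built once would avoid repeating the loop.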