## Calculating the best standard deviation estimate of a normally distributed process, for small N

January 25, 2006

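Calculating the best standard deviation estimate for small N can be tricky. First of all, calculating the sample variance $s^2$ is reasonably straightforward:

$$s^2 = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$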
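As a concrete reference point, here is a minimal Python sketch of the $1/n$ sample variance $s^2 = \frac{1}{n}\sum_i (x_i - \bar{x})^2$. The helper name `biased_variance` is hypothetical, not from the original post. It should agree with the standard library's `statistics.pvariance`, which also divides by $n$.

```python
import statistics

def biased_variance(xs: list[float]) -> float:
    """Sample variance s^2 with the 1/n divisor (hypothetical helper name)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0]
    # Both values should be 1.25 for this data set
    print(biased_variance(data), statistics.pvariance(data))
```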

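But wait, have you seen this formula with $n$ replaced by $n - 1$? Ever wonder why that is? The expected value of $s^2$ is $\frac{n-1}{n}\sigma^2$ rather than $\sigma^2$, so to get the so-called "unbiased" estimate for the variance, you must use the relation

$$\hat{\sigma}^2 = \frac{n}{n-1}\, s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2$$

to estimate the original variance $\sigma^2$.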
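The bias can also be seen empirically. This Monte Carlo sketch (the helper name, seed, sample size and trial count are all illustrative assumptions) averages both variance estimates over many simulated normal samples of size $n$: the $1/n$ version should land near $\frac{n-1}{n}\sigma^2$, while the $1/(n-1)$ version should land near $\sigma^2$.

```python
import random

def average_variances(n: int, sigma: float, trials: int, seed: int = 0) -> tuple[float, float]:
    """Average the 1/n and 1/(n - 1) variance estimates over simulated normal samples."""
    rng = random.Random(seed)
    biased_total = 0.0
    unbiased_total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(xs) / n
        ss = sum((x - mean) ** 2 for x in xs)  # sum of squared deviations
        biased_total += ss / n
        unbiased_total += ss / (n - 1)
    return biased_total / trials, unbiased_total / trials

if __name__ == "__main__":
    biased, unbiased = average_variances(n=3, sigma=1.0, trials=100_000)
    # Expect biased near (n - 1) / n = 2/3 and unbiased near sigma^2 = 1
    print(biased, unbiased)
```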

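This is where the infamous $n - 1$ comes from. This is all well and good, but what about estimating the standard deviation? A sample standard deviation $s$ can be defined as the square root of the unbiased variance estimate:

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$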

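Because of the non-linearity of the square root, the mean of the sampling distribution of $s$ is not simply $\sigma$. Unfortunately, it also depends on the distribution of the original process. For normally distributed random variables, it has a mean of

$$E[s] = \sigma\,\sqrt{\frac{2}{n-1}}\,\frac{\Gamma\left(\frac{n}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)}$$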
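The mean $E[s] = \sigma\sqrt{2/(n-1)}\,\Gamma(n/2)/\Gamma((n-1)/2)$ can be checked numerically. This sketch evaluates the Gamma ratio through `math.lgamma` (the log-Gamma function, my own choice here to sidestep overflow; it is not part of the original method) and compares it with a simulated average of $s$, where $s$ uses the $n - 1$ divisor. The names `expected_s_ratio` and `simulated_mean_s` are hypothetical.

```python
import math
import random

def expected_s_ratio(n: int) -> float:
    """E[s] / sigma for n normal samples, with s using the n - 1 divisor."""
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2.0 / (n - 1)) * math.exp(log_ratio)

def simulated_mean_s(n: int, sigma: float, trials: int, seed: int = 1) -> float:
    """Average of s over many simulated normal samples of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        mean = sum(xs) / n
        total += math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return total / trials

if __name__ == "__main__":
    n, sigma = 3, 2.0
    # The two numbers should be close; both fall noticeably below sigma = 2.0
    print(expected_s_ratio(n) * sigma, simulated_mean_s(n, sigma, 100_000))
```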

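where $\sigma$ is the actual standard deviation of the original process. To recover the best estimate of $\sigma$ for small $n$, we need to multiply our standard deviation by a compensation factor of

$$C(n) = \sqrt{\frac{n-1}{2}}\,\frac{\Gamma\left(\frac{n-1}{2}\right)}{\Gamma\left(\frac{n}{2}\right)}$$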
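Written out directly with `math.gamma`, the compensation factor $C(n) = \sqrt{(n-1)/2}\,\Gamma((n-1)/2)/\Gamma(n/2)$ is a one-liner. This is only a sketch, and `correction_factor` is a hypothetical name. For $n = 2$ it gives $\sqrt{\pi/2}$, and for $n = 3$ it gives $2/\sqrt{\pi}$.

```python
import math

def correction_factor(n: int) -> float:
    """C(n) = sqrt((n - 1) / 2) * Gamma((n - 1) / 2) / Gamma(n / 2), straight from the definition."""
    if n < 2:
        raise ValueError("need at least two samples")
    return math.sqrt((n - 1) / 2) * math.gamma((n - 1) / 2) / math.gamma(n / 2)

if __name__ == "__main__":
    for n in (2, 3, 5, 10, 30):
        print(n, correction_factor(n))
```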

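In general, $C(n)$ is not easy to calculate in a general-purpose computer language unless you have access to the Gamma function. Even if you do, the Gamma function will overflow for moderately sized $n$ (in IEEE double precision, $\Gamma(x)$ exceeds the largest representable value once $x$ is a little above 171, so $\Gamma(n/2)$ overflows for $n$ above roughly 343), even though $C(n)$ itself stays bounded.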
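The overflow is easy to reproduce in Python: `math.gamma` raises `OverflowError` once its argument passes about 171, so the direct formula fails at $n = 400$ even though $C(400)$ is barely above 1. Both function names below are hypothetical. Forming the Gamma ratio as the exponential of a difference of `math.lgamma` values is one common workaround, shown here only for comparison; it is not the approach this post takes.

```python
import math

def correction_factor_gamma(n: int) -> float:
    """Direct formula; math.gamma overflows for arguments above roughly 171."""
    return math.sqrt((n - 1) / 2) * math.gamma((n - 1) / 2) / math.gamma(n / 2)

def correction_factor_lgamma(n: int) -> float:
    """Same quantity, with the Gamma ratio formed as exp(difference of log-Gammas)."""
    return math.sqrt((n - 1) / 2) * math.exp(math.lgamma((n - 1) / 2) - math.lgamma(n / 2))

if __name__ == "__main__":
    n = 400
    try:
        print(correction_factor_gamma(n))
    except OverflowError as err:
        print("direct formula failed:", err)
    print("log-Gamma version:", correction_factor_lgamma(n))
```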

I found a really simple way to generate a table of these values iteratively. The trick was to make use of the identity

$$\Gamma(x + 1) = x\,\Gamma(x)$$

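and calculate the relation

$$C(n)\,C(n-1) = \frac{\sqrt{(n-1)(n-2)}}{2}\,\frac{\Gamma\left(\frac{n-2}{2}\right)}{\Gamma\left(\frac{n}{2}\right)} = \frac{\sqrt{(n-1)(n-2)}}{2}\cdot\frac{2}{n-2} = \sqrt{\frac{n-1}{n-2}},$$

that is,

$$C(n) = \frac{1}{C(n-1)}\sqrt{\frac{n-1}{n-2}}$$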

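and the $\Gamma$'s are gone! We simply need to start with the initial value

$$C(2) = \sqrt{\frac{1}{2}}\,\frac{\Gamma\left(\frac{1}{2}\right)}{\Gamma(1)} = \sqrt{\frac{\pi}{2}} \approx 1.2533$$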

and build a table of $C(n)$ from the value of $C(n - 1)$, for some reasonable range $2 < n \le n_{\max}$.
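A minimal Python sketch of the table-building recurrence $C(n) = \sqrt{(n-1)/(n-2)}\,/\,C(n-1)$, starting from $C(2) = \sqrt{\pi/2}$. The function name and the list layout (slots 0 and 1 left unused) are my own choices. Each entry needs only a square root and a division, so nothing ever overflows.

```python
import math

def correction_table(n_max: int) -> list[float]:
    """Return a list whose entry n holds C(n) for 2 <= n <= n_max; entries 0 and 1 are unused (NaN)."""
    if n_max < 2:
        raise ValueError("n_max must be at least 2")
    table = [math.nan] * (n_max + 1)
    table[2] = math.sqrt(math.pi / 2)  # initial value C(2)
    for n in range(3, n_max + 1):
        # C(n) = sqrt((n - 1) / (n - 2)) / C(n - 1)
        table[n] = math.sqrt((n - 1) / (n - 2)) / table[n - 1]
    return table

if __name__ == "__main__":
    table = correction_table(32767)
    print(table[2], table[3], table[10], table[32767])
```

The table is built once; afterwards each lookup is just `table[n]`.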

In my implementation, $n_{\max} = 32767$ was plenty. The effect is only important for small $n$, since $\lim_{n \to \infty} C(n) = 1$.
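To put a number on "small $n$", this sketch rebuilds the table with the recurrence $C(n) = \sqrt{(n-1)/(n-2)}\,/\,C(n-1)$ and searches for the first $n$ at which the correction falls below 1%, i.e. $C(n) < 1.01$. The helper names are hypothetical, and the 1% threshold is an arbitrary choice for illustration.

```python
import math

def correction_table(n_max: int) -> list[float]:
    """Entry n holds C(n) for 2 <= n <= n_max, built with the recurrence from C(2)."""
    table = [math.nan] * (n_max + 1)
    table[2] = math.sqrt(math.pi / 2)
    for n in range(3, n_max + 1):
        table[n] = math.sqrt((n - 1) / (n - 2)) / table[n - 1]
    return table

def first_n_below(table: list[float], threshold: float) -> int:
    """Smallest n with C(n) < threshold, or -1 if the table never gets there."""
    for n in range(2, len(table)):
        if table[n] < threshold:
            return n
    return -1

if __name__ == "__main__":
    table = correction_table(32767)
    n = first_n_below(table, 1.01)
    print(n, table[n], table[32767])
```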

Now, when we need to calculate a standard deviation based on $n$ samples, we can use

$$\hat{\sigma} = C(n)\, s$$
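Putting it together, here is a sketch of the final estimate $\hat{\sigma} = C(n)\,s$. The names `correction_table`, `corrected_std` and `N_MAX` are hypothetical, and `N_MAX = 32767` simply mirrors the table size mentioned in the post. `statistics.stdev` already uses the $n - 1$ divisor, so it supplies $s$ directly. Treating $C(n)$ as exactly 1 beyond the table is an assumption of this sketch.

```python
import math
import statistics

N_MAX = 32767  # table size; C(n) is within about 1e-5 of 1 beyond this

def correction_table(n_max: int) -> list[float]:
    """Entry n holds C(n) for 2 <= n <= n_max, built with the recurrence from C(2)."""
    table = [math.nan] * (n_max + 1)
    table[2] = math.sqrt(math.pi / 2)
    for n in range(3, n_max + 1):
        table[n] = math.sqrt((n - 1) / (n - 2)) / table[n - 1]
    return table

_TABLE = correction_table(N_MAX)

def corrected_std(samples: list[float]) -> float:
    """Bias-corrected estimate of sigma, assuming the samples come from a normal distribution."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    s = statistics.stdev(samples)  # sample standard deviation with the n - 1 divisor
    # Assumption: beyond the table the factor is taken as 1 (the true value is slightly above 1)
    factor = _TABLE[n] if n <= N_MAX else 1.0
    return factor * s

if __name__ == "__main__":
    print(corrected_std([1.0, 2.0, 3.0, 4.0]))
```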

and we will have the best possible estimate, provided the original random variable is normal.