Stirling's Approximation
It’s about time we went back to some old themes. When I first started this blog, I briefly dabbled in real analysis via Euler, with a particular focus on factorials, interpolation, and the Beta function. I decided to go a bit retro and revisit these motifs in today’s post by introducing Stirling’s approximation of the factorial.
There are many variants of Stirling’s approximation, but here we focus on the following form:
\[\lambda! \approx e^{- \lambda} \lambda^\lambda \sqrt{2 \pi \lambda} \tag{1}\]Let’s begin the derivation by recalling the Poisson distribution. The Poisson distribution models the probability that a certain event occurs a specified number of times within a defined time interval, given the rate at which these events occur. Its probability mass function looks as follows:
\[P(X = k) = \frac{\lambda^k e^{- \lambda}}{k!} \tag{2}\]One interesting fact about the Poisson distribution is that, when the parameter $\lambda$ is sufficiently large, the Poisson approximates a Gaussian distribution whose mean and variance are both $\lambda$.
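As a quick sanity check of (2), here is a minimal Python sketch; the `poisson_pmf` helper is my own naming, not a library function:

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson random variable with rate lam, per equation (2)."""
    return lam**k * math.exp(-lam) / math.factorial(k)

# Probability of observing exactly 3 events when the rate is 2 per unit time.
print(poisson_pmf(3, 2.0))  # ~0.1804
```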
To see what this means concretely, let’s evaluate both distributions at the point $X = \lambda$:
\[\begin{align}P(X = \lambda) &= \frac{\lambda^\lambda e^{- \lambda}}{\lambda!} \\ &\approx \frac{1}{\sqrt{2 \pi \lambda}} \exp \left(\frac{- (\lambda - \lambda)^2}{2 \lambda} \right)\end{align} \tag{3}\]We can easily simplify (3) since the argument of the exponential is zero. Thus, we have
\[\frac{\lambda^\lambda e^{- \lambda}}{\lambda!} \approx \frac{1}{\sqrt{2 \pi \lambda}} \tag{4}\]Rearranging (4), we arrive at Stirling’s approximation of the factorial:
\[\lambda! \approx e^{- \lambda} \lambda^\lambda \sqrt{2 \pi \lambda} \tag{1}\]This is cool, but we still haven’t really shown why a Poisson can be approximated by a Gaussian. After all, this premise carries the entire derivation.
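To get a feel for how tight the approximation is, here is a short Python sketch (the `stirling` helper is mine) comparing (1) against the exact factorial:

```python
import math

def stirling(lam: int) -> float:
    """Stirling's approximation (1): lam! ~ e^(-lam) * lam^lam * sqrt(2 pi lam)."""
    return math.exp(-lam) * lam**lam * math.sqrt(2 * math.pi * lam)

for lam in (1, 5, 10, 20):
    exact = math.factorial(lam)
    ratio = stirling(lam) / exact
    print(f"{lam:>2}! = {exact}, Stirling ratio = {ratio:.4f}")
```

The ratio tends to 1 as $\lambda$ grows; the relative error shrinks roughly like $\frac{1}{12 \lambda}$.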
To see the intuition behind this approximation, it is instructive to consider what happens when we add independent Poisson random variables. Say we have $X_1$ and $X_2$, independent Poisson random variables with means $\lambda_1$ and $\lambda_2$. Then $X_1 + X_2$ is a new Poisson random variable with mean $\lambda_1 + \lambda_2$. Extending this idea from two variables to $n$, the sum of $n$ independent Poisson random variables $X_1, \dots, X_n$, each with mean $\frac{\lambda}{n}$, is itself Poisson with mean $\lambda$. And by the nature of the Poisson distribution, the same goes for the variance (we will elaborate on this below). The Central Limit Theorem then tells us that the distribution of this sum approximates a normal distribution. This concludes a rough proof of Stirling’s approximation.
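Here is a quick numerical illustration of the additivity claim, a minimal numpy sketch with arbitrarily chosen rates:

```python
import numpy as np

rng = np.random.default_rng(0)
lam1, lam2, trials = 3.0, 4.0, 200_000

# Sum of two independent Poisson samples, drawn element-wise.
s = rng.poisson(lam1, trials) + rng.poisson(lam2, trials)

# If the sum is Poisson(lam1 + lam2) = Poisson(7), its sample mean and
# sample variance should both land close to 7.
print(s.mean(), s.var())  # both ~7.0
```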
For those of you who are feeling rusty on the Poisson distribution, as I was, here is a simple explanation of its mean and variance. By virtue of the definition of the parameter, it should be fairly clear why $\mathbb{E}[X] = \lambda$: $\lambda$ is a rate parameter that indicates how many events occur, on average, within a window of unit time. The expectation can also be computed directly using the Taylor expansion of $e^\lambda$:
\[\begin{align} \mathbb{E}[X] &= \sum_{k=0}^\infty k \cdot \frac{\lambda^k e^{-\lambda}}{k!} \\ &= e^{-\lambda} \sum_{k=1}^\infty \frac{\lambda^k }{(k-1)!} \\ &= \lambda e^{- \lambda} \sum_{k=1}^\infty \frac{\lambda^{k-1} }{(k-1)!} \\ &= \lambda e^{- \lambda} e^\lambda \\ &= \lambda \end{align} \tag{5}\]Next, we prove that the variance of a Poisson random variable defined by parameter $\lambda$ is equal to $\lambda$. Let $X$ be a Poisson random variable. Then,
\[\begin{align} \mathbb{E}[X^2] &= \sum_{k=0}^\infty k^2 \cdot \frac{\lambda^k e^{-\lambda}}{k!} \\ &= \lambda e^{-\lambda} \sum_{k=1}^\infty k \cdot \frac{\lambda^{k-1}}{(k-1)!} \\ &= \lambda e^{-\lambda} \left[ \sum_{k=1}^\infty (k - 1) \frac{\lambda^{k-1}}{(k-1)!} + \sum_{k=1}^\infty \frac{\lambda^{k-1}}{(k-1)!} \right] \\ &= \lambda e^{-\lambda} (\lambda e^\lambda + e^\lambda) \\ &= \lambda^2 + \lambda \end{align} \tag{6}\]Then, using the definition of variance, we know that
\[\operatorname{Var}(X) = \mathbb{E}[X^2] - \mathbb{E}[X]^2 = \lambda^2 + \lambda - \lambda^2 = \lambda \tag{7}\]From this, we are once again reminded of the defining property of the Poisson distribution: both the mean and the variance of a Poisson random variable are given by the parameter $\lambda$.
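Both series converge quickly, so we can also verify (5) and (6) numerically by truncating the sums; a minimal sketch, with $\lambda = 4$ chosen arbitrarily:

```python
import math

lam = 4.0
pmf = [lam**k * math.exp(-lam) / math.factorial(k) for k in range(100)]

mean = sum(k * p for k, p in enumerate(pmf))
second_moment = sum(k**2 * p for k, p in enumerate(pmf))

print(mean)                     # ~4.0,  i.e. lambda, per (5)
print(second_moment)            # ~20.0, i.e. lambda^2 + lambda, per (6)
print(second_moment - mean**2)  # ~4.0,  the variance lambda, per (7)
```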
Let’s tie this back to our original discussion of the Central Limit Theorem. The CLT states that, even if the distribution of a random variable is not normal, the distribution of sums of such random variables will approximate a normal distribution. Applying this handy property to the $n$ independent and identically distributed Poisson random variables with mean and variance $\frac{\lambda}{n}$, we see that their sum approximates a Gaussian distribution $\mathcal{N}(\lambda, \lambda)$.
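As a final numerical check (again a numpy sketch with my own choice of parameters), we can build this sum directly and compare the empirical probability $P(X = \lambda)$ against the Gaussian density $\frac{1}{\sqrt{2 \pi \lambda}}$ from (4):

```python
import numpy as np

rng = np.random.default_rng(0)
lam, n, trials = 100, 50, 200_000

# Each row sums n i.i.d. Poisson(lam / n) draws; each row sum is Poisson(lam).
sums = rng.poisson(lam / n, size=(trials, n)).sum(axis=1)

# Empirical P(X = lam) versus the Gaussian density at its mean, as in (4).
print(np.mean(sums == lam))            # ~0.0399
print(1.0 / np.sqrt(2 * np.pi * lam))  # ~0.0399
```

The two numbers agree to about three decimal places, which is exactly equation (4) observed empirically.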
Hence Stirling’s approximation! \[\lambda! \approx e^{- \lambda} \lambda^\lambda \sqrt{2 \pi \lambda} \tag{1}\]