Those who have taken an introductory course in probability or statistics will be familiar with two famous theorems: the law of large numbers and the central limit theorem. These theorems are both statements about what happens to the average of a sample of data as the amount of data goes to infinity. The law of large numbers says that, under mild conditions, the sample average will converge to the population-level average. The central limit theorem says that, under mild conditions, the sample average not only converges to the population average as data accumulates, but its distribution under repeated sampling, once suitably centered and scaled, converges to a normal distribution; from this we may derive further useful properties, such as approximate confidence intervals, that hold as the amount of data goes to infinity.
These two theorems are examples of asymptotic results. Asymptotic results tell us about the behavior of (certain functions of) data as the amount of data goes to infinity. Asymptotic arguments form much of the backbone of statistical theory; they allow us to identify, for instance, which of a set of candidate estimators of some quantity will (asymptotically) be closest to that quantity, on average. In my field of sequential decision-making, asymptotic arguments about the rewards gained by decision-making algorithms in the limit of infinite data are also sometimes used to argue for using one algorithm over others.
And yet, we never have an infinite amount of data! What, then, leads us to trust asymptotic theory as a guide to the real world of finite sample sizes? As Geyer puts it:
We know that asymptotics often works well in practical problems because we can check the asymptotics by computer simulation (perhaps what Le Cam meant by “checked on the case at hand”), but conventional theory doesn’t tell us why asymptotics works when it does. It only tells us that asymptotics works for sufficiently large [sample size] n, perhaps astronomically larger than the actual n of the actual data. So that leaves a theoretical puzzle.
- Asymptotics often works.
- But it doesn’t work for the reasons given in proofs.
- It works for reasons too complicated for theory to handle.
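Geyer's remark about checking the asymptotics by simulation can be made concrete with a small sketch. The setup here is my own illustrative choice, not anything from Geyer: data drawn from an Exponential(1) distribution (mean 1, standard deviation 1), and a hypothetical helper `tail_rates` that estimates how often the sample average lands in each of the two 5% tails predicted by the normal approximation. If the central limit theorem were an exact description at a given n, both rates would be close to 0.05.

```python
import math
import random
import statistics


def tail_rates(n, reps=4000, seed=0):
    """Estimate how often the mean of n Exponential(1) draws falls in each
    nominal 5% tail of the normal approximation N(1, 1/n).

    Returns (lower_rate, upper_rate); both would be near 0.05 if the
    normal approximation were accurate at this sample size.
    """
    rng = random.Random(seed)
    z = 1.645                      # 95th percentile of the standard normal
    cutoff = z / math.sqrt(n)      # Exponential(1) has mean 1 and sd 1
    lower = upper = 0
    for _ in range(reps):
        mean = statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        if mean > 1.0 + cutoff:
            upper += 1
        elif mean < 1.0 - cutoff:
            lower += 1
    return lower / reps, upper / reps


if __name__ == "__main__":
    for n in (5, 50, 400):
        lower, upper = tail_rates(n)
        print(f"n={n:4d}  lower tail={lower:.3f}  upper tail={upper:.3f}  (nominal 0.050)")
```

Because the exponential distribution is skewed, the two tail rates should differ noticeably from each other at small n and move toward 0.05 as n grows. Note that the simulation only reports how good the approximation is for this one distribution at these particular sample sizes; it says nothing about why it is good, which is exactly the puzzle Geyer describes.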
I think the obvious but overlooked point that asymptotics logically implies nothing about finite data sets, and yet empirically does seem to track finite-sample properties, should lead us to adopt a more nuanced attitude towards asymptotic argumentation. I propose that we regard an asymptotic framework as a model of what happens in finite samples. A model is something that we use to home in on (what we hope to be) the most important aspects of a phenomenon we’re interested in while throwing away (what we hope to be) details that have little effect on our final conclusions. What does this mean in the context of an asymptotic argument? Taking the example of the central limit theorem, the phenomenon of interest is the sampling distribution of the sample average; the “important aspect” is that this distribution is approximately normal in large samples, and the “unimportant details” are all the complicated mathematical terms that are present in finite samples but disappear as the sample size goes to infinity.
And as with models in all scientific fields, some are better than others at capturing the phenomenon of interest. In the case of asymptotics, certain asymptotic frameworks are better at capturing the finite-sample properties of estimators than others. For instance, standard asymptotic theory allows some estimators to be “superefficient” even though they are known to perform badly in finite samples. But adopting a different asymptotic framework — the “moving parameter” framework — leads to more realistic conclusions about the behavior of these estimators (see the discussion of Pollard).
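To illustrate the point about superefficiency, here is a minimal sketch using Hodges' estimator, the textbook example of a superefficient estimator. The choice of this estimator, the normal data model, and the helper names `hodges`, `sample_mean`, and `scaled_risk` are my own assumptions for illustration. The estimator reports the sample mean unless it lies within n^(-1/4) of zero, in which case it reports exactly zero. The simulation estimates the scaled risk n * E[(estimate - theta)^2], which is 1 for the sample mean at every theta.

```python
import math
import random


def hodges(xbar, n):
    """Hodges' estimator: keep the sample mean unless it lies within
    n**-0.25 of zero, in which case report exactly 0."""
    return xbar if abs(xbar) >= n ** -0.25 else 0.0


def sample_mean(xbar, n):
    """The ordinary sample mean, used as a baseline."""
    return xbar


def scaled_risk(theta, n, estimator, reps=20000, seed=0):
    """Monte Carlo estimate of n * E[(estimate - theta)^2] when the data
    are n independent N(theta, 1) draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        # The mean of n N(theta, 1) draws is itself N(theta, 1/n).
        xbar = rng.gauss(theta, 1.0 / math.sqrt(n))
        total += (estimator(xbar, n) - theta) ** 2
    return n * total / reps


if __name__ == "__main__":
    n = 100
    for label, theta in (("theta = 0", 0.0), ("theta = n^(-1/4)", n ** -0.25)):
        print(label,
              " sample mean:", round(scaled_risk(theta, n, sample_mean), 2),
              " Hodges:", round(scaled_risk(theta, n, hodges), 2))
```

With theta held fixed at zero, Hodges' estimator should beat the sample mean by a wide margin, which is the superefficiency that standard fixed-parameter asymptotics reports. When theta is instead allowed to sit near the threshold n^(-1/4), so that it changes with n, the estimator's scaled risk should come out several times larger than that of the sample mean. That second comparison is the kind of behavior a moving-parameter analysis is designed to expose.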
Nevertheless, even these more sophisticated asymptotic frameworks are still models of the finite-sample phenomenon of interest. And there is no guarantee that the predictions of these models will hold true in any given situation. On this view, asymptotic theory should be taken down from its privileged position in academic statistics and set on a par with other modes of statistical argumentation (such as simulations, correspondence with existing scientific theory, etc.) as a valuable but fallible source of qualitative insight.
Jesse is a PhD candidate working with Laber Labs. His research interests include reinforcement learning and artificial intelligence. Jesse is a returning author to our blog; check out his first post about exploration and exploitation. We thought this posting was a great excuse to get to know a little more about him, so we asked him:
Q: Write a haiku about your favorite or most preferred player in the game Super Smash Bros Brawl.
Cool face of the turnip
Asked me for a kiss.
(With apologies to Langston Hughes.)