While waiting for the T today, I began to ponder the following abstract problem. Many calculations in geophysics these days take advantage of generating so-called ensembles of results, each initialized at a slightly different place, in order to describe the range of possibilities for simulations or solutions of truly complicated systems, often physical. The term is also used in conjunction with projections of, say, weather models, all initialized with the same data, producing such ensembles of outcomes. And the term is also used with respect to historical collections, such as the ensemble of tracks of hurricanes in the Atlantic, especially those which are land-falling.
So, to the degree these simulations themselves are Monte Carlo selections from a parent population, I wondered the following: If the ensembles are run and stored, and then students of whatever subject or phenomenon randomly select from among the stored ensembles, in the manner of a frequentist bootstrap, does the resulting distribution of variability still capture the original parent population’s statistics? My intuition says “No”: the resulting sampling produces a distribution which, while it may retain the parent’s central tendency, has a smaller variance. However, the interesting puzzle is (a) to demonstrate that, and (b) to develop the theoretical connection as to why it is true, if it is true.
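Part (a) can at least be probed numerically. Here is a minimal sketch, assuming an illustrative standard normal parent population and a stored ensemble of 50 members: the variance of the pooled bootstrap draws converges not to the unbiased sample variance but to the plug-in variance, which is smaller by a factor of (n − 1)/n.

```python
import numpy as np

rng = np.random.default_rng(42)

n = 50      # stored ensemble size (illustrative choice)
B = 20000   # number of bootstrap resamples

# Parent population: standard normal, true variance 1 (an assumption
# made purely for illustration).
ensemble = rng.normal(0.0, 1.0, size=n)

# Frequentist bootstrap: resample ensemble members with replacement.
boot = rng.choice(ensemble, size=(B, n), replace=True)

# Variance over all pooled bootstrap draws.
pooled_var = boot.var()

sample_var = ensemble.var(ddof=1)  # unbiased estimate of parent variance
plugin_var = ensemble.var(ddof=0)  # = (n - 1)/n * sample_var

print(f"sample variance (ddof=1):           {sample_var:.4f}")
print(f"plug-in variance (ddof=0):          {plugin_var:.4f}")
print(f"variance of pooled bootstrap draws: {pooled_var:.4f}")
```

The bootstrap draws are i.i.d. from the empirical distribution of the stored ensemble, whose variance is the plug-in variance, not the unbiased one, which is one concrete sense in which the intuition of a shrunken spread holds.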
I also wonder if the Bayesian bootstrap might have one up on the frequentist one here, since with many priors the set of possibilities which are selected is wider than merely that which has been observed in the single dataset in hand, and so its variability could, at least in principle, be bigger than that of a single realization.
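One caveat worth checking numerically: Rubin’s (1981) Bayesian bootstrap still supports only the observed members, reweighting them with Dirichlet(1, …, 1) weights rather than resampling them. A sketch, reusing the same illustrative normal ensemble as above:

```python
import numpy as np

rng = np.random.default_rng(7)

n = 50
B = 5000

# Illustrative stored ensemble drawn from a standard normal parent.
ensemble = rng.normal(0.0, 1.0, size=n)

# Bayesian bootstrap (Rubin 1981): each replicate assigns the observed
# members random Dirichlet(1, ..., 1) weights.
weights = rng.dirichlet(np.ones(n), size=B)   # shape (B, n)

# Weighted mean and weighted variance for each replicate.
bb_means = weights @ ensemble
bb_vars = (weights * (ensemble - bb_means[:, None]) ** 2).sum(axis=1)

print(f"mean of Bayesian-bootstrap variances: {bb_vars.mean():.4f}")
print(f"plug-in variance of the ensemble:     {ensemble.var(ddof=0):.4f}")
```

So in its standard form the Bayesian bootstrap does not widen the support beyond what was observed; the extra-wide possibilities would have to come from an explicit parametric prior over the data-generating process rather than from the reweighting itself.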
Finally, in both the case of the frequentist bootstrap and the Bayesian one, is it possible to formally relate the variance of the sampling of the samplings to that of the original, so some kind of correction could be applied?
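For at least one level of this, a known exact correction exists: the plug-in variance understates the unbiased sample variance by exactly (n − 1)/n, so Bessel’s correction recovers it. This fixes only the bias at a single resampling level, not the full nested sampling-of-samplings, but it is a sketch of the kind of relation asked for:

```python
import numpy as np

rng = np.random.default_rng(3)

n = 40  # illustrative ensemble size
x = rng.normal(size=n)

# Plug-in variance, the quantity the frequentist bootstrap targets.
plugin = x.var(ddof=0)            # (1/n) * sum (x_i - xbar)^2

# Bessel's correction: multiply by n/(n - 1) to recover the
# unbiased sample variance.
corrected = plugin * n / (n - 1)

print(f"plug-in:   {plugin:.4f}")
print(f"corrected: {corrected:.4f}")
```

Whether an analogous closed-form factor exists for repeated selection from stored ensembles (resampling of resamplings) is, as far as I can tell, exactly the open part of the puzzle.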