Bayesian Bootstrap

I’m studying the Bayesian bootstrap in the context of finite population sampling, for an application where I need to estimate multinomial proportions. While I have used the frequentist bootstrap a lot, it has always bothered me that it can never, of course, produce an observation value not recorded in the dataset, even when there is good reason to believe such a value is possible, if rare. The Bayesian bootstrap has been discussed in many places since Rubin proposed it in 1981. I have found the following six references helpful:

• M. Aitkin, “Applications of the Bayesian bootstrap in finite population inference”, Journal of Official Statistics, 24(1), 2008, 21–51.
• G. Meeden, “Noninformative nonparametric Bayesian estimation of quantiles”, Statistics & Probability Letters, 16, 1993, 103–109.
• G. Meeden, “Simulating from the Polya posterior”, March 2006, a vignette from the R CRAN polyapost package.
• M. Ghosh, G. Meeden, Bayesian Methods for Finite Population Sampling, Chapman & Hall/CRC, 1998.
• A. C. Davison, D. V. Hinkley, Bootstrap Methods and their Application, Cambridge University Press, 2009, and especially their Section 10.5, beginning on page 512.
• T. Lancaster, “A note on bootstraps and robustness”, Brown University Department of Economics Working Paper No. 2006-06, 2003, http://dx.doi.org/10.2139/ssrn.896764
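For concreteness, the core of Rubin's scheme fits in a few lines of R: instead of resampling observations with replacement, each replicate draws a weight vector from a flat Dirichlet over the observed values (via normalized Gamma draws) and applies the statistic to the weighted data. This is only a minimal sketch for the mean; the function name `bayesBootMean` and the example data are mine, not from any of the references above.

```r
# Minimal sketch of Rubin's (1981) Bayesian bootstrap for the mean of a
# data vector y.  Each replicate draws weights from a Dirichlet(1, ..., 1)
# over the observed values, so, like the frequentist bootstrap, this plain
# version still cannot produce values outside the observed support.
bayesBootMean <- function(y, R = 1000) {
  n <- length(y)
  sapply(1:R, function(r) {
    g <- rgamma(n, shape = 1)   # Dirichlet(1, ..., 1) via normalized Gammas
    w <- g / sum(g)             # one random weight vector summing to 1
    sum(w * y)                  # weighted mean: one posterior draw
  })
}

set.seed(1)
draws <- bayesBootMean(c(1.04, 1.12, 1.13, 1.45, 1.56), R = 2000)
mean(draws)   # centered near the sample mean, 1.26
```

Each element of `draws` is one draw from the (nonparametric) posterior of the mean, so quantiles of `draws` give credible intervals directly.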

I wrote some code for this bootstrap, following the procedure in Davison and Hinkley as informed by Meeden’s work, and applied it to one of their examples: ten rows from their Example 2.8, as described in their Example 10.6. The figure below shows the results for $R \in \{1000, 10000, 100000\}$ replications.

Bogumił Kamiński, at his R Snippets blog, has written of possible errors in implementing the Bayesian bootstrap in various places. I don’t want to comment on specifics there, because I have not followed the development closely, but I will say that for both the frequentist and the Bayesian bootstrap there are many ways to work out the details. Even in the frequentist case there are ordinary and balanced flavors of the thing, and these can give markedly different results. In the Bayesian case, because of the close relationship with a multinomial model, what results depends upon the support of the prior.

So, in the Davison and Hinkley example, the $n = 10$ cities are a sample from a larger population (at least $n = 49$), and the ratios to be studied come from all pairs of unique values, one from 1920 and one from 1930. If none of the 39 remaining rows were known, but a student wanted to examine what contribution these might make to the eventual posterior, they could interpolate some values among the $n = 10$ rows and put them in the support and prior with minimal mass. Since these are not categorical points, as Meeden and others discuss, there is some rationale for smearing probability mass in the prior from known points onto unknown ones, according to the unknown points’ distance from the known ones, in the manner that (frequentist) BLUE estimation, or “kriging”, does for unsampled field points in spatial sampling.
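A hypothetical sketch of that support-widening idea in R, with arbitrary illustrative values: the names `observed`, `extra`, `U`, and the choice of 0.1 as the minimal prior mass are mine, not from Davison and Hinkley or Meeden, and midpoint interpolation is just the simplest possible smearing rule.

```r
# Widen the support by interpolating midpoints between the observed ratios
# and giving them a small prior count, so the posterior can place mass on
# values never actually seen.  The 0.1 mass is an arbitrary illustration.
observed <- c(1.04, 1.15, 1.45, 2.09, 1.13)
s <- sort(observed)
extra <- (head(s, -1) + tail(s, -1)) / 2       # midpoints of adjacent values
U <- sort(c(observed, extra))                  # widened universe of values
prior <- ifelse(U %in% observed, 1, 0.1)       # minimal mass on interpolants
counts <- as.numeric(U %in% observed)          # each observed value seen once

# One posterior draw over the widened support, Dirichlet via Gammas:
g <- rgamma(length(U), shape = counts + prior)
P <- g / sum(g)   # nonzero probability now lands on the interpolated points
```

A distance-weighted smear, rather than a flat 0.1, would be closer to the kriging analogy, but the mechanics are the same: enlarge `U`, assign prior mass, and let the Dirichlet draws do the rest.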

Postscript, 2013-10-19 23:29 ET. Quoting from a comment I placed at Another Astrostatistics Blog:

Successful week here, too: I am given a tabulation of counts for M categories, one tabulation per hour, the counts all coming from the same rich source. I am assuming the categories are independent (at least at first, intending to deal with dependence using hyperpriors later). I use a Bayesian bootstrap to estimate the probabilities of choosing each of the categories, right now taking a mean of the bootstrap replica probabilities and then renormalizing, even if that is barely necessary. I then repeat the process for the next hour.

I have also taken a huge, comparable parent population, similar to but not including the specific subpopulations, and wrangled it into a kind of empirical prior. I then use that prior in conjunction with the recipe above. I have also learned, from the parent, an ordering of the categories from most common to least, and use that in presentations.
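Under assumed names (the function `hourlyProportions` and the example counts are mine), the hourly recipe in the quoted comment might be sketched as: draw Bayesian-bootstrap replicates of the category probabilities from a Dirichlet whose shape combines the hour's counts with a prior count vector, average the replicates, and renormalize.

```r
# One hour's worth of the recipe: counts over M categories in, estimated
# category probabilities out.  The prior is a vector of pseudo-counts,
# e.g. one derived empirically from a parent population; the default of
# all-ones is just a placeholder.
hourlyProportions <- function(counts, prior = rep(1, length(counts)), R = 1000) {
  M <- length(counts)
  reps <- t(sapply(1:R, function(r) {
    g <- rgamma(M, shape = counts + prior)   # Dirichlet posterior draw
    g / sum(g)
  }))
  p <- colMeans(reps)   # mean of the bootstrap replica probabilities
  p / sum(p)            # renormalize, even if that is barely necessary
}

set.seed(7)
hourlyProportions(counts = c(50, 30, 15, 5), R = 5000)
```

For the next hour, the same call is repeated with that hour's counts; if dependence between categories matters, this per-category independence assumption is where hyperpriors would later enter.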

The pertinent image is shown below.

Finally, the code for the Bayesian bootstrap I wrote is given below. For the “Efron” counterpart, I used the boot function from the CRAN boot package, with its sim parameter set to “ordinary”.

# Bayesian bootstrap, single variable.
bayesianOneBootFrom <- function(Y, U, prior, statistic, R=1000, nb=1000,
                                cvtype="biased", show=TRUE, main="", xlab="")
{
  # Implemented from Section 10.5 of Davison, Hinkley,
  # BOOTSTRAP METHODS AND THEIR APPLICATION, Cambridge University Press, 2009.
  #
  # U is the universe of possible values for Y, the data vector.
  stopifnot( is.vector(U) )
  M <- length(U)
  stopifnot( length(unique(U)) == M )
  # The prior counts the number of times U[j] appears in the prior.
  # It is not a function.
  stopifnot( is.vector(prior) )
  stopifnot( M == length(prior) )
  # cvtype is the kind of cross-validation used in the density estimate at the end.
  stopifnot( is.character(cvtype) )
  stopifnot( (cvtype == "biased") || (cvtype == "unbiased") )
  # Y is the data vector to be bootstrapped.
  N <- length(Y)
  uniqueY <- unique(Y)
  stopifnot( all(0 < match(uniqueY, U, nomatch=0)) )
  # F counts the number of times U[j] appears in the data vector Y.
  F <- as.vector(sapply(X=U, FUN=function(u) length(which(u == Y))))
  # statistic is a function of two arguments, statistic=function(x, w).
  stopifnot( is.function(statistic) )
  theta <- matrix(NA, nrow=R, ncol=length(statistic(x=c(0), w=c(1))))
  mix <- F + prior + 1
  for (r in (1:R))
  {
    G <- rgamma(n=M, shape=mix, scale=1)
    P <- G/sum(G)
    result <- statistic(x=U, w=P)
    stopifnot( is.vector(result) )
    theta[r,] <- result
  }
  if (cvtype == "unbiased")
  {
    bw <- bw.ucv(x=theta, nb=nb)
  } else
  {
    # biased
    bw <- bw.bcv(x=theta, nb=nb)
  }
  posteriorEstimate <- density(x=theta, bw=bw, adjust=1, kernel="epanechnikov")
  posteriorEstimate$y <- posteriorEstimate$y/sum(posteriorEstimate$y)
  if (show)
  {
    plot(posteriorEstimate, main=main, xlab=xlab)
  }
  return(list(R=R, t=theta, statistic=mean(theta), unique.Y=uniqueY,
              smoothed.posterior.x=posteriorEstimate$x,
              smoothed.posterior.y=posteriorEstimate$y))
}

# First ten rows from Table 1.3 of Davison and Hinkley:
# populations in thousands of n=49 large US cities in 1920 and in 1930.
Table.1.3 <- matrix(c(138, 143,
                       93, 104,
                       61,  69,
                      179, 260,
                       48,  75,
                       37,  63,
                       29,  50,
                       23,  48,
                       30, 111,
                        2,  50),
                    nrow=10, ncol=2, byrow=TRUE)
colnames(Table.1.3) <- c("1920", "1930")
rownames(Table.1.3) <- sapply(X=1:10, FUN=function(n) sprintf("n=%-.0f", n))
Table.1.3.unique.1920 <- unique(Table.1.3[,1])
Table.1.3.unique.1930 <- unique(Table.1.3[,2])

# Code to reproduce figure:
if (TRUE)
{
  library(boot)
  R <- 1000
  print(Table.1.3)
  Ratios <- as.vector(sapply(X=Table.1.3.unique.1920,
                             FUN=function(u1) sapply(X=Table.1.3.unique.1930,
                                                     FUN=function(u2) u2/u1)))
  M <- length(unique(Ratios))
  print(sort(unique(Ratios)))
  OneBoot.1 <- bayesianOneBootFrom(Y=Ratios, U=unique(Ratios),
                                   prior=rep(x=2.3, times=M),
                                   statistic=function(x, w) sum(x*w),
                                   R=R, nb=1000, cvtype="biased", show=TRUE,
                                   main="1930 population versus 1920 for 10 U.S. cities\nprior is 2.3",
                                   xlab=expression(theta))
  OneBootSimple.1 <- boot(data=Ratios,
                          statistic=function(data, indices) mean(data[indices]),
                          R=R, sim="ordinary", parallel="multicore", ncpus=4)
  posteriorEstimate <- density(x=OneBootSimple.1$t,
                               bw=bw.bcv(x=OneBootSimple.1$t, nb=1000),
                               adjust=1, kernel="epanechnikov")
  posteriorEstimate$y <- posteriorEstimate$y/sum(posteriorEstimate$y)
  png(filename="ComparisonOfEfronFrequentistWithRubin-AitkinBayesianBootstrap.png",
      width=14, height=8.5, units="cm", pointsize=4, res=800)
  plot(x=OneBoot.1$smoothed.posterior.x, y=OneBoot.1$smoothed.posterior.y,
       type="l", col="blue", lwd=2,
       xlab=expression(paste(theta, ", ratio of 1930 to 1920 population")), ylab="",
       main="Comparison of Bootstrap Precisions using\nFirst ten rows from Table 1.3 of Davison and Hinkley: Populations in thousands of n=49 large US cities in 1920 and in 1930.",
       ylim=c(0, 0.01))
  lines(x=posteriorEstimate$x, y=posteriorEstimate$y, col="red", lwd=2)

  R <- 10000
  OneBoot.2 <- bayesianOneBootFrom(Y=Ratios, U=unique(Ratios),
                                   prior=rep(x=2.3, times=M),
                                   statistic=function(x, w) sum(x*w),
                                   R=R, nb=1000, cvtype="biased", show=FALSE,
                                   main="1930 population versus 1920 for 10 U.S. cities\nprior is 2.3",
                                   xlab=expression(theta))
  OneBootSimple.2 <- boot(data=Ratios,
                          statistic=function(data, indices) mean(data[indices]),
                          R=R, sim="ordinary", parallel="multicore", ncpus=4)
  posteriorEstimate <- density(x=OneBootSimple.2$t,
                               bw=bw.bcv(x=OneBootSimple.2$t, nb=1000),
                               adjust=1, kernel="epanechnikov")
  posteriorEstimate$y <- posteriorEstimate$y/sum(posteriorEstimate$y)
  lines(x=OneBoot.2$smoothed.posterior.x, y=OneBoot.2$smoothed.posterior.y,
        col="blue", lty=6, lwd=1)
  lines(x=posteriorEstimate$x, y=posteriorEstimate$y, col="red", lty=6, lwd=1)

  R <- 100000
  OneBoot.3 <- bayesianOneBootFrom(Y=Ratios, U=unique(Ratios),
                                   prior=rep(x=2.3, times=M),
                                   statistic=function(x, w) sum(x*w),
                                   R=R, nb=1000, cvtype="biased", show=FALSE,
                                   main="1930 population versus 1920 for 10 U.S. cities\nprior is 2.3",
                                   xlab=expression(theta))
  OneBootSimple.3 <- boot(data=Ratios,
                          statistic=function(data, indices) mean(data[indices]),
                          R=R, sim="ordinary", parallel="multicore", ncpus=4)
  posteriorEstimate <- density(x=OneBootSimple.3$t,
                               bw=bw.bcv(x=OneBootSimple.3$t, nb=1000),
                               adjust=1, kernel="epanechnikov")
  posteriorEstimate$y <- posteriorEstimate$y/sum(posteriorEstimate$y)
  lines(x=OneBoot.3$smoothed.posterior.x, y=OneBoot.3$smoothed.posterior.y,
        col="blue", lty=3, lwd=1)
  lines(x=posteriorEstimate$x, y=posteriorEstimate$y, col="red", lty=3, lwd=1)
  legend("topright",
         c("Efron bootstrap, R = 1000", "Rubin and Aitkin bootstrap, R = 1000",
           "Efron bootstrap, R = 10,000", "Rubin and Aitkin bootstrap, R = 10,000",
           "Efron bootstrap, R = 100,000", "Rubin and Aitkin bootstrap, R = 100,000"),
         col=c("red", "blue", "red", "blue", "red", "blue"),
         lwd=c(2, 2, 1, 1, 1, 1), lty=c(1, 1, 6, 6, 3, 3), cex=1.3)
  legend("topleft", c("Bayesian bootstrap used prior of 2.3 on all ratios"),
         text.col=c("blue"), cex=1.3)
  dev.off()
}