This post could also be subtitled “Residual deviance isn’t the whole story.”
My favorite book on logistic regression is by Dr Joseph Hilbe, Logistic Regression Models, CRC Press, 2009, Chapman & Hill. It is a solidly frequentist text, but its discussion of models and rich examples make that besides the point. Except in one case. In Section 15.2.1, Professor Hilbe writes:
When exact methods fail while using LogXact or SAS as a result of memory limitations, it is recommended that estimation be attempted using Monte Carlo sampling methods. There are two such methods employed by the software: the Direct Monte Carlo sampling method of Mehta et al. (2000) and the Markov Chain Monte Carlo (MCMC) sampling method, which is based upon the Gibbs sampling algorithm proposed by Forster et al. (1996). However, Mehta et al. (2000) demonstrated that the MCMC method can lead to incorrect results and recommend that it only be used when all other methods fail. We suggest that it not be used even in those circumstances, and that there are methods not based on Monte Carlo sampling that can be used to approximate exact statistical results. We discuss these methods in the two next sections.
Now, strictly speaking, Professor Hilbe is apparently referring to capabilities in LogXact or SAS or both, not to MCMC methods everywhere. I am not familiar with these features of LogXact or SAS, so I cannot speak to that. But this was published in 2009, and a good many methods for generalized linear models (“GLMs”) which do successfully use MCMC, even for binomial logistic regression. Some of them do not use Gibbs samplers and it seems that steering students of the problem away from these methods under any circumstances without this qualification is overdone.
There should also be a proper separation of concerns here. One the one hand there is binomial logistic regression, whose shoals present threats to general algorithms for GLMs whose non-linear optimizers, often based upon Newton methods, can founder. Separation is one such problem. It is common for authors of functions in packages intended to solve logistic regression to test them on such cases. The other is the viability of Markov Chain Monte Carlo (“MCMC”) methods, such as those criticized by Hilbe and Mehta, et al. I had a quick look at results based upon:
J. J. Forster, J.W. McDonald, P. W. F. Smith, “Monte Carlo exact conditional tests for log-linear and logistic models”, Journal of the Royal Statistical Society B, 1996, 58, 445-453 (*).
Specifically, I consulted:
D. Zamar, B. McNeny, J. Graham, “elrm: Software implementing exact-like inference for logistic regression models“, Journal of Statistical Software, October 2007, 21(3).
They report that the Forster et al. (1996) technique cited by Hilbe and Mehta, et al. (2000) is indeed a Gibbs sampler, but that
J. J. Forster, J. W. McDonald, P. W. F. Smith, “Markov Chain Monte Carlo exact inference for Binomial and Multinomial logistic regression models”, Statistics and Computing, 2003, 13, 169-177.
uses a general MCMC algorithm, not merely a Gibbs sampler, which Zamar, McNeny, and Graham implement and report upon.
While a comprehensive evaluation of these techniques is more than I presently have time, I decided to take the specific example Mehta, et al. (2000) used to claim the inferiority of the MCMC technique from their paper, namely, their small Table 1 dataset, reproduced below, and attempt to solve it using various packages, comparing results. That dataset is:
The methods and packages I tried were the BayesGLM function from the Gelman, Hill, Su, Yajima, Grazia, Kerman, Zheng, and Dorie arm package, the MCMClogit function from the MCMCpack package by Martin, Quinn, and Park, the glmrob function from the robustbase package by Maechler, and the elrm function from the elrm package by Zamar, Graham, and McNeny. Note both the MCMClogit function assumes a Bernoulli not a Binomial model for the data, and, so, for these cases the above table needs to be expanded to Binomial instances. This can be done using an R snippet,
Table1.expanded<- matrix(0, nrow=0, ncol=4)
for (k in (1:nrow(Table1)))
if (0 < Table1[k,2])
for (j in (1:Table1[k,2]))
Table1.expanded<- rbind(Table1.expanded, c(Table1[k,1], 1, Table1[k,4], Table1[k,5]))
for (j in ((1+Table1[k,2]):Table1[k,3]))
Table1.expanded<- rbind(Table1.expanded, c(Table1[k,1], 0, Table1[k,4], Table1[k,5]))
colnames(Table1.expanded)<- c("group.id", "response", "var.1", "var.2")
BernoulliTable1.frame<- as.data.frame(Table1.expanded, stringsAsFactors=FALSE)
This produces the revised data table for these functions:
First of all, the “robust” glmrob was not. Using a method of “Mqle”, it consistently reported:
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Error in solve.default(crossprod(X, DiagB * X)/nobs, EEq) :
system is computationally singular: reciprocal condition number = 5.84949e-18
I could not get the method “BY” to work because when I tried to pass success counts and fail counts in a response variable, as recommended for glm and the Binomial family, glmrob did not know what to do with this. According to the documentation I read, this was needed. According to R‘s family function description:
For the binomial and quasibinomial families the response can be specified in one of three ways:
(1) As a factor: ‘success’ is interpreted as the factor not having the first level (and hence usually of having the second level).
(2) As a numerical vector with values between 0 and 1, interpreted as the proportion of successful cases (with the total number of cases given by the weights).
(3) As a two-column integer matrix: the first column gives the number of successes and the second the number of failures.
Yet glmrob complains when “(3)” is used.
Second, the straight-out-of-the-box built-in glm works, but does a bad job of it:
M1<- glm (response ~ var.1 + var.2, family=binomial(link="logit"), data=Table1.frame, weights=Table1.weights)
Classical GLM Logistic Regression
glm(formula = response ~ var.1 + var.2, family = binomial(link = "logit"),
data = Table1.frame, weights = Table1.weights)
n = 6, k = 3
residual deviance = 0.0, null deviance = 15.5 (difference = 15.5)
Residuals: 1 2 3 4 5 6
1.0000006533e+00 -1.0000000001e+00 -1.0000000000e+00 -1.0000000000e+00 -1.0000000000e+00 -1.1368683772e-13
Just look at those standard errors! Yet the residual deviance is zero, which looks more to me like a failure of calculation than anything accurate.
arm Package BayesGLM
Now the BayesGLM produced the following estimates:
|Cauchy prior scale||DF value||residual deviance obtained||var.1 estimate||var.1 s.e.||var.2 estimate||var.2 s.e.|
Those standard errors look pretty good, excepting the first and last rows, where the deterioration is expected.
MCMCpack package MCMClogit
This function took a lot of work in order to produce anything reasonable with this problem. Worse, even at 1.5 million MCMC steps and a burn-in of 250,000 the standard errors are poor. Ironically, the deviance is better than the arm results, suggesting to me that the problem with these large coefficient results is overfitting. However, I have not inspected the algorithm MCMClogit uses. I am a little surprised, since it is possible to tune this interface directly for a logistic regression, rather than use a general-purpose interface for GLMs as the other packages do.
A user-defined Cauchy prior was used for this routine, namely,
logpriorfun <- function(beta, location, scale)
sum(dcauchy(beta, location, scale, log=TRUE))
with the call
posterior<- MCMClogit(response ~ var.1 + var.2, burnin=250000, mcmc=1500000, thin=10, tune=0.0001, verbose=0, seed=wonkyRandom,
beta.start=rnorm(3, 0, 0.1), data=BernoulliTable1.frame, user.prior.density=logpriorfun,
logfun=TRUE, location=0, scale=2.5, marginal.likelihood="Laplace")
Remember this routine demands a Bernoulli model. The runs look okay:
|Cauchy prior scale||Tuning Parameter||Metropolis acceptance rate||residual deviance obtained||var.1 estimate||var.1 s.e.||var.2 estimate||var.2 s.e.|
The standard errors have, of course, been corrected for MCMC effective size, which are
Thus, in this case, the bias-finding for MCMC logstic regression algorithms appears to be correct. The acceptance rate is a little high, and there were settings of the turning parameter which knocked acceptance lower, but it is very sensitive and does not produce consistent results, at least not with the random initialization for values which is the way it should really be done.
elrm package elrm
This is the 2003 re-implementation of MCMC for logistic regression developed by Forster, McDonald, and Smith, as implemented by
Zamar, McNeny, and Graham. It’s an odd duck because, despite using a constructed Markov Chain, it reports p-values rather than, say, credible intervals. In doing so, however, it offers a specific comparison with the complaints of Hilbe and Mehta, et al., who observed that their runs of the earlier Forster, McDonald, and Smith Gibbs sampler always produced p-values very nearly unity. No deviances are produced by this routine, and the coefficients returned do not include the Intercept, so I could not calculate residuals myself.
|#burn-in steps||#MCMC steps||joint p-value||var.1 estimate||var.1 p-value||var.1 p-value s.e.||var.2 estimate||var.2 p-value||var.2 p-value s.e.|
Odd duck indeed. While the estimates fall more of less in line with results of BayesGLM from arm, the joint p-value decreases, separated by almost 7 times the pooled standard error of 0.00049.
|#MCMC steps||joint p-value s.e.||effective number of steps for joint p-value|
The estimate for the first variable is higher than than produced by BayesGLM. The standard errors from the elrm chains for the two variables, corrected for effective size are 0.05 and 0.017. The pooled standard error for that first variable is thus about 0.15 and, so, the two estimates for the first variable are separated by under two pooled standard errors, which is consistent if not plausible.
Code, Trace, and Consensus
The R code to reproduce these results along with a trace of the execution are available in a tarball. Note that the call to the glmrob is conditionally executed, and the default is off, because it otherwise stops the script.
My consensus is that I’d trust arm‘s BayesGLM for general work, and perhaps run elrm as a cross-check.
(*) I did not consult the original paper because I have no access to the Journal of the Royal Statistical Society.