(Updated 2016-05-08 to provide a reference for plateaus of likelihood functions in the vicinity of the MLE.)
Simpson’s Paradox is one of those phenomena of data which really give Statistics a substance and a role, beyond the roles it inherits from, say, theoretical probability and computational methods. A similar justification for Statistics lurks behind maximum likelihood methods, the heart of frequentist statistical practice. There, practical studies show that the likelihood function often plateaus in the vicinity of its maximum, even assuming there is a unique maximum (*). (Complicated systems might have several local maxima.) This means that measurement errors can nudge the estimated maximum relatively far away in terms of the corresponding maximum likelihood parameter values, and one practical role for Bayesian statistical methods is to tame this instability by weighting the likelihood with a prior.
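To make the plateau point concrete, here is a minimal sketch in Python. It is a toy of my own construction, not an example taken from Konishi and Kitagawa: it assumes a Gaussian model with unknown mean and a known, large noise scale, four invented observations, and an invented Gaussian prior centered at zero. The function names, the data, the grid, and the prior settings are all assumptions made for illustration. It measures how wide the near-maximum region of the log-likelihood is, and how far the estimate moves when one observation is nudged, with and without the prior.

```python
def log_likelihood(mu, data, sigma):
    """Gaussian log-likelihood in the mean mu, known sigma, constants dropped."""
    return -sum((x - mu) ** 2 for x in data) / (2.0 * sigma ** 2)


def log_prior(mu, prior_mean, prior_sd):
    """Gaussian log-prior on mu, constants dropped (an assumed prior)."""
    return -((mu - prior_mean) ** 2) / (2.0 * prior_sd ** 2)


def make_grid(lo, hi, step):
    """Evenly spaced candidate values for mu."""
    n = int(round((hi - lo) / step))
    return [lo + i * step for i in range(n + 1)]


def argmax(f, points):
    """Grid point at which f is largest."""
    return max(points, key=f)


def plateau_width(f, points, drop=0.1):
    """Width of the region where f is within `drop` log units of its maximum."""
    values = [(p, f(p)) for p in points]
    best = max(v for _, v in values)
    near = [p for p, v in values if v >= best - drop]
    return max(near) - min(near)


SIGMA = 5.0                       # assumed known noise scale (large, so the likelihood is flat)
data = [1.2, -3.4, 6.1, 0.3]      # invented observations
nudged = [1.2, -3.4, 8.1, 0.3]    # same data, one value carrying a measurement error of +2
points = make_grid(-10.0, 10.0, 0.01)


def like(d):
    return lambda mu: log_likelihood(mu, d, SIGMA)


def post(d):
    # Unnormalized log-posterior: log-likelihood plus log-prior N(0, 1).
    return lambda mu: log_likelihood(mu, d, SIGMA) + log_prior(mu, 0.0, 1.0)


mle_shift = abs(argmax(like(nudged), points) - argmax(like(data), points))
map_shift = abs(argmax(post(nudged), points) - argmax(post(data), points))
like_width = plateau_width(like(data), points)
post_width = plateau_width(post(data), points)

print(f"plateau width: likelihood {like_width:.2f}, posterior {post_width:.2f}")
print(f"estimate shift after nudge: MLE {mle_shift:.2f}, MAP {map_shift:.2f}")
```

Under these assumptions the near-maximum region of the likelihood is several times wider than that of the posterior, and the nudged observation moves the maximum likelihood estimate much further than it moves the posterior mode. How much taming one gets depends entirely on how informative the prior is; the tight prior here is chosen to make the effect easy to see.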
But returning to Simpson’s Paradox, Cory Lesmeister, in his blog Fear and Loathing in Data Science, does a very nice piece on Simpson’s, including a link to a good explanatory tutorial. I’ve included his blog in the blogroll to the right.
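For readers who have not met the paradox before, a small sketch in Python may help. The counts below are hypothetical, invented purely for illustration, and are not drawn from Lesmeister's post or the tutorial he links; the group and treatment labels are likewise assumptions. The sketch shows one treatment doing better within every subgroup while doing worse once the subgroups are pooled.

```python
from fractions import Fraction

# Hypothetical (successes, trials) counts, invented for illustration only.
counts = {
    "mild":   {"A": (8, 10),   "B": (70, 100)},
    "severe": {"A": (30, 100), "B": (2, 10)},
}


def rate(successes, trials):
    """Exact success rate as a fraction."""
    return Fraction(successes, trials)


def pooled_rate(treatment):
    """Success rate for a treatment after pooling all groups."""
    successes = sum(counts[g][treatment][0] for g in counts)
    trials = sum(counts[g][treatment][1] for g in counts)
    return Fraction(successes, trials)


# Per-group rates for A and B.
within = {g: (rate(*counts[g]["A"]), rate(*counts[g]["B"])) for g in counts}

a_wins_every_group = all(a > b for a, b in within.values())
b_wins_pooled = pooled_rate("B") > pooled_rate("A")

for g, (a, b) in within.items():
    print(f"{g:>6}: A={float(a):.2f}  B={float(b):.2f}")
print(f"pooled: A={float(pooled_rate('A')):.2f}  B={float(pooled_rate('B')):.2f}")
print("A better in every group:", a_wins_every_group)
print("B better when pooled:", b_wins_pooled)
```

The reversal arises because the two treatments are applied to very different mixes of mild and severe cases, so the pooled rates mostly reflect the group sizes rather than the treatments.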
(*) S. Konishi, G. Kitagawa, Information Criteria and Statistical Modeling, Springer, 2008. This is one of the half dozen or so books I’ve studied in the last 10 years which have had the greatest influence on my statistical thought and practice. While I am a staunch Bayesian, I come at the problem of inference from different perspectives, viewing inference as a computational optimization problem, and seeing an information theoretic approach as a more engineering-oriented way of addressing some of the issues and arguments which arise in other contexts and which, to me, seem more philosophical. The other text in this same spirit is one I’ve often cited, namely, K. P. Burnham, D. R. Anderson, Model Selection and Multimodel Inference: A Practical Information Theoretic Approach, 2nd edition, Springer, 2002. In fact, I am thinking of offering an online course based upon these two texts, weaving in modern stochastic methods of computation, like Markov Chain Monte Carlo (MCMC).