Less evidence for a global warming hiatus, and urging more use of Bayesian model averaging in climate science

(This post has been significantly updated midday 15th February 2018.)

I’ve written about the supposed global warming hiatus of 2001-2014 before:

The current issue of the joint publication Significance from the Royal Statistical Society and the American Statistical Association has a nice paper by Professor Niamh Cahill of University College, Dublin. Professor Cahill is a colleague of Professor Stefan Rahmstorf, Dr Grant Foster (“Tamino”), and Professor Andrew Parnell. (Parnell is also from University College.) I’ll list a related history of their papers in a moment.

It’s good to see climate science and data treated well by statisticians, even though many geophysicists, oceanographers, and atmospheric scientists know something about statistics and data analysis (*). There is this paper by Professor Cahill, and the November 2017 issue of CHANCE was devoted to the subject. This is great, because the relationship between professional statisticians and climate scientists has been rocky at times. Notice some of the comments here and this rant. There is also some distrust of statistical methods from the geophysics side or, at least, from some atmospheric scientists. The great Jule Charney reportedly dismissed an analysis once by dubbing it “just curve fitting”, since the standard in his field was ab initio physics. And squarely within the margins of the present discussion, there is this gentle admonition from Drs Fyfe, Meehl, England, Mann, Santer, Flato, Hawkins, Gillett, Xie, Kosaka, and Swart that

The warming slowdown as a statistically robust phenomenon has also been questioned. Recent studies have assessed whether or not trends during the slowdown are statistically different from trends over some earlier period. These investigations have led to statements such as “further evidence against the notion of a recent warming hiatus” [Karl, T. R. et al. Science 348, 1469–1472 (2015)] or “claims of a hiatus in global warming lack sound scientific basis” [Rajaratnam, B., Romano, J., Tsiang, M. & Diffenbaugh, N. S. Climatic Change 133, 129–140 (2015)]. While these analyses are statistically sound, they benchmark the recent slowdown against a baseline period that includes times with a lower rate of increase in greenhouse forcing [Flato, G. et al. in Climate Change 2013: The Physical Science Basis (eds Stocker, T. F. et al.) Ch. 9 (IPCC, Cambridge Univ. Press, 2013)], as we discuss below. Our goal here is to move beyond purely statistical aspects of the slowdown, and to focus instead on improving process understanding and assessing whether the observed trends are consistent with our expectations based on climate models.

(Emphasis and references added to original text. That’s from the Fyfe, et al paper “Making sense of the early-2000s warming slowdown”.)

Professor Cahill’s article is an entirely plausible interpretation of the datasets NOAA, GISTEMP, HadCRUT4, and BEST, from a statistician’s perspective. That perspective includes the idea that if a signal contains no information below some level, assigning an interpretation to the residual below that level lends no support to the interpretation. In other words, if properly extracted warming trends are subtracted from warming data, it is no doubt possible to fit, say, an atmospheric model to the residual. But if the residual contains no information, there are many processes which will fit it as well, even if the processes do not have a physical science basis.

It is a standard problem in Bayesian analysis to do inference or modeling using multiple model choices, each having a prior weight. In conventional presentations of Bayesian analysis, priors are typically reserved for parameters. Work has progressed on analysis using mixture models (Stephens, 2000, The Annals of Statistics), that is, where the distribution governing a likelihood function is a linear mixture of several simpler distributions. Putting priors on M models involves M sets of parameters, \boldsymbol\theta_{j}, and a weight, \alpha_{j}, with one j for each of the models. Here 1 = \sum_{j=1}^{M} \alpha_{j}, and 0 \le \alpha_{j} \le 1. The resulting posterior of the Bayesian analysis would have an equilibrium assignment of mass to each of the \boldsymbol\theta_{j} and their corresponding \alpha_{j}. These calculations are done using Bayesian model averaging (“BMA”), known since 1999. (See also Prof Adrian Raftery’s page on the subject.)
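To make the role of the weights \alpha_{j} concrete, here is a minimal sketch in Python. It assumes the log marginal likelihood of each of the M models has already been obtained by some other means, and all numbers are made up for illustration. The function names `bma_weights` and `bma_mean_and_variance` are mine and do not come from any of the papers or packages mentioned here.

```python
import numpy as np

def bma_weights(log_marginal_likelihoods, prior_weights=None):
    """Posterior model weights: alpha_j proportional to prior_j * p(data | model_j)."""
    log_ml = np.asarray(log_marginal_likelihoods, dtype=float)
    if prior_weights is None:
        # Assumption: equal prior weight on each of the M models.
        prior = np.full(log_ml.shape, 1.0 / log_ml.size)
    else:
        prior = np.asarray(prior_weights, dtype=float)
    log_post = np.log(prior) + log_ml
    log_post -= log_post.max()          # subtract the maximum for numerical stability
    w = np.exp(log_post)
    return w / w.sum()                  # weights lie in [0, 1] and sum to 1

def bma_mean_and_variance(weights, means, variances):
    """Mean and variance of the mixture of per-model predictive distributions."""
    weights, means, variances = map(np.asarray, (weights, means, variances))
    mean = np.sum(weights * means)
    # Law of total variance: within-model spread plus between-model spread.
    var = np.sum(weights * (variances + means ** 2)) - mean ** 2
    return mean, var

# Hypothetical inputs: three models, each with its own trend estimate (degrees C per decade).
log_ml = [-104.2, -101.7, -103.0]
alpha = bma_weights(log_ml)
mean, var = bma_mean_and_variance(alpha,
                                  means=[0.15, 0.18, 0.17],
                                  variances=[0.02 ** 2] * 3)
print(alpha, mean, var ** 0.5)
```

The combined variance is never smaller than the weighted average of the per-model variances, because disagreement between the models is carried into the result rather than averaged away.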

Fragoso and Neto (2015) have provided a survey of relevant methods along with a conceptual classification. It is no surprise some scholars have applied these methods to climate work:

These are a good deal more than “just curve fitting”, and BMA has been available since 1999. Fang and Li have been cited just 4 times (Google Scholar), Bhat, et al just 16, and Smith, et al 173. But Raftery, et al have been cited 1029 times. The majority of these citations are specific applications of the techniques to particular regions.

The assessment paper by Weigel, Knutti, Liniger, and Appenzeller (“Risks of model weighting in multimodel climate projections”, Journal of Climate, August 2010, 23) is odd in a couple of respects. They cite the Raftery, et al paper above, but they don’t specifically discuss it. They also seem to continue to associate Bayesian methods with subjectivism, and entertain roles for both Frequentist and Bayesian methods. (That makes no sense whatsoever.) It’s not clear if the discussion is restricted to ensembles of climate models, which I suspect, or is a criticism of a wider set of methods. I agree climate ensembles like CMIP5 share components among their members, so are not independent, but, if BMA is used, that oughtn’t matter. BMA is not bootstrapping. Knutti also wrote an odd comment in Climatic Change [2010, 102(3-4), 395-404] where he seemed to downplay a role for combinations of models. Again, I think it’s important not to equivocate. Knutti’s “rain tent” analogy

We intuitively assume that the combined information from multiple sources improves our understanding and therefore our ability to decide. Now having read one newspaper forecast already, would a second and a third one increase your confidence? That seems unlikely, because you know that all newspaper forecasts are based on one of only a few numerical weather prediction models. Now once you have decided on a set of forecasts, and irrespective of whether they agree or not, you will have to synthesize the different pieces of information and decide about the tent for the party. The optimal decision probably involves more than just the most likely prediction. If the damage without the tent is likely to be large, and if putting up the tent is easy, then you might go for the tent in a case of large prediction uncertainty even if the most likely outcome is no rain.

might apply to certain applications of multi-member climate ensembles, but certainly does not apply to uses of BMA. Here, for example, the paper of Cahill might consider alternative models to be those having different numbers and placements of breakpoints in trends. A run of a BMA consisting of such alternatives would yield a weighting for each, which could be interpreted as a plausibility score. Similar things are done in Bayesian cluster analysis, where the affinity of a data point for a particular cluster is scored rather than making an absolute commitment to its membership. Indeed, without such an approach, determining the number of breakpoints in trends is pretty much ad hoc guesswork. BMA does not mean a literal average of outcomes.
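As a rough illustration of weighting breakpoint alternatives, the following sketch scores continuous piecewise-linear trend models with different numbers and placements of breakpoints. It is not the procedure of Cahill’s paper. It makes several simplifying assumptions: the data are synthetic, the candidate breakpoint years are chosen arbitrarily, the candidate models get equal prior weight, and each model’s marginal likelihood is approximated by exp(-BIC/2) from a least-squares fit instead of being computed by a full Bayesian calculation. The function names are hypothetical.

```python
import itertools
import numpy as np

def bic_for_breakpoints(t, y, breakpoints):
    """BIC of a continuous piecewise-linear least-squares fit with fixed breakpoints."""
    # Intercept, overall slope, and one hinge term per breakpoint (slope change at b).
    cols = [np.ones_like(t), t - t.mean()]
    cols += [np.maximum(t - b, 0.0) for b in breakpoints]
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    n, k = t.size, X.shape[1] + 1       # +1 counts the noise variance
    return n * np.log(rss / n) + k * np.log(n)

def breakpoint_model_weights(t, y, candidates, max_breaks=2):
    """Approximate posterior weights over models with 0..max_breaks breakpoints."""
    models = [()]                        # the no-breakpoint (single trend) model
    for m in range(1, max_breaks + 1):
        models += list(itertools.combinations(candidates, m))
    bic = np.array([bic_for_breakpoints(t, y, bp) for bp in models])
    w = np.exp(-0.5 * (bic - bic.min()))  # equal prior weights assumed
    return models, w / w.sum()

# Synthetic series (an assumption, not real temperature data):
# flat until 1975, then warming at 0.018 per year, plus Gaussian noise.
rng = np.random.default_rng(0)
t = np.arange(1950, 2015, dtype=float)
y = 0.018 * np.maximum(t - 1975.0, 0.0) + rng.normal(0.0, 0.08, t.size)

models, weights = breakpoint_model_weights(
    t, y, candidates=[1965.0, 1975.0, 1985.0, 2001.0])
for bp, w in sorted(zip(models, weights), key=lambda mw: -mw[1])[:5]:
    print(bp, round(float(w), 3))
```

Each weight is read as a plausibility score for that arrangement of breakpoints. The question of whether a break near 2001 is supported then becomes a matter of how much total weight falls on the models that include it.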

Professor Cahill, along with Rahmstorf and Foster, has responded to the Fyfe, et al critique in their “Global temperature evolution: recent trends and some pitfalls” [Environmental Research Letters, 12 (2017) 054001]:

We discuss some pitfalls of statistical analysis of global temperatures which have led to incorrect claims of an unexpected or significant warming

Cahill’s paper is readable and approachable, as are most papers in Significance.

Other papers about this subject are listed below, most from this team:

(*) Knowledgeable, yes. Dated, also yes. According to my occasional inspections of the publications of the American Meteorological Society, people are using Bayesian methods and means of computation more frequently. That’s good. But they are not using them as much as, say, population biologists and field ecologists do. I also heard a put-down of data science and machine learning methods at a recent symposium, principally complaining about the opacity of models so derived. While surely techniques from these fields have their limitations, it’s not at all clear to me that an ensemble of climate models which have been run 1000 years in order to initialize them is any more transparent than a recurrent neural network. Moreover, the dearth of uses of Bayesian model averaging, apart from those by the original authors and in the applications discussed in the text above, suggests a certain reticence in pursuing modern techniques.


2 Responses to Less evidence for a global warming hiatus, and urging more use of Bayesian model averaging in climate science

  1. The link to the rant at the Bayesian Statistics blog is interesting because I just had a comment deleted at Tamino’s “Open Mind”.

    If you can get some insight into the multimodel ensemble averages that would be great. They seem to be a mix of varying parameters and varying initial conditions, and perhaps random forcing thrown in as well? The latter two would likely average out for the sinusoidal response contributed by the ocean dipole indices.

  2. If you can get some insight into the multimodel ensemble averages that would be great.

    I’d check with the Fang and Li paper, per:

    result from Fang and Li, cited above
