(This post has been significantly updated midday 15th February 2018.)
I’ve written about the supposed global warming hiatus of 2001-2014 before:
- “‘Overestimated global warming over the past 20 years’ (Fyfe, Gillett, Zwiers, 2013)”, 28 August 2013
- “Warming Slowdown?”, Azimuth, Part 1 (29 May 2014), and Part 2 (5 June 2014)
- “A conclusion that ‘the hiatus’ in global land surface warming is natural variability”, 5 November 2014
The current issue of Significance, the joint magazine of the Royal Statistical Society and the American Statistical Association, has a nice paper by Professor Niamh Cahill of University College Dublin. Professor Cahill is a colleague of Professor Stefan Rahmstorf, Dr Grant Foster (“Tamino”), and Professor Andrew Parnell (also of University College Dublin). I’ll list a related history of their papers in a moment.
It’s good to see climate science and data treated well by statisticians, even though many geophysicists, oceanographers, and atmospheric scientists know something about statistics and data analysis (*). There is this paper by Professor Cahill, and the November 2017 issue of CHANCE was devoted to the subject. This is great, because the relationship between professional statisticians and climate scientists has been rocky at times. Notice some of the comments here, and this rant. There is also some distrust of statistical methods from the geophysics side, or at least from some atmospheric scientists. The great Jule Charney reportedly once dismissed an analysis as “just curve fitting”, since the standard in his field was ab initio physics. And squarely within the margins of the present discussion, there is this gentle admonition from Drs Fyfe, Meehl, England, Mann, Santer, Flato, Hawkins, Gillett, Xie, Kosaka, and Swart that
The warming slowdown as a statistically robust phenomenon has also been questioned. Recent studies have assessed whether or not trends during the slowdown are statistically different from trends over some earlier period. These investigations have led to statements such as “further evidence against the notion of a recent warming hiatus” [Karl, T. R. et al. Science 348, 1469–1472 (2015)] or “claims of a hiatus in global warming lack sound scientific basis” [Rajaratnam, B., Romano, J., Tsiang, M. & Diffenbaugh, N. S. Climatic Change 133, 129–140 (2015)]. While these analyses are statistically sound, they benchmark the recent slowdown against a baseline period that includes times with a lower rate of increase in greenhouse forcing [Flato, G. et al. in Climate Change 2013: The Physical Science Basis (eds Stocker, T. F. et al.) Ch. 9 (IPCC, Cambridge Univ. Press, 2013)], as we discuss below. Our goal here is to move beyond purely statistical aspects of the slowdown, and to focus instead on improving process understanding and assessing whether the observed trends are consistent with our expectations based on climate models.
(Emphasis and references added to original text. That’s from the Fyfe, et al‘s paper “Making sense of the early-2000s warming slowdown”.)
Professor Cahill’s article is an entirely plausible interpretation of the datasets NOAA, GISTEMP, HadCRUT4, and BEST, from a statistician’s perspective. That perspective includes the idea that if there is no information below some level in a signal to explain, assigning an interpretation to that residual provides no support for the interpretation. In other words, if properly extracted warming trends are subtracted from warming data, it is no doubt possible to fit, say, an atmospheric model to the residual. But if the residual contains no information, there are many processes which will fit it just as well, even if those processes have no physical science basis. It is a standard problem in Bayesian analysis to do inference or modeling using multiple model choices, each having a prior weight. In conventional presentations of Bayesian analysis, priors are typically reserved for parameters. Work has progressed on analysis using mixture models (Stephens, 2000, The Annals of Statistics), that is, where the distribution governing a likelihood function is a linear mixture of several simpler distributions. Putting priors on models $M_1, \dots, M_K$ involves sets of parameters $\theta_1, \dots, \theta_K$, and a weight $w_k$, with one for each of the models. Here $w_k \geq 0$, and $\sum_{k=1}^{K} w_k = 1$. The resulting posterior of the Bayesian analysis would have an equilibrium assignment of mass to each of the $M_k$ and their corresponding $w_k$. These calculations are done using Bayesian model averaging (“BMA”), known since 1999. (See also Prof Adrian Raftery’s page on the subject.)
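To make the mechanics concrete, here is a minimal sketch of such posterior model weights, assuming each model’s marginal likelihood is available in closed form. This is my own toy illustration, not code from any of the papers discussed; the data, candidate models, and function names are invented for the example.

```python
import numpy as np

# Toy sketch of BMA posterior model weights (invented example).
# Each model M_k has a prior weight w_k, with w_k >= 0 and sum_k w_k = 1,
# and a marginal likelihood p(y | M_k); the posterior model probability is
#     P(M_k | y)  proportional to  w_k * p(y | M_k).

def log_lik_normal(y, mu, sigma=1.0):
    """Gaussian log likelihood, standing in for a marginal likelihood."""
    return float(np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                        - (y - mu)**2 / (2 * sigma**2)))

def bma_weights(log_marginals, prior_weights):
    """Posterior model probabilities, computed stably in log space."""
    log_post = np.log(prior_weights) + np.asarray(log_marginals, float)
    log_post -= log_post.max()            # guard against underflow
    post = np.exp(log_post)
    return post / post.sum()

rng = np.random.default_rng(42)
y = rng.normal(0.0, 1.0, size=50)         # synthetic data; truth: mean 0

# Two candidate models -- mean 0 versus mean 5 -- with equal prior weight.
w = bma_weights([log_lik_normal(y, 0.0), log_lik_normal(y, 5.0)],
                [0.5, 0.5])
# nearly all posterior mass lands on the mean-0 model
```

A flat prior over models is the natural default when there is no reason to favor one candidate over another; the interesting work in practice is computing the marginal likelihoods, which here are trivially available.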
Fragoso and Neto (2015) have provided a survey of relevant methods along with a conceptual classification. It is no surprise some scholars have applied these methods to climate work:
- K. S. Bhat, M. Haran, A. Terando, K. Keller, “Climate projections using Bayesian Model Averaging and space–time dependence”, Journal of Agricultural, Biological, and Environmental Statistics, (2011) 16:606.
- A. E. Raftery, T. Gneiting, F. Balabdaoui, M. Polakowski, “Using Bayesian Model Averaging to Calibrate Forecast Ensembles”, Monthly Weather Review, May 2005, 133, 1155-1174.
- M. Fang, X. Li, “Application of Bayesian Model Averaging in the reconstruction of past climate change using PMIP3/CMIP5 multimodel ensemble simulations”, Journal of Climate, January 2016, 29:175-189.
- R. L. Smith, C. Tebaldi, D. Nychka, L. O. Mearns, “Bayesian modeling of uncertainty in ensembles of climate models”, Journal of the American Statistical Association, March 2009, 104(485), 97-116.
- A. E. Raftery, Y. Zheng, “Discussion: Performance of Bayesian Model Averaging”, in response to N. L. Hjort, G. Claeskens, “Frequentist model average estimators”, Journal of the American Statistical Association, 98(464), December 2003.
These are a good deal more than “just curve fitting”, and BMA has been available since 1999. Fang and Li have been cited just 4 times (Google Scholar), Bhat, et al just 16, and Smith, et al 173. Raftery, et al, however, has been cited 1029 times. The majority of these citations are specific applications of the techniques to particular regions. The assessment paper by Weigel, Knutti, Liniger, and Appenzeller (“Risks of model weighting in multimodel climate projections”, Journal of Climate, August 2010, 23) is odd in a couple of respects: they cite the Raftery, et al paper above, but don’t specifically discuss it, and they seem to continue to associate Bayesian methods with subjectivism, entertaining roles for both Frequentist and Bayesian methods. (That makes no sense whatsoever.) It’s not clear whether their discussion is restricted to ensembles of climate models, which I suspect, or is a criticism of a wider set of methods. I agree that climate ensembles like CMIP5 share components among their members, so they are not independent, but, if BMA is used, that oughtn’t matter. BMA is not bootstrapping. Knutti also wrote an odd comment in Climatic Change [2010, 102(3-4), 395-404] where he seemed to downplay a role for combinations of models. Again, I think it’s important not to equivocate. Knutti’s “rain tent” analogy
We intuitively assume that the combined information from multiple sources improves our understanding and therefore our ability to decide. Now having read one newspaper forecast already, would a second and a third one increase your confidence? That seems unlikely, because you know that all newspaper forecasts are based on one of only a few numerical weather prediction models. Now once you have decided on a set of forecasts, and irrespective of whether they agree or not, you will have to synthesize the different pieces of information and decide about the tent for the party. The optimal decision probably involves more than just the most likely prediction. If the damage without the tent is likely to be large, and if putting up the tent is easy, then you might go for the tent in a case of large prediction uncertainty even if the most likely outcome is no rain.
might apply to certain applications of multi-member climate ensembles, but it certainly does not apply to uses of BMA. Here, for example, the paper of Cahill might consider as alternative models those having different numbers and placements of breakpoints in trends. A run of BMA consisting of such alternatives would yield a weighting for each, which could be interpreted as a plausibility score. Similar things are done in Bayesian cluster analysis, where the affinity of a data point for a particular cluster is scored rather than an absolute commitment made to its membership. Indeed, without such an approach, determining the number of breakpoints in trends is pretty much ad hoc guesswork. BMA does not mean a literal average of outcomes.
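As a hedged sketch of what such a run might look like (my own construction, not Cahill’s method, with invented data and candidate breakpoints), one can approximate each model’s marginal likelihood with BIC, log p(y|M_k) ≈ -BIC_k/2, and normalize under equal model priors:

```python
import numpy as np

# Toy BMA over breakpoint trend models (invented example): score a
# no-break linear trend against three candidate breakpoint placements,
# approximating each marginal likelihood with BIC.

def fit_rss(t, y, bp=None):
    """Least-squares residual sum of squares for a linear trend,
    optionally with one continuous breakpoint (hinge term)."""
    cols = [np.ones_like(t), t]
    if bp is not None:
        cols.append(np.maximum(t - bp, 0.0))   # slope change after bp
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid), X.shape[1]

def bic(rss, n, k):
    # Gaussian errors with the variance profiled out of the likelihood
    return n * np.log(rss / n) + k * np.log(n)

t = np.arange(1970.0, 2018.0)
rng = np.random.default_rng(0)
y = (0.01 * (t - 1970.0)                                # baseline trend
     + np.where(t > 1995.0, 0.02 * (t - 1995.0), 0.0)   # break at 1995
     + rng.normal(0.0, 0.03, size=t.size))              # noise

# Candidate models: no break, and breaks placed at 1985, 1995, 2005.
models = [None, 1985.0, 1995.0, 2005.0]
bics = []
for bp in models:
    rss, k = fit_rss(t, y, bp)
    bics.append(bic(rss, t.size, k))
bics = np.array(bics)
w = np.exp(-0.5 * (bics - bics.min()))
w /= w.sum()        # BMA-style weight for each candidate model
```

With synthetic data that actually breaks in 1995, the weight concentrates on the correctly placed breakpoint model. The BIC approximation is crude, but it keeps the sketch free of MCMC while still producing the per-model plausibility scores described above.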
Professor Cahill, along with Rahmstorf and Foster have responded to the Fyfe, et al critique in their “Global temperature evolution: recent trends and some pitfalls” [Environmental Research Letters, 12 (2017) 054001]:
We discuss some pitfalls of statistical analysis of global temperatures which have led to incorrect claims of an unexpected or significant warming slowdown.
Cahill’s paper is readable and approachable, as are most papers in Significance.
Other papers about this subject are listed below, most from this team:
- G. Foster, S. Rahmstorf, “Global temperature evolution 1979–2010”, Environmental Research Letters, 6 (2011) 044022. R. E. Benestad provided a perspective for this featured article (“Reconciliation of global temperatures”).
- N. Cahill, S. Rahmstorf, A.C. Parnell, “Change points in global temperature”, Environmental Research Letters, 10 (2015) 084002. This paper addresses the same problem as Cahill’s Significance article, but does it within a Bayesian framework using Markov Chain Monte Carlo and JAGS.
- S. Rahmstorf, G. Foster, N. Cahill, “Global temperature evolution: recent trends and some pitfalls”, Environmental Research Letters, 12 (2017) 054001. Their Section 3 (“Pitfalls in tests for trend change”) is similar to Section 5 of my second article on the subject at Azimuth (“Trends are Tricky”).
- I. Medhaug, M. B. Stolpe, E. M. Fischer, R. Knutti, “Reconciling controversies about the ‘global warming hiatus’”, Nature, 545, 41-47 (04 May 2017).
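The 2015 change-points paper does its sampling with MCMC via JAGS; as a crude, deterministic stand-in (my own sketch with invented data, not the authors’ model), the discrete posterior over a single change-point year can be computed exactly on a grid, assuming a flat prior over candidate years and Gaussian errors with the variance profiled out:

```python
import numpy as np

# Grid posterior over a single change-point year (invented example,
# standing in for the MCMC/JAGS analysis discussed above).

def changepoint_posterior(t, y, candidates):
    n = t.size
    log_post = []
    for bp in candidates:
        # continuous piecewise-linear trend: intercept, slope, hinge
        X = np.column_stack([np.ones(n), t, np.maximum(t - bp, 0.0)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ beta) ** 2))
        log_post.append(-0.5 * n * np.log(rss / n))   # profiled Gaussian
    log_post = np.array(log_post)
    p = np.exp(log_post - log_post.max())             # flat prior on bp
    return p / p.sum()

t = np.arange(1950.0, 2018.0)
rng = np.random.default_rng(1)
y = (0.005 * (t - 1950.0)
     + np.where(t > 1975.0, 0.03 * (t - 1975.0), 0.0)  # true break: 1975
     + rng.normal(0.0, 0.03, size=t.size))

cands = np.arange(1955.0, 2013.0)
post = changepoint_posterior(t, y, cands)
# posterior mass concentrates near the true change point
```

On this synthetic series the posterior concentrates near the true 1975 break. The published analyses are, of course, richer: they allow multiple change points and sample the full joint posterior rather than a one-dimensional grid.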
I’d check with the Fang and Li paper.
The link to the rant at the Bayesian Statistics blog is interesting because I just had a comment deleted at Tamino’s “Open Mind”.
If you can get some insight into the multimodel ensemble averages, that would be great. They seem to be a mix of varying parameters and varying initial conditions, with perhaps random forcing thrown in as well? The latter two would likely average out for the sinusoidal response contributed by the ocean dipole indices.