## Confidence intervals and the IPCC: Why climate scientists need statistical help

At Andrew Gelman’s blog (Statistical Modeling, Causal Inference, and Social Science), Ben Goodrich makes an interesting observation in a lengthy discussion about confidence intervals, how they should be interpreted, whether or not they have any socially redeeming value, und so weiter. Dr Goodrich zings the opening paragraphs of the Summary for Policymakers from the AR5 report of the Intergovernmental Panel on Climate Change for its treatment of confidence intervals:

My current favorite example of the potential damage of confidence intervals is from the Summary for Policymakers of the Intergovernmental Panel on Climate Change.

To take one example, the first real paragraph says:

“The period from 1983 to 2012 was _likely_ the warmest 30-year period of the last 1400 years in the Northern Hemisphere, where such assessment is possible (_medium confidence_). The globally averaged combined land and ocean surface temperature data as calculated by a linear trend show a warming of 0.85 [0.65 to 1.06] °C {^2} over the period 1880 to 2012, when multiple independently produced datasets exist.”

The second footnote actually gets the definition of a confidence interval correct, albeit in a way that only a well-trained statistician would understand:

{^2}: Ranges in square brackets or following ‘±’ are expected to have a 90% likelihood of including the value that is being estimated

So, basically they are correctly saying to statisticians “The pre-data expectation of the indicator function as to whether the estimated confidence interval includes the true average temperature change is 0.9” and incorrectly saying to everyone else “there is a 0.9 probability that the average temperature rose between 0.65 and 1.06 degrees Celsius between 1880 and 2012”. I am hesitant to say it is okay for policymakers to adopt the latter misinterpretation because they would misinterpret the former interpretation.

I think your main point comes down to what the alternative is. If confidence intervals are inevitable, then I guess it would be less damaging for people to interpret them incorrectly than correctly. But if confidence intervals can be replaced by Bayesian intervals and interpreted correctly, I think that would be preferable.

The report gets more convoluted in its attempt to simultaneously be useful to policymakers and not wrong statistically. The first footnote says

{^1}: Each finding is grounded in an evaluation of underlying evidence and agreement. In many cases, a synthesis of evidence and agreement supports an assignment of confidence. The summary terms for evidence are: limited, medium or robust. For agreement, they are low, medium or high. A level of confidence is expressed using five qualifiers: very low, low, medium, high and very high, and typeset in italics, e.g., _medium confidence_. The following terms have been used to indicate the assessed likelihood of an outcome or a result: virtually certain 99-100% probability, very likely 90-100%, likely 66-100%, about as likely as not 33-66%, unlikely 0-33%, very unlikely 0-10%, exceptionally unlikely 0-1%. Additional terms (extremely likely 95-100%, more likely than not >50-100%, more unlikely than likely 0-<50%, extremely unlikely 0-5%) may also be used when appropriate. Assessed likelihood is typeset in italics, e.g., _very likely_.

So, they are using the word "confidence" not in the technical sense of a "confidence interval" but to describe the degree of agreement among the scientists who wrote the report on the basis of the (presumably frequentist) studies they reviewed. And then they have a seemingly Bayesian interpretation of the numerical probability of events being true but without actually using Bayesian machinery to produce posterior expectations. If they had asked me (which they didn't), I would have said to just do Bayesian calculations and justify the priors and whatnot in the footnotes instead of writing this mess.

Now, the IPCC is no more special in this kind of abuse of terminology than the American Medical Association is. And, yes, maybe if statisticians helped them out, the Policymakers would fall asleep on page three. Still, people oughtn’t get to pick their own standards of evidence.
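For readers who want to see what the "pre-data expectation" reading of footnote 2 amounts to, here is a minimal simulation sketch. It is not anything from the IPCC report or from Dr Goodrich's comment. The function name `coverage`, the true mean of 0.85, the known standard deviation of 0.5, and the sample size of 25 are all illustrative assumptions. The sketch draws many independent datasets from a normal distribution, builds a 90% interval from each, and counts how often the interval contains the fixed true value.

```python
import random
from statistics import NormalDist


def coverage(true_mean=0.85, sigma=0.5, n=25, level=0.90, trials=20_000, seed=1):
    """Return the fraction of simulated intervals that contain true_mean.

    All defaults are made-up illustrative values, not IPCC data.
    """
    rng = random.Random(seed)
    # Two-sided normal quantile, about 1.645 for a 90% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # Assumption: sigma is known, so every interval has the same half-width.
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        # A fresh dataset each trial; only its sample mean is needed.
        xbar = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if xbar - half_width <= true_mean <= xbar + half_width:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(coverage())
```

The fraction returned should land close to 0.9, but it describes the procedure over repeated datasets. Any single realized interval, such as [0.65 to 1.06], either contains the true value or it does not, which is the distinction the footnote glosses over for non-statisticians.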

There’s a fair bit to digest there and many pithy remarks elsewhere. For example, Corey Yanofsky quotes work by Bickel and remarks

Also, confidence procedures can be consistent with the betting operationalization you’re discussing (https://arxiv.org/abs/0907.0139). You need a diachronic Dutch book to force full Bayes.

I did not realize there was a philosophy based upon Dutch books.