Naomi Oreskes and significance testing

Naomi Oreskes has an op-ed in The New York Times today, which sets out to defend the severe standards of evidence scientists employ, with particular application to climate science and its account of causation (greenhouse gases produce radiative forcing), attribution (most of the increase in greenhouse gases is due to human sources, principally extraction and burning of fossil fuels), and effects (continuing emissions without drastic mitigation will produce severe health, social, economic, political, and military effects). Alas, Dr Oreskes centers her argument on significance testing. In doing so she, from one perspective, weakens her illustration to the point of evaporation and, from another, reinforces a mode of statistical thought which is downright harmful, especially in medicine.

Let’s take an example discussed by Westover, Westover, and Bianchi. (See also Briggs and Goodman (1) and Goodman (2).)

Consider a typical medical research study, for example designed to test the efficacy of a drug, in which a null hypothesis H0 ('no effect') is tested against an alternative hypothesis H1 ('some effect'). Suppose that the study results pass a test of statistical significance (that is P-value <0.05) in favor of H1. What has been shown?

1. H0 is false.
2. H1 is true.
3. H0 is probably false.
4. H1 is probably true.
5. Both (1) and (2).
6. Both (3) and (4).
7. None of the above.

This was actually posed as part of a study conducted by Westover, Westover, and Bianchi regarding such inference among medical specialists. (See their paper for details.) They obtained these results in their sample:
Table 1. Quiz answer profile.

Answer    (1)   (2)   (3)   (4)   (5)   (6)   (7)
Number      8     0    58    37     6    69    12
Percent   4.2     0  30.5  19.5   3.2  36.3   6.3

Lest medical scientists appear to be singled out, use of such tests is also common in climate geophysics. The prevalence of such statistical practice is one reason Ioannidis could argue, in “Why most published research findings are false”, that so many published findings do not hold up.

What’s the answer? (7), None of the above. As Westover, Westover, and Bianchi’s Figure 8 shows, knowing only that a test at the 0.05 level returned a p-value below 0.05 says very little about whether either hypothesis is true, or even probably true. That determination also demands knowing the pre-test probability of the alternative (its prior) and the statistical power of the test, otherwise described as the probability that the data favor an alternative hypothesis given that the alternative is true. These were not specified, so the conclusion is indeterminate.
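To see why the answer is indeterminate, it may help to work the arithmetic. The following is a minimal sketch, not taken from Westover, Westover, and Bianchi. It assumes a simple dichotomy in which exactly one of H0 and H1 holds, treats the 0.05 threshold as the false-positive rate, and applies Bayes' theorem to the event "the test came out significant". The function name `posterior_h1` and the priors and powers in the loop are hypothetical, chosen only for illustration.

```python
def posterior_h1(prior, power, alpha=0.05):
    """Probability that H1 is true given a 'significant' result.

    Assumptions (illustrative, not from the cited paper):
      prior -- pre-test probability that H1 is true
      power -- P(significant | H1 true)
      alpha -- P(significant | H0 true), the significance threshold
    """
    for name, value in (("prior", prior), ("power", power), ("alpha", alpha)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    true_positive = power * prior            # significant and H1 true
    false_positive = alpha * (1.0 - prior)   # significant and H0 true
    total = true_positive + false_positive
    if total == 0.0:
        raise ValueError("a significant result has probability zero here")
    return true_positive / total


# Hypothetical priors and powers, only to show how far the answer moves.
for prior in (0.01, 0.10, 0.50):
    for power in (0.20, 0.80):
        p = posterior_h1(prior, power)
        print(f"prior={prior:.2f} power={power:.2f} -> P(H1 | significant)={p:.3f}")
```

Under these assumptions the same "p < 0.05" outcome leaves H1 with a probability of about 0.04 when the prior is 0.01 and the power is 0.2, and about 0.94 when the prior is 0.5 and the power is 0.8. Without the prior and the power, then, not even "probably true" can be asserted.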

About ecoquant

Retired data scientist and statistician. Now working on projects in quantitative ecology and, specifically, phenology of Bryophyta and technical methods for their study.
