There’s Big Data, Tiny Data, and now Dead Data

You’ve heard of Big Data. You may have heard of Tiny Data. But now, in the Harvard Data Science Review, Professor Steve Stigler presents

Dead Data

See:

S. M. Stigler, "Data have a limited shelf life", Harvard Data Science Review, November 2019.

Abstract

Data, unlike some wines, do not improve with age. The contrary view, that data are immortal, a view that may underlie the often-observed tendency to recycle old examples in texts and presentations, is illustrated with three classical examples and rebutted by further examination. Some general lessons for data science are noted, as well as some history of statistical worries about the effect of data selection on induction and related themes in recent histories of science.

Keywords: dead data, zombie data, post-selection inference, history

Of particular historical interest is whether modern scholars can ever properly interpret classic experiments, defects and all, such as the Millikan oil drop experiment or Eddington’s measurement of light deflection to confirm General Relativity.

Also of interest is whether enough metadata is kept about old datasets, whether in business areas such as insurance or operations or in scientific observation, to properly reconstruct their provenance.

Hat tip to Professor Christian Robert for pointing out this article at his blog.

About ecoquant

See https://wordpress.com/view/667-per-cm.net/ Retired data scientist and statistician. Now working on projects in quantitative ecology and, specifically, phenology of Bryophyta and technical methods for their study.