“Ten Fatal Flaws in Data Analysis” (Charles Kufs)

Professor Kufs has a fun book, Stats with Cats, and a blog. He also has a blog post titled “Ten Fatal Flaws in Data Analysis” which, in general, I like. But the presentation has some shortcomings, too, which I note below.

  1. Where’s the Beef? Essentially, there’s no analysis. There’s a data summary.
  2. Phantom Populations Samples need to represent the population of interest, and there has to be a population of interest. The members of that population need to have something specific in common which, if absent, has a good chance of affecting an outcome.
  3. Wow, Sham Samples The population is real, but the samples don’t represent it well, or at all. But be careful, here! I don’t think Kufs emphasizes this enough: A sample need not contain observations of population groups in the same proportions with which they occur in the population. Professor Yves Tillé makes this point strongly in his book, Sampling Algorithms. (See the sampling sketch after the list.)
  4. Enough is Enough Too few observations and there is neither confidence nor statistical power; too many and “significant” results can be practically meaningless. I’d add, underscoring something Kufs says, be sure each category has a fair number of observations as well. (See the power sketch after the list.)
  5. Indulging Variance Unless variance is assessed and reported, an analysis is not statistical and it’s not scientific. Means mean muck without accompanying reports of variability. There’s a lot to appreciate about variability. Properly assessed, variability can be an important tool, as it is capable of separating out subpopulations which otherwise have common means. Ignoring heteroscedasticity can break many a standard analytical tool. The place to start dealing with variance when studying a population is at the sampling plan. There’s often a tradeoff between low variance and low bias. Merely calculating and reporting a standard deviation at the end of an analysis is almost always insufficient treatment. (See the variance sketch after the list.)
  6. Madness to the Methods Data inspection, cleaning, correcting, and testing the assumptions of modeling are the unglamorous, tedious, and time-consuming parts of any statistical analysis or data science project. They are also hard to defend since, in industry, management is sometimes impatient to see results coming from an allocation of expensive people and resources. But these steps are totally necessary. Without the last step, testing, you cannot know whether the data are sufficiently clean or representative. (See the assumption-check sketch after the list.)
  7. Torrents of Tests Kufs’ treatment of the multiple testing problem is old (Bonferroni), whether addressed from a quasi-Frequentist perspective or the more modern Bayesian one. There are now better techniques for large batches of tests, such as Holm’s method for controlling the family-wise error rate and Benjamini–Hochberg for controlling the false discovery rate. Bayesian methods don’t have a problem with multiple comparisons. That’s one reason why I use them (a lot). (See the multiple-testing sketch after the list.)
  8. Significant Insignificance and Insignificant Significance Ah, significance tests! I could go on and on about these, and often have. The kindest thing to say here is a quote from Jerome Cornfield (1976): “The most general and concise way of saying all this is that p-values depend on both the x observed and on the other possible values of x that might have been observed but [were not], i.e., the sample space, while the likelihood ratio depends only on the observed x.” (See the stopping-rule sketch after the list.)
  9. Extrapolation Intoxication This is a caution against making the same mistake NASA and its subcontractors did in the fateful decision to launch the Space Shuttle Challenger after it was exposed to freezing temperatures, far below any at which O-ring performance had been observed. (See the extrapolation sketch after the list.)
  10. Misdirected Models Models are critical for understanding data, whether they are derived from domain knowledge, like physical theory, or not. There are many ways theories or hypotheses upon which models are based can themselves be wrong. The most important criterion a theory or hypothesis needs to satisfy is that it must be falsifiable. Extending that, every model must have diagnostics within it which tell when the model is broken. This can be inherent to the application of the model, or it can be done by comparing its performance with a straw-man model which is completely nonsensical but is fit to the same data. (See the straw-man sketch after the list.) However, in this day and age there are such things as non-mechanistic empirical dynamic modeling which has shown success in the absence of underlying theory.
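
On item 3, here is a minimal sketch of Tillé’s point, using only numpy and an invented population. A rare subgroup is deliberately oversampled, so the raw sample badly misrepresents the population; weighting each observation by the inverse of its inclusion probability (the Hájek form of the Horvitz–Thompson estimator) recovers the population mean anyway.

```python
import numpy as np

rng = np.random.default_rng(42)

# Invented population: a rare subgroup (10%) with a very different mean.
N = 100_000
rare = rng.random(N) < 0.10
y = np.where(rare, rng.normal(50, 5, N), rng.normal(10, 5, N))

# Unequal inclusion probabilities: oversample the rare subgroup 9-to-1.
pi = np.where(rare, 0.009, 0.001)            # expected sample size ~ 180
sampled = rng.random(N) < pi

# The unweighted sample mean ignores the design and is badly biased ...
naive = y[sampled].mean()

# ... but weighting by 1/pi (Hajek / Horvitz-Thompson) is not.
ht = np.sum(y[sampled] / pi[sampled]) / np.sum(1.0 / pi[sampled])

print(f"population {y.mean():.2f}  naive {naive:.2f}  weighted {ht:.2f}")
```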
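
On item 4, the up-front arithmetic is cheap. A sketch using statsmodels’ power calculator; the effect size, alpha, and power below are arbitrary placeholders, not recommendations.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Solve for n per group given a standardized effect size (Cohen's d),
# a significance level, and the power we want: roughly 64 here.
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"n per group: {n_per_group:.1f}")

# The flip side of "enough is enough": with n = 100,000 per group even a
# trivial effect (d = 0.02) is detected almost surely.
power = analysis.solve_power(effect_size=0.02, alpha=0.05, nobs1=100_000)
print(f"power to detect d = 0.02: {power:.2f}")
```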
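
On item 5, one concrete illustration of variance as a tool rather than a nuisance, with invented data: two subpopulations share a mean, so a comparison of means sees nothing, while Levene’s test on the variances separates them cleanly.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Two invented subpopulations with a common mean but different spreads.
a = rng.normal(loc=100, scale=2, size=500)
b = rng.normal(loc=100, scale=15, size=500)

# A Welch t-test on the means should find nothing: the means are equal
# by construction.
_, p_means = stats.ttest_ind(a, b, equal_var=False)
print(f"t-test on means:     p = {p_means:.3f}")

# Levene's test on the variances separates the groups cleanly.
_, p_vars = stats.levene(a, b)
print(f"Levene on variances: p = {p_vars:.3g}")
```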
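
On item 6, “testing” need not be elaborate to be worth defending. A small sketch, with invented data, of the inspect-clean-test sequence: count and drop missing values, then check a normality assumption rather than taking it on faith.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Invented raw data: a skewed variable with some missing values.
x = np.concatenate([rng.lognormal(0.0, 0.6, 480), [np.nan] * 20])

# Inspect and clean: count, then drop, the missing values.
n_missing = int(np.isnan(x).sum())
x_clean = x[~np.isnan(x)]
print(f"dropped {n_missing} missing values of {x.size}")

# Test the assumption instead of assuming it: Shapiro-Wilk should flag
# this skewed variable as non-normal.
_, p = stats.shapiro(x_clean)
print(f"Shapiro-Wilk p = {p:.3g}  (small means normality fails)")
```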
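
On item 7, the modern toolkit is one import away. A sketch applying Holm’s family-wise control and Benjamini–Hochberg false-discovery-rate control to a batch of simulated p-values; the simulation setup is arbitrary.

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(1)

# 1000 simulated tests: 950 true nulls (uniform p-values) and 50 real
# effects (p-values piled up near zero).
pvals = np.concatenate([rng.uniform(size=950), rng.beta(0.5, 25, size=50)])

# Holm controls the family-wise error rate; Benjamini-Hochberg controls
# the false discovery rate and typically rejects more.
for method in ("holm", "fdr_bh"):
    reject, _, _, _ = multipletests(pvals, alpha=0.05, method=method)
    print(f"{method:7s}: {int(reject.sum())} rejections")
```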
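
On item 8, Cornfield’s point can be made in a few lines with the classic stopping-rule example. Nine heads in twelve tosses yields different p-values depending on whether the number of tosses was fixed in advance (binomial sampling) or tossing stopped at the third tail (negative binomial sampling), even though the likelihood, proportional to p^9 (1-p)^3, is identical under both designs.

```python
from scipy import stats

# Data: 9 heads, 3 tails. H0: fair coin; one-sided alternative p > 0.5.

# Design A: n = 12 tosses fixed in advance (binomial sampling).
p_binom = stats.binom.sf(8, n=12, p=0.5)    # P(heads >= 9 | n = 12)

# Design B: tossing stopped at the 3rd tail (negative binomial sampling);
# scipy's nbinom counts heads before the 3rd tail here.
p_nbinom = stats.nbinom.sf(8, n=3, p=0.5)   # P(heads >= 9 before 3rd tail)

print(f"binomial p-value:          {p_binom:.4f}")   # ~ 0.073
print(f"negative binomial p-value: {p_nbinom:.4f}")  # ~ 0.033
# Same data, same likelihood, different sample spaces, different p-values.
```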
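
On item 9, a sketch of the shape of the Dalal et al. analysis cited below, with invented data standing in for the real O-ring record: fit a logistic model of O-ring distress against launch temperature, then ask the model about 31°F, far colder than anything in the data.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)

# Invented data, NOT the real O-ring record: launch temperatures in a
# 53-81 F range, with distress more likely at the cold end.
temp = rng.uniform(53, 81, 40)
p_true = 1.0 / (1.0 + np.exp(0.3 * (temp - 60.0)))
distress = rng.binomial(1, p_true)

fit = sm.GLM(distress, sm.add_constant(temp),
             family=sm.families.Binomial()).fit()

# 67 F interrogates the model inside the data; 31 F, far below the
# coldest observation, interrogates the extrapolation alone.
grid = sm.add_constant(np.array([67.0, 31.0]))
print(fit.predict(grid))
```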
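
On item 10, the cheapest straw-man diagnostic I know, with invented data: compare a model’s held-out error to that of a nonsense baseline fit to the same data. A model that cannot beat the straw man is broken, whatever its in-sample fit says.

```python
import numpy as np

rng = np.random.default_rng(3)

# Invented data: a noisy linear signal, split into train and test halves.
x = rng.uniform(0, 10, 200)
y = 2.0 * x + rng.normal(0, 3, 200)
xtr, xte, ytr, yte = x[:100], x[100:], y[:100], y[100:]

# Candidate model: least-squares line fit on the training half.
slope, intercept = np.polyfit(xtr, ytr, deg=1)
rmse_model = np.sqrt(np.mean((yte - (slope * xte + intercept)) ** 2))

# Straw man fit to the same data: predict the training mean everywhere.
rmse_straw = np.sqrt(np.mean((yte - ytr.mean()) ** 2))

print(f"model RMSE {rmse_model:.2f}  straw-man RMSE {rmse_straw:.2f}")
```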

In practice, I witness projects running aground on these sandbars way too often. Kufs’ cautions are good. But Kufs’ advice could use an update.

The Challenger case in item 9 is analyzed in S. R. Dalal, E. B. Fowlkes, B. Hoadley, “Risk analysis of the Space Shuttle: Pre-Challenger prediction of failure”, Journal of the American Statistical Association, 84(408), 1989, 945-957, DOI: 10.1080/01621459.1989.10478858.
