I’ve encountered a number of blog posts this week which seem not to understand the bias-variance tradeoff as it relates to mean squared error. These arose in connection with *smoothing splines*, which I was studying alongside *multivariate adaptive regression splines*, which are actually something different from smoothing splines. (I will have a post here soon on *multivariate adaptive regression splines*, or the *earth procedure*, as it’s called.)

The general notion some people seem to have is that smoothing splines throw away information and introduce correlation where there isn’t any, and so distort scientific data. A particularly obnoxious example of this is at science denier William Briggs’ blog. Another, milder instance is a post by a blogger called “Joseph” who specializes, he says, in “A closer look at scientific data and claims, with an emphasis on anthropogenic global warming.” I was going to put in a comment at the blog, but apparently comments there are closed, or at least no longer work. (Neither do some links to data from that post.) So, instead, I’m putting it here. I already answered a question at *Stats Stackexchange* which invoked Briggs.

Smoothing is not about making a picture nicer or losing information. It is about the bias-variance tradeoff. Given that minimizing *mean squared error* is important when fitting data with a non-parametric (or, for that matter, *any*) model, introducing a bias into the model, such as the smoothing in a spline, can reduce variability and, so, reduce the overall mean squared error of the fit.
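As a minimal sketch of this point, the following simulation compares raw noisy observations with a smoothed version of them as estimates of a known underlying curve. Everything in it is an assumption made for illustration: the sine curve, the noise level, and especially the centered moving average, which stands in for a smoothing spline only because it needs nothing beyond the Python standard library. The names `moving_average`, `mse`, and `simulate` are hypothetical.

```python
import math
import random


def moving_average(y, half_width):
    """Centered moving average; the window is truncated at both ends."""
    n = len(y)
    out = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out


def mse(estimate, truth):
    """Mean squared error of an estimate against the true curve."""
    return sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth)


def simulate(n=200, sigma=0.5, half_width=5, trials=200, seed=42):
    """Average MSE of raw and of smoothed data over repeated noisy series."""
    rng = random.Random(seed)
    # Assumed "true" signal: one period of a sine wave on [0, 1].
    truth = [math.sin(2 * math.pi * i / (n - 1)) for i in range(n)]
    raw_total = 0.0
    smooth_total = 0.0
    for _ in range(trials):
        y = [t + rng.gauss(0.0, sigma) for t in truth]
        raw_total += mse(y, truth)  # unbiased, but high variance
        smooth_total += mse(moving_average(y, half_width), truth)  # slightly biased
    return raw_total / trials, smooth_total / trials


raw_mse, smooth_mse = simulate()
print(f"MSE of raw data against truth:      {raw_mse:.4f}")
print(f"MSE of smoothed data against truth: {smooth_mse:.4f}")
```

Under these assumptions the raw data should show a mean squared error near the noise variance (0.25), while the smoothed series, although slightly biased, should come in well below that. Too wide a window would eventually make the bias dominate, which is the tradeoff in question.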

The Wikipedia page shows how mean squared error is connected with bias and variance, and gives the proof of their relationship.
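For reference, the relationship can be written out in a few lines. The notation is my own choice here: $\hat\theta$ is an estimator of a fixed quantity $\theta$, and expectations are taken over the sampling distribution of $\hat\theta$.

$$
\begin{aligned}
\operatorname{MSE}(\hat\theta)
&= E\big[(\hat\theta - \theta)^2\big] \\
&= E\big[(\hat\theta - E[\hat\theta] + E[\hat\theta] - \theta)^2\big] \\
&= E\big[(\hat\theta - E[\hat\theta])^2\big]
 + 2\,(E[\hat\theta] - \theta)\,E\big[\hat\theta - E[\hat\theta]\big]
 + (E[\hat\theta] - \theta)^2 \\
&= \operatorname{Var}(\hat\theta) + \operatorname{Bias}(\hat\theta)^2 .
\end{aligned}
$$

The cross term vanishes because $E\big[\hat\theta - E[\hat\theta]\big] = 0$. An estimator with a little bias can therefore have a smaller mean squared error than an unbiased one, provided its variance is sufficiently smaller.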

That a biased estimator can have lower mean squared error than an unbiased one was an important finding by Stein in 1955, and it gave rise to deliberately introducing some bias, via things like James-Stein estimators, in order to improve overall performance. Prior to Stein’s insight, classical statistics largely confined itself to unbiased estimators, and that insight showed that procedures like *maximum likelihood estimation* were not always optimal, even if they work well a lot of the time.
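A small simulation can illustrate the effect. This is only a sketch under stated assumptions: a single observation of a 10-dimensional Gaussian mean with known unit variance, the plain (not positive-part) James-Stein estimator shrinking toward the origin, and an arbitrary true mean vector `theta` that I made up. The function names are hypothetical.

```python
import random


def james_stein(x, sigma2=1.0):
    """Plain James-Stein estimate: shrink the observation x toward the origin."""
    p = len(x)  # the result is only of interest for p >= 3
    norm2 = sum(v * v for v in x)
    shrink = 1.0 - (p - 2) * sigma2 / norm2
    return [shrink * v for v in x]


def total_squared_error(estimate, theta):
    """Squared error summed over all coordinates."""
    return sum((e - t) ** 2 for e, t in zip(estimate, theta))


def compare(theta, trials=20000, seed=1):
    """Monte Carlo risk of the unshrunk estimate (the MLE) and of James-Stein."""
    rng = random.Random(seed)
    mle_total = 0.0
    js_total = 0.0
    for _ in range(trials):
        # One noisy observation per coordinate; the MLE of theta is x itself.
        x = [t + rng.gauss(0.0, 1.0) for t in theta]
        mle_total += total_squared_error(x, theta)
        js_total += total_squared_error(james_stein(x), theta)
    return mle_total / trials, js_total / trials


# Arbitrary true means, invented for this example.
theta = [1.0, -0.5, 0.3, 2.0, 0.0, -1.2, 0.8, 0.1, -0.4, 1.5]
mle_risk, js_risk = compare(theta)
print(f"Average total squared error, MLE:         {mle_risk:.3f}")
print(f"Average total squared error, James-Stein: {js_risk:.3f}")
```

The unshrunk estimate should average a total squared error close to the dimension (10), and the shrunken, biased estimate should do noticeably better. The gain shrinks as the true means move farther from the point being shrunk toward.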

And, accordingly, “Joseph”’s criticism of the Law Dome CO₂ data is not well founded. I draw his and the reader’s attention to a paper co-authored by Etheridge, one of the co-authors of the Law Dome work, about why smoothing splines are used.

Note that mean squared error appears in disguise in various powerful measures of model fit, like the *Akaike Information Criterion*.
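One standard instance, stated here as a reminder and with notation of my own choosing: for a least-squares fit to $n$ observations with independent Gaussian errors and $k$ estimated parameters, the maximized likelihood $\hat L$ depends on the data only through the residual sum of squares, $\mathrm{RSS}$, so that

$$
\mathrm{AIC} = 2k - 2\ln\hat L = n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 2k + n\,(1 + \ln 2\pi) .
$$

The quantity $\mathrm{RSS}/n$ is the in-sample mean squared residual, and the last term is the same for every model fit to the same data. In this setting, comparing models by AIC amounts to comparing their mean squared residuals with a penalty for the number of parameters.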

*Update*, 2016-12-27: Smooth, yes, but don’t *ever* expect to see the smoothed curve realized

While the smoothed version of a series, if properly chosen, can and often does provide the estimate with the least mean squared error, it is a different question whether presenting such a smoothed curve is the best way to convey the series, especially when communicating with the statistically uninitiated. The smoothed version of a curve is an idealization, intended for purposes of forecasting or prediction (they are not the same), and sometimes for helping to tease out the physical mechanisms giving rise to the observed phenomenon.

For one thing, the smoothed or idealized curve has *zero* probability of actually being realized, even on the span of support for which it is calculated. Actual realizations of the observed series will have excursions from the smooth, governed by the distribution of its residuals. *It is entirely a part of the series* to see these excursions and, moreover, to expect that, if it were possible to draw another realization of the series, a different set of excursions would appear.

For another, the general public does not seem to get the idea of a data series with random excursions atop a pattern, and appears to approach these matters as if they were entirely deterministic. That’s a very classical kind of notion: The Watchmaker’s Universe. In this view, the only reason phenomena are not perfectly predicted is that we have but imperfect knowledge of the science involved, or of Nature, or something, and only a Deity knows these (notwithstanding the Deity knowing what all individuals will choose if Free Will is posited). A different, more modern view is that *even a Deity* cannot predict perfectly how another realization of these stochastic phenomena will play out.

So, to me, the best way to communicate this variability is to present the observed data from the series, present the smoothed curve, and then present a cloud or ensemble of draws from the smoothed curve with excursions governed by the residuals atop it. For example,

**(Click on image to see larger figure, and use browser Back Button to return to blog.)**

By the way, the example above shows *two* competing models for the smooth to the data.
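The presentation described above (observed data, a smooth, and an ensemble of draws around the smooth) can be sketched in a few lines. This is not the code behind the figure. It assumes a synthetic series that I invented, a centered moving average standing in for a smoothing spline, and residuals that are independent and can therefore be resampled freely; all function names are hypothetical.

```python
import math
import random


def moving_average(y, half_width):
    """Centered moving average; the window is truncated at both ends."""
    n = len(y)
    out = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        out.append(sum(y[lo:hi]) / (hi - lo))
    return out


def ensemble_around_smooth(y, half_width=5, n_draws=50, seed=7):
    """Return the smooth and n_draws synthetic series scattered around it."""
    rng = random.Random(seed)
    smooth = moving_average(y, half_width)
    residuals = [obs - s for obs, s in zip(y, smooth)]
    # Each draw adds a resampled residual to every point of the smooth.
    # Resampling this way assumes the residuals are independent.
    draws = [[s + rng.choice(residuals) for s in smooth] for _ in range(n_draws)]
    return smooth, draws


# Synthetic "observed" series, invented for this example.
data_rng = random.Random(0)
observed = [math.sin(2 * math.pi * i / 99) + data_rng.gauss(0.0, 0.3)
            for i in range(100)]

smooth, draws = ensemble_around_smooth(observed)

# Pointwise envelope of the ensemble, which is what a plotted cloud would show.
lower = [min(d[i] for d in draws) for i in range(len(smooth))]
upper = [max(d[i] for d in draws) for i in range(len(smooth))]
widths = [u - l for l, u in zip(lower, upper)]
print(f"Number of draws: {len(draws)}")
print(f"Mean width of the ensemble envelope: {sum(widths) / len(widths):.3f}")
```

Plotting `observed`, `smooth`, and every series in `draws` on the same axes would give the kind of cloud described. None of the draws coincides with the smooth itself, which is the point of showing them. For dependent data, resampling residuals independently would not be appropriate.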

If dependent data are to be emphasized, then ensembles of tracks, such as the reasonably famous hurricane tracks, are useful:

**(Click on image to see larger figure, and use browser Back Button to return to blog.)**

*Update*, 2017-10-14

Many indices of scientific interest are latent variables.
