Category Archives: statistics

What is the Tukey loss function?

Originally posted on Statistical Odds & Ends:
The Tukey loss function, also known as Tukey’s biweight function, is a loss function used in robust statistics. Tukey’s loss is similar to Huber loss in that…

Posted in loss functions, optimization, statistics | Leave a comment

a song in praise of data scientist Rebekah Jones

I linked to Rebekah Jones’ keynote address at the August 2020 Data Science Conference on COVID-19 sponsored by the National Institute for Statistical Science. Below is a song in tribute to her, wishing her well. (h/t Bill McKibben) We’re doing … Continue reading

Posted in American Association for the Advancement of Science, American Mathematical Society, American Statistical Association, Boston Ethical Society, children as political casualties, Data for Good, data science, geographic, geographic information systems, International Society for Bayesian Statistics, journalism, mathematics, New England Statistical Society, pandemic, Rebekah Jones, Risky Talk, science, Significance, statistical ecology, statistics, the problem of evil, whistleblowing, ``The tide is risin'/And so are we'' | Leave a comment

Rationalists, wearing square hats, Think, in square rooms, Looking at the floor, Looking at the ceiling. They confine themselves To right-angled triangles. If they tried rhomboids, Cones, waving lines, ellipses– As, for example, the ellipse of the half-moon– Rationalists … Continue reading

Posted in statistics | Leave a comment

Calculating Derivatives from Random Forests

(Comment on prediction intervals for random forests, and links to a paper.) (Edits to repair smudges, 2020-06-28, about 0945 EDT. Closing comment, 2020-06-30, 1450 EDT.) There are lots of ways of learning about mathematical constructs, even about actual machines. One … Continue reading

Posted in bridge to somewhere, Calculus, dependent data, dynamic generalized linear models, dynamical systems, ensemble methods, ensemble models, filtering, forecasting, hierarchical clustering, linear regression, model-free forecasting, Monte Carlo Statistical Methods, non-mechanistic modeling, non-parametric model, non-parametric statistics, numerical algorithms, prediction, R statistical programming language, random forests, regression, sampling, splines, statistical learning, statistical series, statistics, time derivatives, time series | Leave a comment

ClimateAdam on ClimateJustice

Listen!

Posted in statistics | Leave a comment

COVID-19 statistics, a caveat : Sources of data matter

There are a number of sources of COVID-19-related demographics, cases, deaths, numbers testing positive, numbers recovered, and numbers testing negative available. Many of these are not consistent with one another. One could hope at least rates would be consistent, but … Continue reading

Posted in coronavirus, count data regression, COVID-19, descriptive statistics, epidemiology, pandemic, policy metrics, politics, population biology, population dynamics, quantitative biology, quantitative ecology, sampling, SARS-CoV-2, statistical ecology, statistical series, statistics | 2 Comments

“There’s mourning in America”

“We are Republicans and we want Trump defeated.” And the Orange Mango apparently hates this advert. And that’s why it’s here. The Lincoln Project apparently introduced this advert on Twitter with the explanatory text: Since you are awake and trolling … Continue reading

Posted in statistics | Leave a comment

“Seasonality of COVID-19, Other Coronaviruses, and Influenza” (from Radford Neal’s blog)

Thorough review with documentation and technical criticism of claims of COVID-19 seasonality or its lack. Whichever way this comes down, the links are well worth the visit! Will the incidence of COVID-19 decrease in the summer? There is reason to … Continue reading

Posted in COVID-19, differential equations, diffusion, diffusion processes, epidemiology, Lotka-Volterra systems, meteorology, pandemic, population biology, population dynamics, Radford Neal, SARS-CoV-2, statistics | Leave a comment

Simplistic and Dangerous Models

Originally posted on Musings on Quantitative Palaeoecology:
A few weeks ago there were none. Three weeks ago, with an entirely inadequate search strategy, ten cases were found. Last Saturday there were 43! With three inaccurate data points, there is enough information…

Posted in Generalized Additive Models, non-parametric statistics, science, statistics | Leave a comment

“Lockdown WORKS”

Originally posted on Open Mind:
Over 2400 Americans died yesterday from Coronavirus. Here are the new deaths per day (“daily mortality”) in the USA since March 10, 2020 (note: this is an exponential plot) As bad as that news is,…

Posted in forecasting, penalized spline regression, science, splines, statistical regression, statistical series, statistics, time series | 1 Comment

What happens when time sampling density of a series matches its growth

This is the newly updated map of COVID-19 cases in the United States, updated, presumably, because of the new emphasis upon testing: How do we know this is the result of recent testing? Look at the map of active cases: … Continue reading

Posted in American Association for the Advancement of Science, American Statistical Association, anti-intellectualism, anti-science, climate denial, corruption, data science, data visualization, Donald Trump, dump Trump, epidemiology, experimental science, exponential growth, forecasting, Kalman filter, model-free forecasting, nonlinear systems, open data, penalized spline regression, population dynamics, sampling algorithms, statistical ecology, statistical models, statistical regression, statistical series, statistics, sustainability, the right to know, the stack of lies | 1 Comment

“Code for causal inference: Interested in astronomical applications”

Via “Code for causal inference: Interested in astronomical applications”, from Professor Ewan Cameron at his Another Astrostatistics Blog.

Posted in American Association for the Advancement of Science, American Statistical Association, astronomy, astrostatistics, causal inference, causation, counterfactuals, epidemiology, experimental design, experimental science, multivariate statistics, prediction, propensity scoring, quantitative biology, quantitative ecology, reproducible research, rhetorical mathematics, rhetorical science, rhetorical statistics, science, statistical ecology, statistical models, statistical regression, statistics | Leave a comment

Reanalysis of business visits from deployments of a mobile phone app

Updated, 20th October 2020. This reports a reanalysis of data from the deployment of a mobile phone app, as reported in: M. Yauck, L.-P. Rivest, G. Rothman, “Capture-recapture methods for data on the activation of applications on mobile phones”, Journal … Continue reading

Posted in Bayesian computational methods, biology, capture-mark-recapture, capture-recapture, Christian Robert, count data regression, cumulants, diffusion, diffusion processes, Ecological Society of America, ecology, epidemiology, experimental science, field research, Gibbs Sampling, Internet measurement, Jean-Michel Marin, linear regression, mark-recapture, mathematics, maximum likelihood, Monte Carlo Statistical Methods, multilist methods, multivariate statistics, non-mechanistic modeling, non-parametric statistics, numerics, open source scientific software, Pierre-Simon Laplace, population biology, population dynamics, quantitative biology, quantitative ecology, R, R statistical programming language, sampling, sampling algorithms, segmented package in R, statistical ecology, statistical models, statistical regression, statistical series, statistics, stepwise approximation, stochastic algorithms, surveys, V. M. R. Muggeo | 1 Comment

There’s Big Data, Tiny Data, and now Dead Data

You’ve heard of Big Data. You may have heard of Tiny Data. But now, in the Harvard Data Science Review, Professor Steve Stigler presents Dead Data. See: S. M. Stigler, “Data have a limited shelf life”, Harvard Data Science … Continue reading

Posted in big data, dead data, statistics, tiny data | Leave a comment

“Tensors in Algebraic Statistics” (Elizabeth Gross)

Professor Elizabeth Gross. Some notes: Segre variety, about (These will be updated as I make progress through the talk.)

Posted in statistics, tensors | Leave a comment

Review of “No … increase of Carbon sequestration from the greening Earth”

(As promised.) Introduction and Abstract: This is a review, re-presentation, and report on the August 2019 article, Y. Zhang, C. Song, L. E. Band, G. Sun, (2019), “No proportional increase of terrestrial gross Carbon sequestration from the greening Earth”, Journal … Continue reading

Posted in adaptation, afforestation, agriculture, agroecology, algal blooms, American Statistical Association, argoecology, being carbon dioxide, biology, Botany, bridge to somewhere, carbon dioxide, carbon dioxide sequestration, chemistry, citizen science, clear air capture of carbon dioxide, climate, climate data, climate disruption, climate economics, climate mitigation, di-nitrogen oxide, ecocapitalism, ecological disruption, Ecological Society of America, ecomodernism, ecopragmatism, environment, evidence, food, forests, fossil fuels, geophysics, Glen Peters, Global Carbon Project, greenhouse gases, James Hansen, John Holdren, p-value, phytoplankton, pollution, population biology, quantitative biology, resource producitivity, scholarship, science education, significance test, statistics, Steven Chu, sustainability, sustainable landscaping, wishful environmentalism | 1 Comment

“Bayesian replication analysis” (by John Kruschke)

“… the ability to express [hypotheses] as distributions over parameters …” Bayesian estimation supersedes the t-test: (Also by Professor Kruschke.)

Posted in American Statistical Association, Bayesian, John Kruschke, model comparison, rationality, rhetorical statistics, statistical models, statistics, Student t distribution | Leave a comment

“Ten Fatal Flaws in Data Analysis” (Charles Kufs)

Professor Kufs has a fun book, Stats with Cats, and a blog. He also has a blog post titled “Ten Fatal Flaws in Data Analysis” which, in general, I like. But the presentation has some shortcomings, too, which I note … Continue reading

Posted in Bayesian, Bayesian computational methods, Charlie Kufs, George Sugihara, sampling, sampling algorithms, statistics, yves tille | Leave a comment

A response to a post on RealClimate

(Updated 2342 EDT, 28 June 2019.) This is a response to a post on RealClimate which primarily concerned economist Ross McKitrick’s op-ed in the Financial Post condemning the geophysical community for disregarding Roger Pielke, Jr’s arguments. Pielke, in that link, … Continue reading

Posted in American Association for the Advancement of Science, American Meteorological Association, American Statistical Association, AMETSOC, Bayesian, climate change, ecology, Ecology Action, environment, evidence, experimental design, Frequentist, global warming, Hyper Anthropocene, machine learning, model comparison, model-free forecasting, multivariate statistics, science, science denier, statistical series, statistics, time series | Leave a comment

Cumulants and the Cornish-Fisher Expansion

“Consider the following.” (Bill Nye the Science Guy) There are random variables drawn from the same kind of probability distribution, but with different parameters for each. In this example, I’ll consider random variables , that is, each drawn from a … Continue reading

Posted in Calculus, closed-form expressions, Cornish-Fisher expansion, cumulants, descriptive statistics, mathematics, maths, multivariate statistics, statistical models, statistics, theoretical statistics | Leave a comment

What’s good for each subgroup can be bad for the group: Simpson’s

Why? Simpson’s “paradox” or observation … There’s actually nothing odd about this. While interpretation depends upon the semantics of individual measurements, it should be expected that, at times, improving things for the overall group will mean as a matter of … Continue reading

Posted in abstraction, statistics | Leave a comment

California Marine Debris Prevention: Banning Plastic Bags is Not Enough

NOAA has a full page of videos on marine debris and how to prevent it. The state of California has a 2018 plan on preventing marine debris. Here are some highlights. There is a good deal more in the report, … Continue reading

Posted in American Statistical Association, Life Cycle Assessment, life cycle sustainability analysis, policy metrics, public welfare, shop, shorelines, solid waste, solid waste management, South Shore Recycling Cooperative, spatial statistics, statistical series, statistics, supply chains, sustainability, the right to know, wishful environmentalism | Leave a comment

Five Thirty Eight podcast: `Can Statistics solve gerrymandering?`

Great podcast, featuring Professor and geometer Moon Duchin, Nate Silver, and Galen Druke. If the link doesn’t work, listen from here or below: Professor Duchin has written extensively on this: M. Duchin, B. E. Tenner, “Discrete geometry for electoral geography”, … Continue reading

Posted in FiveThirtyEight, Nate Silver, point pattern analysis, politics, statistics | Leave a comment

On bag bans and sampling plans

Plastic bag bans are all the rage. It’s not the purpose of this post to take a position on the matter. Before you do, however, I’d recommend checking out this: and especially this: (Note: My lovely wife, Claire, presents this … Continue reading

Posted in bag bans, citizen data, citizen science, Commonwealth of Massachusetts, Ecology Action, evidence, Google, Google Earth, Google Maps, goverance, lifestyle changes, microplastics, municipal solid waste, oceans, open data, planning, plastics, politics, pollution, public health, quantitative ecology, R, R statistical programming language, reasonableness, recycling, rhetorical statistics, sampling, sampling networks, statistics, surveys, sustainability | 2 Comments

Repeating Bullshit

Originally posted on Open Mind:
Question: How does a dumb claim go from just a dumb claim, to accepted canon by the climate change denialati? Answer: Repetition. Yes, keep repeating it. If it’s contradicted by evidence, ignore that or insult…

Posted in American Statistical Association, anomaly detection, changepoint detection, climate change, Grant Foster, Mathematics and Climate Research Network, maths, science, statistics, Tamino, time series, unreason | Leave a comment

A look at an electricity consumption series using SNCDs for clustering

(Slightly amended with code and data link, 12th January 2019.) Prediction of electrical load demand or, in other words, electrical energy consumption is important for the proper operation of electrical grids, at all scales. RTOs and ISOs forecast demand based … Continue reading

Posted in American Statistical Association, consumption, data streams, decentralized electric power generation, dendrogram, divergence measures, efficiency, electricity, electricity markets, energy efficiency, energy utilities, ensembles, evidence, forecasting, grid defection, hierarchical clustering, hydrology, ILSR, information theoretic statistics, local self reliance, Massachusetts, microgrids, NCD, normalized compression divergence, numerical software, open data, prediction, rate of return regulation, Sankey diagram, SNCD, statistical dependence, statistical series, statistics, sustainability, symmetric normalized compression divergence, time series | 2 Comments

Series, symmetrized Normalized Compressed Divergences and their logit transforms

(Major update on 11th January 2019. Minor update on 16th January 2019.) On comparing things: The idea of calculating a distance between series for various purposes has received scholarly attention for quite some time. The most common application is … Continue reading

Posted in Akaike Information Criterion, bridge to somewhere, computation, content-free inference, data science, descriptive statistics, divergence measures, engineering, George Sughihara, information theoretic statistics, likelihood-free, machine learning, mathematics, model comparison, model-free forecasting, multivariate statistics, non-mechanistic modeling, non-parametric statistics, numerical algorithms, statistics, theoretical physics, thermodynamics, time series | 4 Comments

Why Americans and Britons work such long hours

Why Americans and Britons work such long hours.

Posted in business, economics, labor, statistics | Leave a comment

667-per-cm.net, the Podcast: Episode 2, or Probability is Real.

This is the second installment of the Podcast here, hopefully with better sound quality.

Posted in probability, random walks, statistics | Leave a comment

`significance testing`

Posted in American Statistical Association, attribution, Bayes, probability, statistics | Leave a comment