Category Archives: sampling

Calculating Derivatives from Random Forests

(Comment on prediction intervals for random forests, and links to a paper.) (Edits to repair smudges, 2020-06-28, about 0945 EDT. Closing comment, 2020-06-30, 1450 EDT.) There are lots of ways of learning about mathematical constructs, even about actual machines. One … Continue reading

Posted in bridge to somewhere, Calculus, dependent data, dynamic generalized linear models, dynamical systems, ensemble methods, ensemble models, filtering, forecasting, hierarchical clustering, linear regression, model-free forecasting, Monte Carlo Statistical Methods, non-mechanistic modeling, non-parametric model, non-parametric statistics, numerical algorithms, prediction, R statistical programming language, random forests, regression, sampling, splines, statistical learning, statistical series, statistics, time derivatives, time series | Leave a comment

COVID-19 statistics, a caveat : Sources of data matter

There are a number of sources of COVID-19-related demographics, cases, deaths, numbers testing positive, numbers recovered, and numbers testing negative available. Many of these are not consistent with one another. One could hope at least rates would be consistent, but … Continue reading

Posted in coronavirus, count data regression, COVID-19, descriptive statistics, epidemiology, pandemic, policy metrics, politics, population biology, population dynamics, quantitative biology, quantitative ecology, sampling, SARS-CoV-2, statistical ecology, statistical series, statistics | 1 Comment

Reanalysis of business visits from deployments of a mobile phone app

This reports a reanalysis of data from the deployment of a mobile phone app, as reported in: M. Yauck, L.-P. Rivest, G. Rothman, “Capture-recapture methods for data on the activation of applications on mobile phones”, Journal of the American Statistical … Continue reading

Posted in Bayesian computational methods, biology, capture-mark-recapture, capture-recapture, Christian Robert, count data regression, cumulants, diffusion, diffusion processes, Ecological Society of America, ecology, epidemiology, experimental science, field research, Gibbs Sampling, Internet measurement, Jean-Michel Marin, linear regression, mark-recapture, mathematics, maximum likelihood, Monte Carlo Statistical Methods, multilist methods, multivariate statistics, non-mechanistic modeling, non-parametric statistics, numerics, open source scientific software, Pierre-Simon Laplace, population biology, population dynamics, quantitative biology, quantitative ecology, R, R statistical programming language, sampling, sampling algorithms, segmented package in R, statistical ecology, statistical models, statistical regression, statistical series, statistics, stepwise approximation, stochastic algorithms, surveys, V. M. R. Muggeo | 1 Comment

“Ten Fatal Flaws in Data Analysis” (Charles Kufs)

Professor Kufs has a fun book, Stats with Cats, and a blog. He also has a blog post titled “Ten Fatal Flaws in Data Analysis” which, in general, I like. But the presentation has some shortcomings, too, which I note … Continue reading

Posted in Bayesian, Bayesian computational methods, Charlie Kufs, George Sugihara, sampling, sampling algorithms, statistics, yves tille | Leave a comment

On bag bans and sampling plans

Plastic bag bans are all the rage. It’s not the purpose of this post to take a position on the matter. Before you do, however, I’d recommend checking out this: and especially this: (Note: My lovely wife, Claire, presents this … Continue reading

Posted in bag bans, citizen data, citizen science, Commonwealth of Massachusetts, Ecology Action, evidence, Google, Google Earth, Google Maps, goverance, lifestyle changes, microplastics, municipal solid waste, oceans, open data, planning, plastics, politics, pollution, public health, quantitative ecology, R, R statistical programming language, reasonableness, recycling, rhetorical statistics, sampling, sampling networks, statistics, surveys, sustainability | 2 Comments

Sampling: Rejection, Reservoir, and Slice

An article by Suilou Huang for catastrophe modeler AIR-WorldWide of Boston about rejection sampling in CAT modeling got me thinking about pulling together some notes about sampling algorithms of various kinds. There are, of course, books written about this subject, … Continue reading

Posted in accept-reject methods, American Statistical Association, Bayesian computational methods, catastrophe modeling, data science, diffusion processes, empirical likelihood, Gibbs Sampling, insurance, Markov Chain Monte Carlo, mathematics, Mathematics and Climate Research Network, maths, Monte Carlo Statistical Methods, multivariate statistics, numerical algorithms, numerical analysis, numerical software, numerics, percolation theory, Python 3 programming language, R statistical programming language, Radford Neal, sampling, slice sampling, spatial statistics, statistics, stochastic algorithms, stochastic search | Leave a comment

Senn’s `… never having to say you are certain’ guest post from Mayo’s blog

via S. Senn: Being a statistician means never having to say you are certain (Guest Post) See also: E. Cai’s blog post “Applied Statistics Lesson of the Day – The Matched Pairs Experimental Design”, from February 2014 A. Deaton, N. … Continue reading

Posted in abstraction, American Association for the Advancement of Science, American Statistical Association, cancer research, data science, ecology, experimental design, generalized linear mixed models, generalized linear models, Mathematics and Climate Research Network, medicine, sampling, statistics, the right to know | Leave a comment

Eli on “Tom [Karl]’s trick and experimental design”

A very fine post at Eli’s blog for students of statistics, meteorology, and climate (like myself) titled: Tom’s trick and experimental design Excerpt: This and the graph from Menne at the top shows that Karl’s trick is working. Although we … Continue reading


“Bigger Isn’t Always Better When It Comes to Data”: Barry Nussbaum

The President’s Corner in the May 2017 issue of Amstat News, the monthly newsletter of the American Statistical Association (“ASA”), features the interesting exposition by environmental statistician and President of the ASA, Barry Nussbaum, called “Bigger isn’t always better when … Continue reading

Posted in American Statistical Association, emissions, sampling, sampling without replacement, smoothing, spatial statistics, statistics | Leave a comment

David Spiegelhalter on `how to spot a dodgy statistic’

In this political season, it’s useful to brush up on rhetorical skills, particularly ones involving numbers and statistics, or what John Allen Paulos called numeracy. Professor David Spiegelhalter has written a guide to some of these tricks. Read the whole … Continue reading

Posted in abstraction, anemic data, Bayes, Bayesian, chance, citizenship, civilization, corruption, Daniel Kahneman, disingenuity, Donald Trump, education, games of chance, ignorance, maths, moral leadership, obfuscating data, open data, perceptions, politics, rationality, reason, reasonableness, rhetoric, risk, sampling, science, sociology, statistics, the right to know | Leave a comment

On Smart Data

One of the things I find surprising, if not astonishing, is that in the rush to embrace Big Data, a lot of learning and statistical technique has been left apparently discarded along the way. I’m hardly the first to point … Continue reading

Posted in Akaike Information Criterion, Bayes, Bayesian, Bayesian inversion, big data, bigmemory package for R, changepoint detection, data science, data streams, dlm package, dynamic generalized linear models, dynamic linear models, dynamical systems, Generalize Additive Models, generalized linear models, information theoretic statistics, Kalman filter, linear algebra, logistic regression, machine learning, Markov Chain Monte Carlo, mathematics, mathematics education, maths, maximum likelihood, MCMC, Monte Carlo Statistical Methods, multivariate statistics, numerical analysis, numerical software, numerics, quantitative biology, quantitative ecology, rationality, reasonableness, sampling, smart data, state-space models, statistical dependence, statistics, the right to know, time series | Leave a comment

“Catching long tail distribution” (Ted Dunning)

One of the best presentations on what can happen if someone takes a naive approach to network data. It also highlights what is, to my mind, the greatly underappreciated t-distribution, which is typically only used in connection with frequentist Student … Continue reading

Posted in Cauchy distribution, complex systems, data science, Lévy flights, leptokurtic, mathematics, maths, networks, physics, population biology, population dynamics, regime shifts, sampling, statistics, Student t distribution, time series | Leave a comment

Going down to the Southern Ocean, by Earle Wilson (on the Scripps R/V Roger Revelle)

Getting steady data from the Earth’s oceans demands commitment and not a little courage. I could never do what these oceanographers do, … Continue reading

Posted in Alison M Macdonald, anemic data, Antarctica, climate data, complex systems, Earle Wilson, Emily Shuckburgh, engineering, environment, fluid dynamics, geophysics, marine biology, NOAA, oceanic eddies, oceanography, open data, Principles of Planetary Climate, sampling, science, Scripps Institution of Oceanography, thermohaline circulation, waves, WHOI, Woods Hole Oceanographic Institution | Leave a comment

Ah, Hypergeometric!

(“Ah, Hypergeometric!” To be said with the same resignation and acceptance as in “I’ll burn my books–Ah, Mephistopheles!” from Faust.)😉 Dr John Cook, eminent all ’round statistician (with a specialty in biostatistics) and statistical consultant, took up a comment I … Continue reading

Posted in card decks, card draws, card games, games of chance, John Cook, mathematics, maths, probability, sampling, sampling without replacement, statistical dependence | Leave a comment