Category Archives: sampling

Calculating Derivatives from Random Forests

Posted on 27 June 2020 by ecoquant

(Comment on prediction intervals for random forests, and links to a paper.) (Edits to repair smudges, 2020-06-28, about 0945 EDT. Closing comment, 2020-06-30, 1450 EDT.) There are lots of ways of learning about mathematical constructs, even about actual machines. One … Continue reading →

Posted in bridge to somewhere, Calculus, dependent data, dynamic generalized linear models, dynamical systems, ensemble methods, ensemble models, filtering, forecasting, hierarchical clustering, linear regression, model-free forecasting, Monte Carlo Statistical Methods, non-mechanistic modeling, non-parametric model, non-parametric statistics, numerical algorithms, prediction, R statistical programming language, random forests, regression, sampling, splines, statistical learning, statistical series, statistics, time derivatives, time series | Leave a comment

COVID-19 statistics, a caveat : Sources of data matter

Posted on 17 May 2020 by ecoquant

There are a number of sources of COVID-19-related demographics, cases, deaths, numbers testing positive, numbers recovered, and numbers testing negative available. Many of these are not consistent with one another. One could hope at least rates would be consistent, but … Continue reading →

Posted in coronavirus, count data regression, COVID-19, descriptive statistics, epidemiology, pandemic, policy metrics, politics, population biology, population dynamics, quantitative biology, quantitative ecology, sampling, SARS-CoV-2, statistical ecology, statistical series, statistics | 2 Comments

Reanalysis of business visits from deployments of a mobile phone app

Posted on 20 February 2020 by ecoquant

Updated, 20th October 2020. This reports a reanalysis of data from the deployment of a mobile phone app, as reported in: M. Yauck, L.-P. Rivest, G. Rothman, “Capture-recapture methods for data on the activation of applications on mobile phones”, Journal … Continue reading →

Posted in Bayesian computational methods, biology, capture-mark-recapture, capture-recapture, Christian Robert, count data regression, cumulants, diffusion, diffusion processes, Ecological Society of America, ecology, epidemiology, experimental science, field research, Gibbs Sampling, Internet measurement, Jean-Michel Marin, linear regression, mark-recapture, mathematics, maximum likelihood, Monte Carlo Statistical Methods, multilist methods, multivariate statistics, non-mechanistic modeling, non-parametric statistics, numerics, open source scientific software, Pierre-Simon Laplace, population biology, population dynamics, quantitative biology, quantitative ecology, R, R statistical programming language, sampling, sampling algorithms, segmented package in R, statistical ecology, statistical models, statistical regression, statistical series, statistics, stepwise approximation, stochastic algorithms, surveys, V. M. R. Muggeo | 1 Comment

“Ten Fatal Flaws in Data Analysis” (Charles Kufs)

Posted on 30 June 2019 by ecoquant

Professor Kufs has a fun book, Stats with Cats, and a blog. He also has a blog post titled “Ten Fatal Flaws in Data Analysis” which, in general, I like. But the presentation has some shortcomings, too, which I note … Continue reading →

Posted in Bayesian, Bayesian computational methods, Charlie Kufs, George Sugihara, sampling, sampling algorithms, statistics, yves tille | Leave a comment

On bag bans and sampling plans

Posted on 18 February 2019 by ecoquant

Plastic bag bans are all the rage. It’s not the purpose of this post to take a position on the matter. Before you do, however, I’d recommend checking out this: and especially this: (Note: My lovely wife, Claire, presents this … Continue reading →

Posted in bag bans, citizen data, citizen science, Commonwealth of Massachusetts, Ecology Action, evidence, Google, Google Earth, Google Maps, goverance, lifestyle changes, microplastics, municipal solid waste, oceans, open data, planning, plastics, politics, pollution, public health, quantitative ecology, R, R statistical programming language, reasonableness, recycling, rhetorical statistics, sampling, sampling networks, statistics, surveys, sustainability | 2 Comments

Sampling: Rejection, Reservoir, and Slice

Posted on 29 September 2018 by ecoquant

An article by Suilou Huang for catastrophe modeler AIR-WorldWide of Boston about rejection sampling in CAT modeling got me thinking about pulling together some notes about sampling algorithms of various kinds. There are, of course, books written about this subject, … Continue reading →

Posted in accept-reject methods, American Statistical Association, Bayesian computational methods, catastrophe modeling, data science, diffusion processes, empirical likelihood, Gibbs Sampling, insurance, Markov Chain Monte Carlo, mathematics, Mathematics and Climate Research Network, maths, Monte Carlo Statistical Methods, multivariate statistics, numerical algorithms, numerical analysis, numerical software, numerics, percolation theory, Python 3 programming language, R statistical programming language, Radford Neal, sampling, slice sampling, spatial statistics, statistics, stochastic algorithms, stochastic search | Leave a comment

Senn’s `… never having to say you are certain’ guest post from Mayo’s blog

Posted on 22 January 2018 by ecoquant

via S. Senn: “Being a statistician means never having to say you are certain” (Guest Post). See also: E. Cai’s blog post “Applied Statistics Lesson of the Day – The Matched Pairs Experimental Design”, from February 2014; A. Deaton, N. … Continue reading →

Posted in abstraction, American Association for the Advancement of Science, American Statistical Association, cancer research, data science, ecology, experimental design, generalized linear mixed models, generalized linear models, Mathematics and Climate Research Network, medicine, sampling, statistics, the right to know | Leave a comment

Eli on “Tom [Karl]’s trick and experimental design”

Posted on 11 November 2017 by ecoquant

A very fine post at Eli’s blog for students of statistics, meteorology, and climate (like myself), titled “Tom’s trick and experimental design”. Excerpt: This and the graph from Menne at the top shows that Karl’s trick is working. Although we … Continue reading →

Posted in American Meteorological Association, American Statistical Association, AMETSOC, anomaly detection, climate, climate change, climate data, data science, evidence, experimental design, generalized linear mixed models, GISTEMP, GLMMs, global warming, model comparison, model-free forecasting, reblog, sampling, sampling networks | Leave a comment

“Bigger Isn’t Always Better When It Comes to Data”: Barry Nussbaum

Posted on 22 May 2017 by ecoquant

The President’s Corner in the May 2017 issue of Amstat News, the monthly newsletter of the American Statistical Association (“ASA”), features the interesting exposition by environmental statistician and President of the ASA, Barry Nussbaum, called “Bigger isn’t always better when … Continue reading →

Posted in American Statistical Association, emissions, sampling, sampling without replacement, smoothing, spatial statistics, statistics | Leave a comment

David Spiegelhalter on `how to spot a dodgy statistic’

Posted on 21 July 2016 by ecoquant

In this political season, it’s useful to brush up on rhetorical skills, particularly ones involving numbers and statistics, or what John Allen Paulos called numeracy. Professor David Spiegelhalter has written a guide to some of these tricks. Read the whole … Continue reading →

Posted in abstraction, anemic data, Bayes, Bayesian, chance, citizenship, civilization, corruption, Daniel Kahneman, disingenuity, Donald Trump, education, games of chance, ignorance, maths, moral leadership, obfuscating data, open data, perceptions, politics, rationality, reason, reasonableness, rhetoric, risk, sampling, science, sociology, statistics, the right to know | Leave a comment

On Smart Data

Posted on 11 June 2016 by ecoquant

One of the things I find surprising, if not astonishing, is that in the rush to embrace Big Data, a lot of learning and statistical technique has apparently been discarded along the way. I’m hardly the first to point … Continue reading →

Posted in Akaike Information Criterion, Bayes, Bayesian, Bayesian inversion, big data, bigmemory package for R, changepoint detection, data science, data streams, dlm package, dynamic generalized linear models, dynamic linear models, dynamical systems, Generalize Additive Models, generalized linear models, information theoretic statistics, Kalman filter, linear algebra, logistic regression, machine learning, Markov Chain Monte Carlo, mathematics, mathematics education, maths, maximum likelihood, MCMC, Monte Carlo Statistical Methods, multivariate statistics, numerical analysis, numerical software, numerics, quantitative biology, quantitative ecology, rationality, reasonableness, sampling, smart data, state-space models, statistical dependence, statistics, the right to know, time series | Leave a comment

“Catching long tail distribution” (Ted Dunning)

Posted on 10 June 2016 by ecoquant

One of the best presentations on what can happen if someone takes a naive approach to network data. It also highlights what is, to my mind, the greatly underappreciated t-distribution, which is typically only used in connection with frequentist Student … Continue reading →

Posted in Cauchy distribution, complex systems, data science, Lévy flights, leptokurtic, mathematics, maths, networks, physics, population biology, population dynamics, regime shifts, sampling, statistics, Student t distribution, time series | Leave a comment

Going down to the Southern Ocean, by Earle Wilson (on the Scripps R/V Roger Revelle)

Posted on 5 March 2016 by ecoquant

Getting steady data from the Earth’s oceans demands commitment and not a little courage. I could never do what these oceanographers do, … Continue reading →

Posted in Alison M Macdonald, anemic data, Antarctica, climate data, complex systems, Earle Wilson, Emily Shuckburgh, engineering, environment, fluid dynamics, geophysics, marine biology, NOAA, oceanic eddies, oceanography, open data, Principles of Planetary Climate, sampling, science, Scripps Institution of Oceanography, thermohaline circulation, waves, WHOI, Woods Hole Oceanographic Institution | Leave a comment

Ah, Hypergeometric!

Posted on 8 February 2016 by ecoquant

(“Ah, Hypergeometric!” To be said with the same resignation and acceptance as in “I’ll burn my books–Ah, Mephistopheles!” from Faust.) 😉 Dr John Cook, eminent all ’round statistician (with a specialty in biostatistics) and statistical consultant, took up a comment I … Continue reading →