Category Archives: data science

Just because the data lies sometimes doesn’t mean it’s okay to censor it

Or, there’s no such thing as an outlier … Eli put up a post titled “The Data Lies. The Crisis in Observational Science and the Virtue of Strong Theory” at his lagomorph blog. Think of it: Data lying. Obviously this … Continue reading

Posted in Akaike Information Criterion, American Association for the Advancement of Science, American Meteorological Association, American Statistical Association, AMETSOC, Anthropocene, Bayes, Bayesian, climate, climate change, climate models, data science, dynamical systems, ecology, Eli Rabett, environment, Ethan Deyle, George Sughihara, Hao Ye, Hyper Anthropocene, information theoretic statistics, IPCC, Kalman filter, kriging, Lenny Smith, maximum likelihood, model comparison, model-free forecasting, physics, quantitative ecology, random walk processes, random walks, science, smart data, state-space models, statistics, Takens embedding theorem, the right to know, Timothy Lenton, Victor Brovkin | 1 Comment

“Hadoop is NOT ‘Big Data’ is NOT Analytics”

Arun Krishnan, CEO & Founder at Analytical Sciences, comments on this serious problem with the field. Short excerpt: … A person who is able to write code using Hadoop and the associated frameworks is not necessarily someone who can understand … Continue reading

Posted in alchemy, American Statistical Association, artificial intelligence, big data, data science, engineering, Internet, jibber jabber, machine learning, natural language processing, NLTK, sociology, superstition | Leave a comment

Is the answer to the democratization of Science doing more Citizen Science?

I have been following, with keen interest, the post and comment thread pertaining to “Democratising science” at the blog I monitor daily, … and Then There’s Physics. I think the core subject being discussed is a little different from my … Continue reading

Posted in American Association for the Advancement of Science, American Meteorological Association, American Statistical Association, AMETSOC, astronomy, astrophysics, biology, citizen data, citizen science, citizenship, data science, ecology, education, environment, evidence, life purpose, local self reliance, marine biology, mathematics, mathematics education, maths, moral leadership, new forms of scientific peer review, open source scientific software, science, science education, statistics, the green century, the right to know | Leave a comment

A new feature: Technical publications of the week

I’m beginning a new style of column, called technical publications of the week. While I can’t promise these will be weekly, I will, from time to time, highlight technical publications I’ve recently read which I consider to be noteworthy. I … Continue reading

Posted in Anthropocene, big data, climate change, climate disruption, data science, data streams, earthquakes, geophysics, global warming, Hyper Anthropocene, Locality Sensitive Hashing, LSH, MinHash, numerical algorithms, numerical analysis, random projections, seismology, subspace projection methods, SVD, the right to be and act stupid, the tragedy of our present civilization, the value of financial assets | 1 Comment

Why scientific measurements need to be adjusted

There is an excellent piece in Ars Technica about why scientific measurements need to be adjusted, and the implications of this for climate data. It is written by Scott K Johnson and is called “Thorough, not thoroughly fabricated: The truth … Continue reading

Posted in American Association for the Advancement of Science, American Meteorological Association, American Statistical Association, AMETSOC, Berkeley Earth Surface Temperature project, Canettes Blues Band, citizen data, climate data, data science, environment, evidence, geophysics, GISTEMP, HadCRUT4, mathematics education, meteorological models, obfuscating data, open data, physics, science, spatial statistics, Tamino, the right to know, the tragedy of our present civilization, Variable Variability | Leave a comment

Sleeping Giant Awakening

Originally posted on Climate Denial Crock of the Week:
https://twitter.com/johnmyers/status/809097380456865792 Wikipedia: Isoroku Yamamoto’s sleeping giant quotation is a quote by the Japanese Admiral Isoroku Yamamoto regarding the 1941 attack on Pearl Harbor by forces of Imperial Japan. The quotation is portrayed at the very end of…

Posted in adaptation, American Association for the Advancement of Science, American Meteorological Association, American Solar Energy Society, American Statistical Association, AMETSOC, Anthropocene, California, Carbon Worshipers, citizen data, citizen science, climate, climate change, climate data, climate disruption, data science, Donald Trump, ecology, Ecology Action, geophysics, global warming, Hyper Anthropocene, ignorance, Jerry Brown, science, sustainability, the right to be and act stupid, the right to know, the stack of lies, the tragedy of our present civilization | Leave a comment

Cathy O’Neil’s WEAPONS OF MATH DESTRUCTION: A Review

(Revised and updated Monday, 24th October 2016.) Weapons of Math Destruction, Cathy O’Neil, published by Crown Random House, 2016. This is a thoughtful and very approachable introduction to, and review of, the societal and personal consequences of data mining, data science, … Continue reading

Posted in citizen data, citizen science, citizenship, civilization, compassion, complex systems, criminal justice, Daniel Kahneman, data science, deep recurrent neural networks, destructive economic development, economics, education, engineering, ethics, Google, ignorance, Joseph Schumpeter, life purpose, machine learning, Mathbabe, mathematics, mathematics education, maths, model comparison, model-free forecasting, numerical analysis, numerical software, open data, optimization, organizational failures, planning, politics, prediction, prediction markets, privacy, rationality, reason, reasonableness, risk, silly tech devices, smart data, sociology, Techno Utopias, testing, the value of financial assets, transparency | Leave a comment

NextGen VOICES: ‘On data’, ‘On setbacks’, and ‘On discovery’

Science Magazine has a periodic column called Science in brief, and occasionally that column features a set of what they call “NextGen VOICES”, meaning young scientists. They gather the survey using Twitter (of course) via the hashtag #NextGenSci. For the … Continue reading

Posted in American Association for the Advancement of Science, big data, data science, evidence, Mathbabe, maths, maxims, science, Science magazine, smart data, statistics | Leave a comment

“Holy crap – an actual book!”

Originally posted on mathbabe:
Yo, everyone! The final version of my book now exists, and I have exactly one copy! Here’s my editor, Amanda Cook, holding it yesterday when we met for beers: Here’s my son holding it: He’s offered…

Posted in American Association for the Advancement of Science, Buckminster Fuller, business, citizen science, citizenship, civilization, complex systems, confirmation bias, data science, data streams, deep recurrent neural networks, denial, economics, education, engineering, ethics, evidence, Internet, investing, life purpose, machine learning, mathematical publishing, mathematics, mathematics education, maths, moral leadership, multivariate statistics, numerical software, numerics, obfuscating data, organizational failures, politics, population biology, prediction, prediction markets, privacy, quantitative biology, quantitative ecology, rationality, reason, reasonableness, rhetoric, risk, Schnabel census, smart data, sociology, statistical dependence, statistics, the right to be and act stupid, the right to know, the value of financial assets, transparency, UU Humanists | Leave a comment

“Stochastic Parameterization: Towards a new view of weather and climate models”

Judith Berner, Ulrich Achatz, Lauriane Batté, Lisa Bengtsson, Alvaro De La Cámara, Hannah M. Christensen, Matteo Colangeli, Danielle R. B. Coleman, Daan Crommelin, Stamen I. Dolaptchiev, Christian L.E. Franzke, Petra Friederichs, Peter Imkeller, Heikki Järvinen, Stephan Juricke, Vassili Kitsios, François … Continue reading

Posted in biology, climate models, complex systems, convergent cross-mapping, data science, dynamical systems, ecology, Ethan Deyle, Floris Takens, George Sughihara, Hao Ye, likelihood-free, Lorenz, mathematics, meteorological models, model-free forecasting, physics, population biology, population dynamics, quantitative biology, quantitative ecology, Scripps Institution of Oceanography, state-space models, statistical dependence, statistics, stochastic algorithms, stochastic search, stochastics, Takens embedding theorem, time series, Victor Brovkin | 4 Comments

data.table

R provides a helpful data structure called the “data frame” that gives the user an intuitive way to organize, view, and access data.  Many of the functions that you would us… Source: Intro to The data.table Package

Posted in big data, data science, engineering, numerical analysis, numerical software, numerics, open source scientific software, R, smart data, statistics | Leave a comment

On Smart Data

One of the things I find surprising, if not astonishing, is that in the rush to embrace Big Data, a lot of learning and statistical technique has apparently been discarded along the way. I’m hardly the first to point … Continue reading

Posted in Akaike Information Criterion, Bayes, Bayesian, Bayesian inversion, big data, bigmemory package for R, changepoint detection, data science, data streams, dlm package, dynamic generalized linear models, dynamic linear models, dynamical systems, Generalize Additive Models, generalized linear models, information theoretic statistics, Kalman filter, linear algebra, logistic regression, machine learning, Markov Chain Monte Carlo, mathematics, mathematics education, maths, maximum likelihood, MCMC, Monte Carlo Statistical Methods, multivariate statistics, numerical analysis, numerical software, numerics, quantitative biology, quantitative ecology, rationality, reasonableness, sampling, smart data, state-space models, statistical dependence, statistics, the right to know, time series | Leave a comment

“Catching long tail distribution” (Ted Dunning)

One of the best presentations on what can happen if someone takes a naive approach to network data. It also highlights what is, to my mind, the greatly underappreciated t-distribution, which is typically only used in connection with frequentist Student … Continue reading

Posted in Cauchy distribution, complex systems, data science, Lévy flights, leptokurtic, mathematics, maths, networks, physics, population biology, population dynamics, regime shifts, sampling, statistics, Student t distribution, time series | Leave a comment

Climate Denial Fails Pepsi Challenge

Originally posted on Climate Denial Crock of the Week:
Stephen Lewandowsky specializes in conducting research that pulls back the curtain on climate denial psychology. He’s done it again. Washington Post: Researchers have designed an inventive test suggesting that the arguments commonly used…

Posted in American Association for the Advancement of Science, American Statistical Association, card draws, card games, chance, climate, climate change, climate data, climate education, confirmation bias, data science, denial, disingenuity, education, false advertising, fear uncertainty and doubt, fossil fuels, games of chance, geophysics, global warming, ignorance, mathematics, mathematics education, maths, obfuscating data, rationality, reasonableness, risk, science, science education, sociology, the right to know | Leave a comment

A Sankey diagram showing influence of big oil on climate policy

I’ve written about Sankey diagrams before. Here’s a novel use: InfluenceMap has used a Sankey diagram to demonstrate “How much big oil spends on obstructive climate lobbying”. The figure that’s available for media is shown below. (Click on image to … Continue reading

Posted in American Petroleum Institute, Anthropocene, Bloomberg, Bloomberg New Energy Finance, BNEF, carbon dioxide, Carbon Worshipers, Chevron, citizenship, climate, climate change, climate disruption, climate education, climate justice, corporate litigation on damage from fossil fuel emissions, data science, destructive economic development, disingenuity, economics, education, energy, Exxon, false advertising, fear uncertainty and doubt, fossil fuels, global warming, greenhouse gases, Gulf Oil, Hyper Anthropocene, ignorance, lobbying, methane, natural gas, pipelines, politics, rationality, reasonableness, risk, Sankey diagram, Standard Oil of California, Texaco, the value of financial assets | Leave a comment

Of my favorite things …

(Clarifying language added 4 Apr 2016, 12:26 EDT.) I just watched an episode from the last season of Star Trek: The Next Generation entitled “Force of Nature.” As anyone who pays the least attention to this blog knows, opposing human … Continue reading

Posted in Anthropocene, bridge to somewhere, bucket list, Buckminster Fuller, Carl Sagan, climate, climate change, climate disruption, climate education, compassion, data science, Earle Wilson, ecology, Ecology Action, environment, evolution, geophysics, George Sughihara, global warming, Hyper Anthropocene, life purpose, mathematics, mathematics education, maths, numerical analysis, optimization, philosophy, physical materialism, physics, population biology, population dynamics, proud dad, quantitative biology, quantitative ecology, rationality, reasonableness, science, sociology, statistics, stochastic algorithms | 5 Comments

HadCRUT4 and GISTEMP series filtered and estimated with simple RTS model

Happy Vernal Equinox! This post has been updated today with some of the equations which correspond to the models. An assessment of whether or not there was a meaningful slowdown or “hiatus” in global warming was recently discussed by Tamino … Continue reading

Posted in AMETSOC, anemic data, Bayesian, boosting, bridge to somewhere, cat1, changepoint detection, climate, climate change, climate data, climate disruption, climate models, complex systems, computation, data science, dynamical systems, geophysics, George Sughihara, global warming, hiatus, information theoretic statistics, machine learning, maths, meteorology, MIchael Mann, multivariate statistics, physics, prediction, Principles of Planetary Climate, rationality, reasonableness, regime shifts, sea level rise, time series | 2 Comments

p-values and hypothesis tests: the Bayesian(s) rule

The American Statistical Association, of which I am a longtime member, issued an important statement today which will hopefully move statistical practice in engineering and especially in the sciences away from the misleading practice of using p-values and hypothesis tests. … Continue reading

Posted in approximate Bayesian computation, arXiv, Bayes, Bayesian, Bayesian inversion, bollocks, Christian Robert, climate, complex systems, data science, Frequentist, information theoretic statistics, likelihood-free, Markov Chain Monte Carlo, MCMC, Monte Carlo Statistical Methods, population biology, rationality, reasonableness, science, scientific publishing, statistical dependence, statistics, stochastics, Student t distribution | Leave a comment

K-Nearest Neighbors: dangerously simple

Originally posted on mathbabe:
I spend my time at work nowadays thinking about how to start a company in data science. Since there are tons of companies now collecting tons of data, and they don’t know what to do…

Posted in big data, data science, evidence, machine learning | Leave a comment