Category Archives: big data

There’s Big Data, Tiny Data, and now Dead Data

You’ve heard of Big Data. You may have heard of Tiny Data. But now, in the Harvard Data Science Review, Professor Steve Stigler presents Dead Data. See: S. M. Stigler, “Data have a limited shelf life”, Harvard Data Science … Continue reading

Posted in big data, dead data, statistics, tiny data | Leave a comment

“Hadoop is NOT ‘Big Data’ is NOT Analytics”

Arun Krishnan, CEO & Founder at Analytical Sciences, comments on this serious problem with the field. Short excerpt: … A person who is able to write code using Hadoop and the associated frameworks is not necessarily someone who can understand … Continue reading

Posted in alchemy, American Statistical Association, artificial intelligence, big data, data science, engineering, Internet, jibber jabber, machine learning, natural language processing, NLTK, sociology, superstition | Leave a comment

A new feature: Technical publications of the week

I’m beginning a new style of column, called technical publications of the week. While I can’t promise these will be weekly, I will, from time to time, highlight technical publications I’ve recently read which I consider to be noteworthy. I … Continue reading

Posted in Anthropocene, big data, climate change, climate disruption, data science, data streams, earthquakes, geophysics, global warming, Hyper Anthropocene, Locality Sensitive Hashing, LSH, MinHash, numerical algorithms, numerical analysis, random projections, seismology, subspace projection methods, SVD, the right to be and act stupid, the tragedy of our present civilization, the value of financial assets | 1 Comment

NextGen VOICES: ‘On data’, ‘On setbacks’, and ‘On discovery’

Science Magazine has a periodic column called Science in brief, and occasionally that column features a set of what they call “NextGen VOICES”, meaning young scientists. They gather the survey responses using Twitter (of course) via the hashtag #NextGenSci. For the … Continue reading

Posted in American Association for the Advancement of Science, big data, data science, evidence, Mathbabe, maths, maxims, science, Science magazine, smart data, statistics | Leave a comment

data.table

R provides a helpful data structure called the “data frame” that gives the user an intuitive way to organize, view, and access data.  Many of the functions that you would us… Source: Intro to The data.table Package

Posted in big data, data science, engineering, numerical analysis, numerical software, numerics, open source scientific software, R, smart data, statistics | Leave a comment

On Smart Data

One of the things I find surprising, if not astonishing, is that in the rush to embrace Big Data, a lot of learning and statistical technique has been left apparently discarded along the way. I’m hardly the first to point … Continue reading

Posted in Akaike Information Criterion, Bayes, Bayesian, Bayesian inversion, big data, bigmemory package for R, changepoint detection, data science, data streams, dlm package, dynamic generalized linear models, dynamic linear models, dynamical systems, Generalized Additive Models, generalized linear models, information theoretic statistics, Kalman filter, linear algebra, logistic regression, machine learning, Markov Chain Monte Carlo, mathematics, mathematics education, maths, maximum likelihood, MCMC, Monte Carlo Statistical Methods, multivariate statistics, numerical analysis, numerical software, numerics, quantitative biology, quantitative ecology, rationality, reasonableness, sampling, smart data, state-space models, statistical dependence, statistics, the right to know, time series | Leave a comment

K-Nearest Neighbors: dangerously simple

Originally posted on mathbabe:
I spend my time at work nowadays thinking about how to start a company in data science. Since there are tons of companies now collecting tons of data, and they don’t know what to do…

Posted in big data, data science, evidence, machine learning | Leave a comment

R and “big data”

On 2nd November 2015, Wes McKinney, the developer of the highly useful Python pandas module (and other things, including books), wrote an amusing blog post, “The problem with the data science language wars”. I by no means disagree with him. … Continue reading

Posted in Bayes, Bayesian, big data, bigmemory package for R, Jay Emerson, MCMC, numerics, Python 3, R, Yale University Statistics Department | Leave a comment