# Category Archives: data science

## The Johnson-Lindenstrauss Lemma, and the paradoxical power of random linear operators. Part 1.

I’ll be discussing the ramifications of William B. Johnson and Joram Lindenstrauss, “Extensions of Lipschitz mappings into a Hilbert space,” Contemporary Mathematics, 26:189–206, 1984, over several posts here. Some introduction and links to proofs and explications will be provided, as … Continue reading

## Sampling: Rejection, Reservoir, and Slice

An article by Suilou Huang for catastrophe modeler AIR-WorldWide of Boston about rejection sampling in CAT modeling got me thinking about pulling together some notes about sampling algorithms of various kinds. There are, of course, books written about this subject, … Continue reading

## Erin Gallagher’s “#QAnon network visualizations”

See her most excellent blog post, a delve into true Data Science. (Click on figure to see a full-size image. It is large. Use your browser Back Button to return to this blog afterwards.) Hat tip to Bob Calder and … Continue reading

## Senn’s `… never having to say you are certain’ guest post from Mayo’s blog

via S. Senn: Being a statistician means never having to say you are certain (Guest Post) See also: E. Cai’s blog post “Applied Statistics Lesson of the Day – The Matched Pairs Experimental Design”, from February 2014 A. Deaton, N. … Continue reading

## Eli on “Tom [Karl]’s trick and experimental design”

A very fine post at Eli’s blog for students of statistics, meteorology, and climate (like myself) titled: Tom’s trick and experimental design Excerpt: This and the graph from Menne at the top shows that Karl’s trick is working. Although we … Continue reading

## “Hadoop is NOT ‘Big Data’ is NOT Analytics”

Arun Krishnan, CEO & Founder at Analytical Sciences comments on this serious problem with the field. Short excerpt: … A person who is able to write code using Hadoop and the associated frameworks is not necessarily someone who can understand … Continue reading

## Is the answer to the democratization of Science doing more Citizen Science?

I have been following, with keen interest, the post and comment thread pertaining to “Democratising science” at the blog I monitor daily, … and Then There’s Physics. I think the core subject being discussed is a little different from my … Continue reading

## A new feature: Technical publications of the week

I’m beginning a new style of column, called technical publications of the week. While I can’t promise these will be weekly, I will, from time to time, highlight technical publications I’ve recently read which I consider to be noteworthy. I … Continue reading

## Why scientific measurements need to be adjusted

There is an excellent piece in Ars Technica about why scientific measurements need to be adjusted, and the implications of this for climate data. It is written by Scott K Johnson and is called “Thorough, not thoroughly fabricated: The truth … Continue reading

## Sleeping Giant Awakening

Originally posted on Climate Denial Crock of the Week:

https://twitter.com/johnmyers/status/809097380456865792 Wikipedia: Isoroku Yamamoto’s sleeping giant quotation is a quote by the Japanese Admiral Isoroku Yamamoto regarding the 1941 attack on Pearl Harbor by forces of Imperial Japan. The quotation is portrayed at the very end of…

## Cathy O’Neil’s WEAPONS OF MATH DESTRUCTION: A Review

(Revised and updated Monday, 24th October 2016.) Weapons of Math Destruction, Cathy O’Neil, published by Crown Random House, 2016. This is a thoughtful and very approachable introduction to, and review of, the societal and personal consequences of data mining, data science, … Continue reading

## NextGen VOICES: `On data’, `On setbacks’, and `On discovery’

Science Magazine has a periodic column called Science in brief and occasionally that column features a set of what they call “NextGen VOICES”, meaning young scientists. They gather the survey using Twitter (of course) via the hashtag #NextGenSci. For the … Continue reading

## “Holy crap – an actual book!”

Originally posted on mathbabe:

Yo, everyone! The final version of my book now exists, and I have exactly one copy! Here’s my editor, Amanda Cook, holding it yesterday when we met for beers: Here’s my son holding it: He’s offered…

## data.table

R provides a helpful data structure called the “data frame” that gives the user an intuitive way to organize, view, and access data. Many of the functions that you would us… Source: Intro to The data.table Package

## On Smart Data

One of the things I find surprising, if not astonishing, is that in the rush to embrace Big Data, a lot of learning and statistical technique has been left apparently discarded along the way. I’m hardly the first to point … Continue reading

## “Catching long tail distribution” (Ted Dunning)

One of the best presentations on what can happen if someone takes a naive approach to network data. It also highlights what is, to my mind, the greatly underappreciated t-distribution, which is typically only used in connection with frequentist Student … Continue reading

## Climate Denial Fails Pepsi Challenge

Originally posted on Climate Denial Crock of the Week:

Stephen Lewandowsky specializes in conducting research that pulls back the curtain on climate denial psychology. He’s done it again. Washington Post: Researchers have designed an inventive test suggesting that the arguments commonly used…

## Of my favorite things …

(Clarifying language added 4 Apr 2016, 12:26 EDT.) I just watched an episode from the last season of Star Trek: The Next Generation entitled “Force of Nature.” As anyone who pays the least attention to this blog knows, opposing human … Continue reading

## HadCRUT4 and GISTEMP series filtered and estimated with simple RTS model

Happy Vernal Equinox! This post has been updated today with some of the equations which correspond to the models. An assessment of whether or not there was a meaningful slowdown or “hiatus” in global warming was recently discussed by Tamino … Continue reading

## p-values and hypothesis tests: the Bayesian(s) rule

The American Statistical Association, of which I am a longtime member, issued an important statement today which will hopefully move statistical practice in engineering and especially in the sciences away from the misleading practice of using p-values and hypothesis tests. … Continue reading

## K-Nearest Neighbors: dangerously simple

Originally posted on mathbabe:

I spend my time at work nowadays thinking about how to start a company in data science. Since there are tons of companies now collecting tons of data, and they don’t know what to do…