K-Nearest Neighbors: dangerously simple

Yeah, Mathbabe’s got it right: People who use kNN often don’t think about these things.

For those who aren’t familiar with this technique, here’s a description from Zhi-Hua Zhou in Ensemble Methods: Foundations and Algorithms (section 1.2.5):

“The k-nearest neighbor (kNN) algorithm relies on the principle that objects similar in the input space are also similar in the output space. It is a lazy learning approach since it does not have an explicit training process, but simply stores the training set instead. For a test instance, a k-nearest neighbor learner identifies the k instances from the training set that are closest to the test instance. Then, for classification, the test instance will be classified to the majority class among the k instances; while for regression, the test instance will be assigned the average value of the k instances.”
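To make that description concrete, here is a minimal sketch in plain Python. It is not from Zhou's book or from the original post: the function names, the choice of Euclidean distance, and the toy data are all my own assumptions for illustration. Note how little there is to it: "training" is just keeping the list around.

```python
import math
from collections import Counter

def k_nearest(train, query, k):
    """Return the k (features, target) pairs closest to query.

    Assumes Euclidean distance; the quoted description does not fix a metric.
    """
    return sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]

def knn_classify(train, query, k=3):
    """Classification: majority class among the k nearest neighbors."""
    labels = [label for _, label in k_nearest(train, query, k)]
    return Counter(labels).most_common(1)[0][0]

def knn_regress(train, query, k=3):
    """Regression: average target value of the k nearest neighbors."""
    values = [value for _, value in k_nearest(train, query, k)]
    return sum(values) / len(values)

# Made-up toy data, purely illustrative.
toy_classes = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
               ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
toy_values = [((0,), 1.0), ((1,), 2.0), ((2,), 3.0), ((10,), 100.0)]

predicted_class = knn_classify(toy_classes, (0.5, 0.5), k=3)
predicted_value = knn_regress(toy_values, (1,), k=3)
```

Everything that makes kNN dangerous is hidden in the two choices this sketch makes silently: the distance metric (raw Euclidean distance lets whichever feature has the largest scale dominate) and the value of k. Ties in the vote are also resolved arbitrarily here, by whichever label was seen first.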


I spend my time at work nowadays thinking about how to start a company in data science. Since there are tons of companies now collecting tons of data, and they don’t know what to do with it, nor who to ask, part of me wants to design (yet another) dumbed-down “analytics platform” so that business people can import their data onto the platform, and then run simple algorithms themselves, without even having a data scientist to supervise.

After all, a good data scientist is hard to find. Sometimes you don’t even know if you want to invest in this whole big data thing: you’re not sure whether the data you’re collecting is all that great, or whether the whole thing is just a bunch of hype. It’s tempting to bypass professional data scientists altogether and try to replace them with software.

I’m here to say, it’s not clear that’s…


About ecoquant

See https://wordpress.com/view/667-per-cm.net/ Retired data scientist and statistician. Now working on projects in quantitative ecology and, specifically, phenology of Bryophyta and technical methods for their study.
This entry was posted in big data, data science, evidence, machine learning.
