K-Nearest Neighbors: dangerously simple

Posted on 30 January 2016 by ecoquant

Yeah, Mathbabe’s got it right: People who use kNN often don’t think about these things.

For those who aren’t familiar with this technique, here’s a description from Zhi-Hua Zhou in Ensemble Methods: Foundations and Algorithms (section 1.2.5):

“The $k$-nearest neighbor ($k$NN) algorithm relies on the principle that objects similar in the input space are also similar in the output space. It is a lazy learning approach since it does not have an explicit training process, but simply stores the training set instead. For a test instance, a $k$-nearest neighbor learner identifies the $k$ instances from the training set that are closest to the test instance. Then, for classification, the test instance will be classified to the majority class among the $k$ instances; while for regression, the test instance will be assigned the average value of the $k$ instances.”
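
To make the quoted description concrete, here is a minimal sketch in plain Python. It is not from Zhou's book. The function names (`k_nearest`, `knn_classify`, `knn_regress`), the toy data, and the choice of unscaled Euclidean distance are all my own assumptions for illustration. Note that "closest" only means something once a distance has been picked, and the sketch simply picks one.

```python
import math
from collections import Counter

def k_nearest(train_X, query, k):
    """Return the indices of the k training points closest to `query`.

    Assumption: plain Euclidean distance on unscaled features.
    Ties in distance are broken by the lower training index.
    """
    dists = sorted((math.dist(x, query), i) for i, x in enumerate(train_X))
    return [i for _, i in dists[:k]]

def knn_classify(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbors."""
    votes = Counter(train_y[i] for i in k_nearest(train_X, query, k))
    # On a tied vote, most_common returns the label encountered first.
    return votes.most_common(1)[0][0]

def knn_regress(train_X, train_y, query, k=3):
    """Predict the average target value of the k nearest neighbors."""
    idx = k_nearest(train_X, query, k)
    return sum(train_y[i] for i in idx) / len(idx)

# Hypothetical toy data: two well-separated clusters in the plane.
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

# "Training" is just storing X; all the work happens at query time.
print(knn_classify(X, labels, (0.2, 0.2), k=3))  # a
print(knn_regress(X, values, (0.2, 0.2), k=3))   # 2.0
```

The whole method fits in a dozen lines, which is the point: the distance function, the feature scaling, and the value of $k$ are each set here without any justification, and the code runs happily regardless.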

About ecoquant

See https://wordpress.com/view/667-per-cm.net/ Retired data scientist and statistician. Now working projects in quantitative ecology and, specifically, phenology of Bryophyta and technical methods for their study, notably Macrophotography. Some photos of mine: https://www.flickr.com/photos/198372469@N03/

This entry was posted in big data, data science, evidence, machine learning.