K-NEAREST NEIGHBOR ALGORITHM
The K-nearest neighbor (KNN) algorithm is a non-parametric classification algorithm. It is also known as a lazy learning algorithm. KNN uses a database in which the data points are separated into several classes to predict the classification of a new sample point.
The technique is non-parametric, meaning it makes no assumptions about the underlying data distribution. In other words, the model structure is determined by the data. KNN could, and probably should, be one of the first choices for a classification study when there is little or no prior knowledge about the data distribution. In short, KNN is based on feature similarity: the algorithm uses information from neighboring points to predict the target class.
K-nearest neighbor classification: step-by-step procedure
First, choose the number K of neighbors. Then take the K nearest neighbors of the new data point according to the Euclidean distance. Next, count how many of those K neighbors fall in each category. Finally, assign the new data point to the category with the most neighbors.
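The steps above can be sketched in plain Python (the function name and the toy data are illustrative, not from the original text):

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 1-2: compute the Euclidean distance from the query to every
    # training point, then take the k nearest neighbors.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    nearest = sorted(distances)[:k]
    # Step 3-4: count the neighbors in each category and return the
    # category with the most neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Two toy classes: points near the origin ("A") and points near (5, 5) ("B").
points = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (0.5, 0.5), 3))  # A
print(knn_predict(points, labels, (5.5, 5.5), 3))  # B
```

Note that no model is fitted in advance: all work happens at prediction time, which is why KNN is called a lazy algorithm.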
How to choose the value of K?
Selecting the value of K is the most critical problem in K-nearest neighbors. A small value of K means that noise has a higher influence on the result, i.e., the probability of overfitting is high. A large value of K makes the algorithm computationally expensive and defeats the basic idea behind KNN (that nearby points are likely to have similar classes). A simple heuristic is k = n^(1/2), where n is the number of training samples.
Advantages of the K-nearest neighbors algorithm
- KNN is one of the simplest algorithms to implement.
- KNN executes quickly for small training data sets.
- It needs no prior knowledge about the structure of the data in the training set.
- No retraining is required when a new training pattern is added to the existing training set.
Limitations of the K-nearest neighbors algorithm
- When the training set is large, it may take a lot of space.
- High memory requirement: the algorithm stores all (or almost all) of the training data.
- Prediction can be computationally expensive and slow, because each query must be compared against the stored training points.