In this module, you use the built-in Amazon SageMaker k-Nearest Neighbors (k-NN) algorithm to train the content recommendation model.
Amazon SageMaker k-Nearest Neighbors (k-NN) is a non-parametric, index-based, supervised learning algorithm that can be used for classification and regression tasks. For classification, the algorithm queries the k closest points to the target and returns the most frequently occurring class label among them as the predicted label. For regression problems, the algorithm returns the average of the label values of the k closest neighbors as the predicted value.
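To make the prediction rule concrete, here is a minimal NumPy sketch of the k-NN decision logic for both tasks. It illustrates the idea only, not the SageMaker implementation; the function name knn_predict and all variable names are hypothetical.

```python
import numpy as np

def knn_predict(train_X, train_y, query, k, predictor_type="classifier"):
    """Illustrative k-NN prediction (not the SageMaker implementation)."""
    # Distance from the query point to every training point.
    distances = np.linalg.norm(train_X - query, axis=1)
    # Indices of the k closest training points.
    nearest = np.argsort(distances)[:k]
    if predictor_type == "classifier":
        # Classification: return the most frequent label among the k neighbors.
        labels, counts = np.unique(train_y[nearest], return_counts=True)
        return labels[np.argmax(counts)]
    # Regression: return the average of the k neighbors' label values.
    return train_y[nearest].mean()
```

In practice, the index that the algorithm builds during training replaces the brute-force distance scan above with a much faster lookup.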
Training with the k-NN algorithm has three steps: sampling, dimension reduction, and index building. Sampling reduces the size of the initial dataset so that it fits into memory. For dimension reduction, the algorithm decreases the feature dimension of the data to reduce the memory footprint of the k-NN model and its inference latency. The algorithm provides two dimension reduction methods: random projection and the fast Johnson-Lindenstrauss transform. Typically, you use dimension reduction for high-dimensional (d > 1000) datasets to avoid the “curse of dimensionality,” which troubles the statistical analysis of data that becomes sparse as dimensionality increases. The main objective of k-NN training is to construct the index. The index enables efficient lookups of the k nearest points for query points whose values or class labels have not yet been determined. These training steps map directly to the algorithm’s hyperparameters, as shown in the sketch below.
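The sketch below groups the built-in algorithm’s documented hyperparameters by training step. The values are placeholders for illustration, not recommendations; they assume a hypothetical 2048-dimensional feature space.

```python
# SageMaker k-NN hyperparameters, grouped by training step.
# All values below are illustrative; tune them for your own dataset.
hyperparameters = {
    "feature_dim": 2048,             # dimensionality of the input features
    "k": 10,                         # number of neighbors queried at inference
    "predictor_type": "classifier",  # "classifier" or "regressor"
    # Sampling: cap the number of training points held in memory.
    "sample_size": 200000,
    # Dimension reduction: "sign" (random projection) or "fjlt"
    # (fast Johnson-Lindenstrauss transform); typically used when d > 1000.
    "dimension_reduction_type": "sign",
    "dimension_reduction_target": 1000,
    # Index building: the index type used for nearest-neighbor lookups.
    "index_type": "faiss.Flat",
}
```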
In the following steps, you specify the k-NN algorithm for the training job, set the hyperparameter values to tune the model, and run the training job. Then, you deploy the model to an endpoint managed by Amazon SageMaker to make predictions.
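Putting those steps together, a minimal sketch of the train-and-deploy flow with the SageMaker Python SDK (v2) might look like the following. It assumes you are running in a SageMaker notebook with an execution role; the S3 prefixes, instance types, and the hyperparameters dict from the previous sketch are placeholders to replace with your own values.

```python
import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = sagemaker.get_execution_role()  # assumes a SageMaker notebook environment

# Retrieve the built-in k-NN container image for the current region.
knn_image = image_uris.retrieve("knn", session.boto_region_name)

# Placeholder S3 locations; replace with your own bucket and prefixes.
bucket = session.default_bucket()
train_data = f"s3://{bucket}/knn/train/train.csv"
output_path = f"s3://{bucket}/knn/output"

estimator = Estimator(
    image_uri=knn_image,
    role=role,
    instance_count=1,
    instance_type="ml.m5.2xlarge",
    output_path=output_path,
    sagemaker_session=session,
)
estimator.set_hyperparameters(**hyperparameters)  # dict from the previous sketch

# Run the training job (the first column of the CSV is the label by default),
# then deploy the trained model to a SageMaker-managed endpoint.
estimator.fit({"train": TrainingInput(train_data, content_type="text/csv")})
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")
```

Once the endpoint is in service, predictor.predict returns k-NN results for new feature vectors, and the endpoint keeps running (and accruing charges) until you delete it.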
Time to Complete Module: 20 Minutes
Congratulations! In this module, you trained, deployed, and explored your content recommendation model.
In the next module, you clean up the resources you used in this lab.