In this module, you use the built-in Amazon SageMaker Neural Topic Model (NTM) Algorithm to train the topic model.
Amazon SageMaker NTM is an unsupervised learning algorithm that organizes a corpus of documents into topics, where each topic is a grouping of words based on their statistical distribution. For example, documents that contain frequent occurrences of words such as "bike", "car", "train", "mileage", and "speed" are likely to share a topic on "transportation". Topic modeling can be used to classify or summarize documents based on the topics detected, or to retrieve information or recommend content based on topic similarities. The topics that NTM learns from documents are characterized as a latent representation because they are inferred from the observed word distributions in the corpus. The semantics of a topic are usually inferred by examining its top-ranking words. Because the method is unsupervised, only the number of topics, not the topics themselves, is prespecified. In addition, the topics are not guaranteed to align with how a human might naturally categorize documents.
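To make the idea of "examining the top-ranking words" concrete, here is a minimal sketch in plain Python. The vocabulary, the weight values, and the helper name top_words are made up for illustration; they are not output from NTM, which learns its own topic-word weights during training.

```python
def top_words(topic_word_weights, vocab, k=3):
    """Return the k highest-weighted vocabulary words for each topic."""
    tops = []
    for weights in topic_word_weights:
        # Rank vocabulary indices by this topic's weight, largest first.
        ranked = sorted(range(len(vocab)), key=lambda i: weights[i], reverse=True)
        tops.append([vocab[i] for i in ranked[:k]])
    return tops


# Hypothetical vocabulary and topic-word weights (one row per topic).
vocab = ["bike", "car", "train", "goal", "team", "score"]
weights = [
    [0.9, 0.8, 0.7, 0.1, 0.0, 0.2],  # reads like a "transportation" topic
    [0.0, 0.1, 0.2, 0.9, 0.8, 0.7],  # reads like a "sports" topic
]

result = top_words(weights, vocab)
for topic_id, words in enumerate(result):
    print(topic_id, words)
```

The labels "transportation" and "sports" are a human interpretation of the word lists; the model itself only produces the weights.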
In the following steps, you specify your NTM algorithm for the training job, specify infrastructure for the model, set the hyperparameter values to tune the model, and run the model. Then, you deploy the model to an endpoint managed by Amazon SageMaker to make predictions.
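The steps above can be sketched roughly as follows with the SageMaker Python SDK (version 2 is assumed). The role, bucket, S3 paths, instance types, and the RecordIO-protobuf content type are placeholders and assumptions, not values taken from this module; the helper names ntm_hyperparameters and train_and_deploy are hypothetical. The lower bound of 2 on num_topics reflects our reading of the NTM hyperparameter documentation and should be checked there.

```python
def ntm_hyperparameters(num_topics, vocab_size):
    """Build a minimal hyperparameter dict for the built-in NTM algorithm."""
    # Assumption: NTM needs at least 2 topics and a positive vocabulary size.
    if num_topics < 2:
        raise ValueError("num_topics must be at least 2")
    if vocab_size < 1:
        raise ValueError("vocab_size must be positive")
    return {"num_topics": num_topics, "feature_dim": vocab_size}


def train_and_deploy(role, bucket, train_s3_uri, num_topics, vocab_size):
    """Train NTM and deploy it to an endpoint (requires AWS credentials)."""
    # Imported lazily so the helper above can be used without the SDK installed.
    import sagemaker
    from sagemaker import image_uris
    from sagemaker.estimator import Estimator
    from sagemaker.inputs import TrainingInput

    session = sagemaker.Session()
    # Look up the NTM container image in Amazon ECR for the current region.
    image = image_uris.retrieve("ntm", session.boto_region_name)

    # Specify the algorithm and the training infrastructure (placeholder sizes).
    estimator = Estimator(
        image_uri=image,
        role=role,
        instance_count=1,
        instance_type="ml.c4.xlarge",
        output_path=f"s3://{bucket}/ntm/output",
        sagemaker_session=session,
    )
    estimator.set_hyperparameters(**ntm_hyperparameters(num_topics, vocab_size))

    # Assumption: the training data is stored as RecordIO-protobuf.
    train_input = TrainingInput(
        train_s3_uri, content_type="application/x-recordio-protobuf"
    )
    estimator.fit({"train": train_input})

    # Deploy the trained model to a SageMaker-managed endpoint.
    return estimator.deploy(initial_instance_count=1, instance_type="ml.m4.xlarge")


params = ntm_hyperparameters(num_topics=20, vocab_size=2000)
print(params)
```

Calling train_and_deploy starts billable AWS resources, so remember to delete the endpoint when you are done.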
Time to Complete Module: 20 Minutes
In this module, you retrieved the Amazon SageMaker Neural Topic Model (NTM) Algorithm from Amazon ECR. Then, you specified algorithm-specific hyperparameters and provided the Amazon S3 bucket for artifact storage. Next, you deployed the model to an endpoint using Amazon SageMaker hosting services or batch transform. Finally, you explored the model using different values for the number of topics.
In the next module, you train and deploy your content recommendation model.