Imagine you are a machine learning developer working at a bank. You have been asked to develop a machine learning model that helps your company's analysts cope with the volume of news they must read to make investment decisions. The model will be trained on the 20newsgroups dataset, which contains approximately 20,000 documents spread across 20 topics.

As part of your model, you need to extract semantic information from the news data, identify similar news articles in the corpus, and recommend articles to the analysts based on the ones they are reading.

In this lab, you learn how to create an Amazon SageMaker Notebook instance; download, prepare, and stage a dataset using a Jupyter notebook; train and deploy your topic model; and finally train and deploy the content recommendation model.

In Module 1, you configure the environment that you use during the lab.

Time to Complete Module: 20 Minutes


  • Step 1: Create an AWS account

    Use a personal AWS account or create a new AWS account for this lab. Do not use an organizational account: with your own account, you have full access to the necessary services and do not leave lab resources behind in your organization's account. If you do not delete the resources used in this lab when you are finished, you may incur AWS charges.

  • Step 2: Create an Amazon S3 bucket

    Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance.

    Training a model consumes training data and produces model artifacts. In this lab, you use an Amazon S3 bucket to stage the training and validation datasets and to store the model artifacts that Amazon SageMaker generates during model training.

    To create an Amazon S3 bucket:

    1. Sign in to the AWS Management Console and open the Amazon S3 console.
    2. Choose Create bucket.
    3. For Bucket name, type sagemaker-xx, where xx are your initials, to make the bucket name unique.
    4. In Region, choose the AWS Region where you want the bucket to reside.
    5. In Bucket settings for Block Public Access, leave the settings enabled.
    6. Choose Create bucket.
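
    The console steps above can also be scripted. The following is a minimal sketch, assuming the boto3 library is installed and AWS credentials are configured; it is not part of the original lab. The helper names (validate_bucket_name, build_create_bucket_args, create_lab_bucket) are hypothetical, and sagemaker-xx is the placeholder name from step 3. The name check covers only a subset of the S3 naming rules.

```python
import re

# Mirrors step 5: keep every Block Public Access setting enabled.
PUBLIC_ACCESS_BLOCK = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}


def validate_bucket_name(name):
    """Check a subset of the S3 naming rules: 3-63 characters, lowercase
    letters, digits, dots, and hyphens, starting and ending with a letter
    or digit."""
    return re.fullmatch(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]", name) is not None


def build_create_bucket_args(bucket_name, region):
    """Build the keyword arguments for the S3 create_bucket call.

    us-east-1 is the default location and must not be passed as a
    LocationConstraint; every other Region must be named explicitly.
    """
    args = {"Bucket": bucket_name}
    if region != "us-east-1":
        args["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return args


def create_lab_bucket(bucket_name, region):
    """Create the bucket and enable Block Public Access (calls AWS)."""
    import boto3  # imported here so the helpers above work without boto3

    if not validate_bucket_name(bucket_name):
        raise ValueError(f"Invalid bucket name: {bucket_name!r}")
    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**build_create_bucket_args(bucket_name, region))
    s3.put_public_access_block(
        Bucket=bucket_name,
        PublicAccessBlockConfiguration=PUBLIC_ACCESS_BLOCK,
    )
```

    Calling create_lab_bucket("sagemaker-xx", "us-east-1") with your own initials and Region would correspond to steps 2 through 6. Bucket names are shared across all AWS accounts, so the call fails if another account already owns the name you chose.
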
  • Step 3: Create an Amazon SageMaker Notebook instance

    An Amazon SageMaker notebook instance is a fully managed machine learning (ML) Amazon Elastic Compute Cloud (Amazon EC2) compute instance that runs the Jupyter Notebook App.

    In this lab, you use the notebook instance to create and manage the Jupyter notebook in which you prepare and process data and train and deploy your content recommendation machine learning model.

    To create an Amazon SageMaker Notebook instance:

    1. Open the Amazon SageMaker console.
    2. Choose Notebook instances, then choose Create notebook instance.
    3. On the Create notebook instance page, for Notebook instance name, type a name for your notebook instance.
    4. For Instance type, choose ml.t2.medium. This is the least expensive instance type that notebook instances support, and it suffices for this exercise.
    5. For IAM role, choose Create a new role, then choose Create role.
    6. Choose Create notebook instance.

    In a few minutes, Amazon SageMaker launches an ML compute instance—in this case, a notebook instance—and attaches an ML storage volume to it. The notebook instance has a preconfigured Jupyter notebook server and a set of Anaconda libraries.
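
    For readers who prefer to script this step, the following is a minimal sketch using boto3, assuming the library is installed and credentials are configured. It differs from the console flow in one respect: the console option Create a new role has no equivalent inside this call, so the sketch assumes you already have the ARN of an IAM role that SageMaker can assume. The function names and the example ARN in the comments are hypothetical.

```python
def build_notebook_request(name, role_arn, instance_type="ml.t2.medium"):
    """Build the keyword arguments for the SageMaker create_notebook_instance
    call, defaulting to the instance type chosen in step 4."""
    return {
        "NotebookInstanceName": name,
        "InstanceType": instance_type,
        "RoleArn": role_arn,
    }


def create_notebook_instance(name, role_arn, region):
    """Create the notebook instance and block until it is in service
    (calls AWS and can take a few minutes)."""
    import boto3  # imported here so build_notebook_request works without boto3

    sm = boto3.client("sagemaker", region_name=region)
    sm.create_notebook_instance(**build_notebook_request(name, role_arn))

    # The waiter polls until the instance status reaches InService.
    sm.get_waiter("notebook_instance_in_service").wait(NotebookInstanceName=name)

    description = sm.describe_notebook_instance(NotebookInstanceName=name)
    return description["NotebookInstanceStatus"]


# Hypothetical usage; replace the name, role ARN, and Region with your own:
# create_notebook_instance(
#     "my-lab-notebook",
#     "arn:aws:iam::123456789012:role/ExampleSageMakerRole",
#     "us-east-1",
# )
```
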

  • Step 4: Create a Jupyter notebook

    In this step, you create a Jupyter notebook in your Amazon SageMaker Notebook instance. You also create a cell that does two things: it gets the IAM role that your notebook needs to run Amazon SageMaker APIs, and it specifies the name of the Amazon S3 bucket that stores your training datasets and the model artifacts that an Amazon SageMaker training job outputs.

    To create a Jupyter notebook:

    1. Open the Amazon SageMaker console.
    2. Choose Notebook instances, and then open the notebook instance you created by choosing either Open Jupyter for classic Jupyter view or Open JupyterLab for JupyterLab view.
      Note: If you see Pending to the right of the notebook instance in the Status column, your notebook is still being created. The status will change to InService when the notebook is ready for use.
    3. Create the notebook.
      • If you opened the notebook in Jupyter, on the Files tab, choose New, and then choose conda_python3. This preinstalled environment includes the default Anaconda installation and Python 3.
      • If you opened the notebook in JupyterLab, on the File menu, choose New, and then choose Notebook. For Select Kernel, choose conda_python3. This preinstalled environment includes the default Anaconda installation and Python 3.
    4. In the Jupyter notebook, choose File and Save as, and name the notebook.
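
    The cell described at the start of this step might look like the following minimal sketch. It assumes the SageMaker Python SDK (the sagemaker package) is available in the conda_python3 kernel. The names BUCKET, PREFIX, s3_uri, and notebook_role are hypothetical, as is the 20newsgroups prefix; replace sagemaker-xx with the bucket you created in Step 2.

```python
BUCKET = "sagemaker-xx"   # placeholder: the bucket name from Step 2
PREFIX = "20newsgroups"   # hypothetical key prefix for this lab's files


def s3_uri(bucket, *parts):
    """Join a bucket name and key parts into an s3:// URI."""
    cleaned = [p.strip("/") for p in parts]
    key = "/".join(p for p in cleaned if p)
    return f"s3://{bucket}/{key}" if key else f"s3://{bucket}"


def notebook_role():
    """Return the ARN of the IAM role attached to this notebook instance.

    Only works where the SageMaker Python SDK is installed and an execution
    role is available, such as on the notebook instance itself.
    """
    from sagemaker import get_execution_role

    return get_execution_role()


# On the notebook instance, fetch the role once and reuse it later:
# role = notebook_role()

# Locations for the staged datasets and the training output.
train_uri = s3_uri(BUCKET, PREFIX, "train")
output_uri = s3_uri(BUCKET, PREFIX, "output")
print(train_uri)
print(output_uri)
```
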

In this module, you learned about the example ML model you train in this lab. You also set up an AWS account and your lab environment, which consists of an Amazon S3 bucket, an Amazon SageMaker Notebook instance, and a Jupyter notebook.

You are now ready to start the lab. In the next module, you download, prepare, and stage your dataset.