Get Started with Your Machine Learning Project Quickly Using Amazon SageMaker JumpStart

TUTORIAL

Overview

In this tutorial, you will learn how to fast-track your machine learning (ML) project using pretrained models and prebuilt solutions offered by Amazon SageMaker JumpStart. You will then deploy the selected model through Amazon SageMaker Studio notebooks.

Amazon SageMaker JumpStart helps you quickly and easily get started with ML by providing access to hundreds of built-in algorithms with pretrained models from popular model hubs through the user interface. Using the SageMaker Python SDK, you can select a prebuilt model from the collection of models (also known as a "model zoo") to train on custom data or deploy to a SageMaker endpoint for inference. To make it easier to get started, SageMaker JumpStart provides a set of solutions for the most common use cases that can be deployed readily with just a few clicks. The solutions are fully customizable and showcase the use of AWS CloudFormation templates and reference architectures so you can accelerate your ML journey.

What you will accomplish

In this guide, you will:

Deploy a SageMaker JumpStart pretrained model
Run inferences using the endpoint deployed from SageMaker JumpStart

Prerequisites

Before starting this tutorial, you will need:

An AWS account: If you don't already have an account, follow the Setting Up Your AWS Environment getting started guide for a quick overview.

Implementation

For this tutorial, you will deploy a model called BERT Base Cased that has been pretrained on English text using Wikipedia and that performs well on text classification use cases.

AWS experience

Beginner

Minimum time to complete

15 minutes

Cost to complete

See Amazon SageMaker pricing to estimate cost for this tutorial.

Requires

You must be logged in to an AWS account.

Services used

Amazon SageMaker JumpStart

Last updated

March 20, 2023

Step 1: Set up Amazon SageMaker Studio domain

An AWS account can have multiple SageMaker Studio domains per AWS Region. If you already have a SageMaker Studio domain in the US East (N. Virginia) Region, follow the SageMaker Studio setup guide to attach the required AWS IAM policies to your SageMaker Studio account, then skip the rest of Step 1 and proceed directly to Step 2.

If you don't have an existing SageMaker Studio domain, continue with Step 1 to run an AWS CloudFormation template that creates a SageMaker Studio domain and adds the permissions required for the rest of this tutorial.

Choose the AWS CloudFormation stack link. This link opens the AWS CloudFormation console and creates your SageMaker Studio domain and a user named studio-user. It also adds the required permissions to your SageMaker Studio account. In the CloudFormation console, confirm that US East (N. Virginia) is the Region displayed in the upper right corner. The stack name should be CFN-SM-IM-Lambda-catalog and should not be changed. The stack takes about 10 minutes to create all the resources.

This stack assumes that you already have a public VPC set up in your account. If you do not have a public VPC, see VPC with a single public subnet to learn how to create a public VPC.

Select I acknowledge that AWS CloudFormation might create IAM resources, and then choose Create stack.

On the CloudFormation pane, choose Stacks. When the stack is created, the status of the stack should change from CREATE_IN_PROGRESS to CREATE_COMPLETE.
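The status check described above can also be done from code. The following is a minimal sketch, assuming boto3 is installed and AWS credentials are configured. The helper name wait_for_stack and its polling parameters are hypothetical and not part of the tutorial. It polls the CloudFormation describe_stacks call until the stack status no longer ends in _IN_PROGRESS.

```python
import time


def wait_for_stack(cfn_client, stack_name, poll_seconds=30,
                   timeout_seconds=1800, sleep=time.sleep):
    """Poll a CloudFormation stack until it leaves an *_IN_PROGRESS status.

    cfn_client is expected to behave like boto3.client("cloudformation").
    Returns the final status string, for example "CREATE_COMPLETE".
    """
    waited = 0
    while True:
        stacks = cfn_client.describe_stacks(StackName=stack_name)["Stacks"]
        status = stacks[0]["StackStatus"]
        if not status.endswith("_IN_PROGRESS"):
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"{stack_name} is still {status} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Usage (needs boto3 and AWS credentials; us-east-1 is US East (N. Virginia)):
#   import boto3
#   cfn = boto3.client("cloudformation", region_name="us-east-1")
#   print(wait_for_stack(cfn, "CFN-SM-IM-Lambda-catalog"))
```

A status other than CREATE_COMPLETE (for example ROLLBACK_COMPLETE) suggests that stack creation did not succeed and that the Events tab in the console is worth checking.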

Enter SageMaker Studio into the CloudFormation console search bar, and choose SageMaker Studio.

Choose US East (N. Virginia) from the Region dropdown list in the upper right corner of the SageMaker console.

In the Launch app dropdown, choose Studio. SageMaker Studio opens using the studio-user profile.

Step 2: Create a new launcher window and start JumpStart

Getting started with machine learning can be challenging, from knowing which models suit which use case to knowing where to start. Amazon SageMaker JumpStart solves this problem by providing a set of solutions for the most common use cases that can be deployed readily with just a few clicks. Pretrained models and solutions are minutes away from production-capable deployment endpoints.

To get started, you need to open a new launcher window by clicking the + icon at the top of the file window view.

In the top left of the SageMaker Studio Launcher screen, choose the JumpStart models, algorithms, and solutions button.

This will start SageMaker JumpStart and you will see a new window with a wide variety of featured content, solutions, models, problem types, and more. For this tutorial, you will be running the BERT Base Cased Text pretrained model.

To find the pretrained model, use the search bar in the upper right and enter BERT to find the BERT models. Choose the model titled BERT Base Cased Text - Text Classification. Alternatively, you can browse the available models to find the correct one.
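The same search can be sketched with the SageMaker Python SDK instead of the search bar. This is an illustration under assumptions: it relies on the SDK's list_jumpstart_models utility, assumes that "tc" is the task code JumpStart uses for text classification, and uses a hypothetical helper find_models for the keyword match. The model ID in the usage comment is an assumed identifier for the BERT Base Cased model; confirm it against the model card in Studio.

```python
def find_models(model_ids, keyword):
    """Return the model IDs that contain keyword (case-insensitive), sorted."""
    keyword = keyword.lower()
    return sorted(m for m in model_ids if keyword in m.lower())


def list_text_classification_models():
    """List JumpStart model IDs for the text classification task.

    Imported lazily so find_models works without the SDK installed.
    Assumption: "tc" is the JumpStart task code for text classification.
    """
    from sagemaker.jumpstart.notebook_utils import list_jumpstart_models

    return list_jumpstart_models(filter="task == tc")


# Usage (inside SageMaker Studio, where the SDK and credentials are available):
#   bert_models = find_models(list_text_classification_models(), "bert")
#   print(bert_models)
# Assumed ID for the tutorial's model:
#   tensorflow-tc-bert-en-cased-L-12-H-768-A-12-2
```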

The BERT model includes the option to either deploy the pretrained model as is or to retrain the model. For this tutorial, you will deploy the pretrained model as is.

Choose the dropdown next to Deployment Configuration, and choose the dropdown SageMaker hosting instance.

You will see a number of instance types, which correspond to the resources that will be used to host the endpoint. Select ml.m5.large.

The second box corresponds to the endpoint name. Keep the default value. You can rename the endpoint at a later date, if needed.

Choose the dropdown labeled Security Settings. In this section, you can set up execution roles, VPC connection, and encryption. For this tutorial, keep the default settings.

Note: If you later decide to use this endpoint in a production environment, update the Security Settings.

Alternatively, you have the option to create a training job with your own dataset. You can also use the model programmatically in a Jupyter notebook with the SageMaker API.

Choose the Deploy button to begin setting up the model endpoint.

Next, you will see a dialog box that shows the model deployment status. This part of the process can take 5–10 minutes. The dialog box will change to show metadata around the model type, task, endpoint identifier, endpoint name, instance type, number of instances, and model data location as the process progresses. Once the endpoint deployment completes, the service status should update to In Service.
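For readers who prefer the programmatic route, the deployment in this step can be approximated with the SageMaker Python SDK. This is a sketch, not the code that the Deploy button runs. The model ID is an assumption (verify it on the model card), the function names are hypothetical, and the code expects to run somewhere an execution role can be resolved, such as a Studio notebook. The second helper reads the endpoint status from the SageMaker API, where the "In Service" state shown in the dialog box appears as InService.

```python
# Assumed JumpStart model ID for "BERT Base Cased Text - Text Classification".
DEPLOY_SETTINGS = {
    "model_id": "tensorflow-tc-bert-en-cased-L-12-H-768-A-12-2",
    "instance_type": "ml.m5.large",  # same instance type selected in the console
    "initial_instance_count": 1,
}


def deploy_pretrained_model(settings=None):
    """Deploy the pretrained model as is and return the endpoint name."""
    # Imported lazily so the rest of this sketch works without the SDK installed.
    from sagemaker.jumpstart.model import JumpStartModel

    settings = settings or DEPLOY_SETTINGS
    model = JumpStartModel(
        model_id=settings["model_id"],
        instance_type=settings["instance_type"],
    )
    predictor = model.deploy(
        initial_instance_count=settings["initial_instance_count"],
    )
    return predictor.endpoint_name


def endpoint_is_ready(sm_client, endpoint_name):
    """Return True once the endpoint is InService; raise if deployment failed.

    sm_client is expected to behave like boto3.client("sagemaker").
    """
    description = sm_client.describe_endpoint(EndpointName=endpoint_name)
    status = description["EndpointStatus"]
    if status == "Failed":
        raise RuntimeError(description.get("FailureReason", "endpoint deployment failed"))
    return status == "InService"


# Usage (creates billable resources):
#   import boto3
#   name = deploy_pretrained_model()
#   print(endpoint_is_ready(boto3.client("sagemaker"), name))
```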

Step 3: Use the demo notebook provided to query the new JumpStart endpoint

Now that you have a model endpoint deployed, you can run inferences against it to retrieve predictions. In this step, you will use the provided demo notebook to query the endpoint created in the prior step.

Choose Open Notebook.

The opened notebook contains Python code to run two text examples through the endpoint and view the model outputs. For each example, the model returns sentiment class probabilities and a predicted label.

To advance through the notebook, choose the Play icon as noted in the screenshot. As an alternative, you can also hold Shift and press Return to advance through the cells. The predicted labels and associated probabilities will be printed out at the bottom of the cell.
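The kind of request the notebook sends can be sketched outside the notebook as well. The example below is an approximation rather than the notebook's exact code. It assumes the endpoint accepts raw text with the content type application/x-text and that requesting application/json;verbose returns a JSON body with the keys probabilities, labels, and predicted_label. The function names and the endpoint name in the usage comment are hypothetical.

```python
import json


def query_endpoint(runtime_client, endpoint_name, text):
    """Send one text example to the endpoint and return the decoded JSON body.

    runtime_client is expected to behave like boto3.client("sagemaker-runtime").
    Assumption: the model accepts "application/x-text" and returns verbose JSON.
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/x-text",
        Accept="application/json;verbose",
        Body=text.encode("utf-8"),
    )
    return json.loads(response["Body"].read())


def summarize(prediction):
    """Return (predicted_label, {label: probability}) from a verbose response."""
    per_label = dict(zip(prediction["labels"], prediction["probabilities"]))
    return prediction["predicted_label"], per_label


# Usage (replace the placeholder with the endpoint name shown in Studio):
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
#   result = query_endpoint(runtime, "<your-endpoint-name>", "simply stupid , irrelevant and deeply flawed")
#   print(summarize(result))
```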

You have now deployed a model endpoint using the pretrained BERT Base Cased Text - Text Classification model with minimal manual effort. Congratulations!

Step 4: Clean up resources

It's best practice to delete resources that you are no longer using so that you don't incur unintended charges.
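The console steps in this section cover the CloudFormation stack, but the endpoint deployed in Step 2 is also a billable resource while it is running. One way to remove it is sketched below with boto3. This is an assumption-laden example rather than part of the original steps: the helper name delete_endpoint_resources is hypothetical, and it presumes that the endpoint, its endpoint configuration, and its model are all safe to delete.

```python
def delete_endpoint_resources(sm_client, endpoint_name):
    """Delete an endpoint, its endpoint configuration, and its model(s).

    sm_client is expected to behave like boto3.client("sagemaker").
    Returns the names of the resources that were deleted.
    """
    # Look up the dependent resources before anything is deleted.
    endpoint = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [variant["ModelName"] for variant in config["ProductionVariants"]]

    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    for model_name in model_names:
        sm_client.delete_model(ModelName=model_name)

    return {
        "endpoint": endpoint_name,
        "endpoint_config": config_name,
        "models": model_names,
    }


# Usage (replace the placeholder with the endpoint name shown in Studio):
#   import boto3
#   sm = boto3.client("sagemaker", region_name="us-east-1")
#   print(delete_endpoint_resources(sm, "<your-endpoint-name>"))
```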

If you used an existing SageMaker Studio domain in Step 1, skip the rest of Step 4 and proceed directly to the Conclusion section.

If you ran the CloudFormation template in Step 1 to create a new SageMaker Studio domain, continue with the following steps to delete the domain, user, and the resources created by the CloudFormation template.

To open the CloudFormation console, enter CloudFormation into the AWS console search bar, and select CloudFormation from the search results.

In the CloudFormation pane, choose Stacks. From the status dropdown list, select Active. Under Stack name, select CFN-SM-IM-Lambda-catalog to open the stack details page.

On the CFN-SM-IM-Lambda-catalog stack details page, choose Delete to delete the stack along with the resources it created in Step 1.

Conclusion

Congratulations! You have completed the Get Started with Your Machine Learning Project Quickly Using Amazon SageMaker JumpStart tutorial.

You have learned how to select and deploy an Amazon SageMaker JumpStart pretrained model and make predictions.

Next steps

Train a machine learning model

Learn how to train, tune, and evaluate an ML model.

Label training data for machine learning

Learn how to set up a labeling job for annotating training data.

Find more hands-on tutorials

Find more hands-on tutorials to learn how to leverage ML.