AWS Machine Learning Blog

Incremental training with Amazon SageMaker JumpStart

In December 2020, AWS announced the general availability of Amazon SageMaker JumpStart, a capability of Amazon SageMaker that helps you quickly and easily get started with machine learning (ML). SageMaker JumpStart provides one-click fine-tuning and deployment of a wide variety of pre-trained models across popular ML tasks, as well as a selection of end-to-end solutions that solve common business problems. These features remove the heavy lifting from each step of the ML process, making it easier to develop high-quality models and reducing time to deployment.

All JumpStart content was previously available only through Amazon SageMaker Studio, which provides a user-friendly graphical interface to interact with the feature. Recently, we also announced the launch of easy-to-use JumpStart APIs as an extension of the SageMaker Python SDK, allowing you to programmatically deploy and fine-tune a vast selection of JumpStart-supported pre-trained models on your own datasets. This launch unlocks the usage of JumpStart capabilities in your code workflows, MLOps pipelines, and anywhere else you’re interacting with SageMaker via SDK.

In this post, we’re excited to announce that all trainable JumpStart models now support incremental training. Incremental training allows you to further train a model you have already fine-tuned using an expanded dataset that contains underlying patterns not accounted for in previous fine-tuning runs, which may have led to poor model performance. Incremental training saves both time and resources because you don’t need to retrain the model from scratch. If you want to jump straight into the JumpStart API code we explain in this post, refer to the sample notebook.

JumpStart overview

JumpStart is a multi-faceted product that includes different capabilities to help get you quickly started with ML on SageMaker. At the time of writing, JumpStart enables you to do the following:

  • Deploy pre-trained models for common ML tasks – JumpStart enables you to address common ML tasks with no development effort by providing easy deployment of models pre-trained on large, publicly available datasets. The ML research community has put a large amount of effort into making a majority of recently developed models publicly available for use; JumpStart hosts a collection of over 300 models, spanning the 15 most popular ML tasks such as object detection, text classification, and text generation, making it easy for beginners to use them. These models are drawn from popular model hubs, such as TensorFlow, PyTorch, Hugging Face, and MXNet Hub.
  • Fine-tune pre-trained models – JumpStart allows you to fine-tune pre-trained models with no need to write your own training algorithm. In ML, the ability to transfer the knowledge learned in one domain to another is called transfer learning. You can use transfer learning to produce accurate models on your smaller datasets, with much lower training costs than the ones involved in training the original model. JumpStart also includes popular training algorithms based on LightGBM, CatBoost, XGBoost, and Scikit-learn that you can train from scratch for tabular regression and classification.
  • Use pre-built solutions – JumpStart provides a set of 17 solutions for common ML use cases, such as demand forecasting and industrial and financial applications, which you can deploy with just a few clicks. Solutions are end-to-end ML applications that string together various AWS services to solve a particular business use case. They use AWS CloudFormation templates and reference architectures, so they can be deployed quickly and remain fully customizable.
  • Use notebook examples for SageMaker algorithms – SageMaker provides a suite of built-in algorithms to help data scientists and ML practitioners get started with training and deploying ML models quickly. JumpStart provides sample notebooks that you can use to quickly apply these algorithms.
  • Review training videos and blogs – JumpStart also provides numerous blog posts and videos that teach you how to use different functionalities within SageMaker.

JumpStart accepts custom VPC settings and AWS Key Management Service (AWS KMS) encryption keys, so you can use the available models and solutions securely within your enterprise environment. You can pass your security settings to JumpStart within Studio or through the SageMaker Python SDK.
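
As an illustration, the following is a minimal sketch of passing custom VPC and AWS KMS settings to a training job through the SageMaker Python SDK; the subnet, security group, and KMS key values are placeholders, and the image, script, and model URIs are retrieved as shown later in this post:

from sagemaker.estimator import Estimator

# Sketch only: the subnet, security group, and KMS key IDs below are placeholders
secure_estimator = Estimator(
    role=aws_role,  # your SageMaker execution role
    image_uri=train_image_uri,
    source_dir=train_source_uri,
    model_uri=train_model_uri,
    entry_point="transfer_learning.py",
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    subnets=["subnet-0123456789abcdef0"],             # run training inside your VPC
    security_group_ids=["sg-0123456789abcdef0"],
    volume_kms_key="your-training-volume-kms-key-id",  # encrypt attached training volumes
    output_kms_key="your-output-kms-key-id",           # encrypt model artifacts in Amazon S3
)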

Image classification

Image classification refers to classifying an image into one of the class labels in the training dataset. You can fine-tune the model to any given dataset comprising images belonging to any number of classes. The model available for fine-tuning on JumpStart attaches a classification layer to the corresponding feature extractor model and initializes the layer parameters to random values. The output dimension of the classification layer is determined based on the number of classes in the input data. The fine-tuning step tunes the classification layer parameters, while keeping the parameters of the feature extractor model frozen, and returns the fine-tuned model. The objective is to minimize prediction error on the input data.
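
Conceptually, the setup resembles the following PyTorch sketch; this is not the actual JumpStart training script, just an illustration of freezing a pre-trained feature extractor and training a new classification head:

import torch
import torch.nn as nn
from torchvision import models

num_classes = 5  # determined by the number of class folders in the input data

# Load a pre-trained feature extractor and freeze its parameters
backbone = models.mobilenet_v2(pretrained=True)  # on newer torchvision, use the weights= argument
for param in backbone.parameters():
    param.requires_grad = False

# Attach a classification layer with randomly initialized weights sized to our classes
backbone.classifier[1] = nn.Linear(backbone.last_channel, num_classes)

# Only the new classification layer is trained; the feature extractor stays frozen
optimizer = torch.optim.Adam(backbone.classifier[1].parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()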

For our dataset, the input is a directory with as many sub-directories as the number of classes. Each sub-directory should contain the images belonging to that class in .jpg format. If the training data contains images from two classes, roses and dandelion, the input directory should look like the following hierarchy:

input_directory 
     |--roses 
           |--abc.jpg 
           |--def.jpg 
     |--dandelion 
           |--ghi.jpg 
           |--jkl.jpg

The names of the class folders and the .jpg files can be anything.
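
As a quick sanity check before training, you can confirm that a local dataset follows this layout with a few lines of Python; summarize_dataset is a hypothetical helper, not part of JumpStart:

from pathlib import Path

def summarize_dataset(input_directory):
    """Print each class folder and the number of .jpg images it contains."""
    for class_dir in sorted(Path(input_directory).iterdir()):
        if class_dir.is_dir():
            num_images = len(list(class_dir.glob("*.jpg")))
            print(f"{class_dir.name}: {num_images} images")

summarize_dataset("input_directory")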

We provide the tf_flowers dataset [1] as a default dataset for fine-tuning the model. The dataset comprises images of five types of flowers and was downloaded from TensorFlow.

Walkthrough overview

The following sections provide a step-by-step demo to perform image classification with JumpStart, both via the Studio UI and JumpStart APIs.

We walk through the following steps:

  1. Access JumpStart through the Studio UI:
    1. Fine-tune the pre-trained model.
    2. Deploy the fine-tuned model.
    3. Incrementally train the fine-tuned model and redeploy.
  2. Use JumpStart programmatically with the SageMaker Python SDK:
    1. Fine-tune the pre-trained model.
    2. Deploy the fine-tuned model.
    3. Incrementally train the fine-tuned model and redeploy.

Access JumpStart through the Studio UI

In this section, we demonstrate how to fine-tune and deploy JumpStart models through the Studio UI. Additionally, we show how to incrementally train a model that you have previously fine-tuned.

Fine-tune the pre-trained model

The following video shows you how to find a pre-trained image classification model on JumpStart and fine-tune it. The model page contains valuable information about the model, how to use it, expected data format, and some fine-tuning details.

For demonstration purposes, we fine-tune the model using the dataset provided by default, which is the tf_flowers dataset, composed of different varieties of flowers. Fine-tuning on your own dataset involves taking the correct formatting of data (as explained on the model page), uploading it to Amazon Simple Storage Service (Amazon S3), and specifying its location in the data source configuration.
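
If you bring your own data, the upload step with the SageMaker Python SDK could look like the following minimal sketch; the local folder name and the S3 prefix are placeholders:

import sagemaker

session = sagemaker.Session()

# Upload the local class-folder directory to S3 and capture its S3 URI
training_dataset_s3_path = session.upload_data(
    path="input_directory",
    bucket=session.default_bucket(),
    key_prefix="jumpstart-example/ic-training-data",
)
print(training_dataset_s3_path)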

We keep the default hyperparameter values (number of epochs, learning rate, and batch size). We also use a GPU-backed ml.p3.2xlarge instance as our SageMaker training instance.

You can monitor your training job directly in the Studio console, and you’re notified when it’s complete.

Deploy the fine-tuned model

After training is complete, you can deploy the fine-tuned model from the same page that holds the training job details. To deploy our model, we pick a different instance type, ml.p2.xlarge. It still provides the GPU acceleration needed for low inference latency, but at a lower price point. After you configure the SageMaker hosting instance, choose Deploy. It may take 5–10 minutes until your persistent endpoint is up and running.

Then your endpoint is operational and ready to respond to inference requests!

To accelerate your time to inference, JumpStart provides a sample notebook that shows you how to run inference on your freshly deployed endpoint. Choose Open Notebook under Use Endpoint from Studio.

Incrementally train the fine-tuned model and redeploy

When fine-tuning is complete, you can further train the model to boost performance. This step is very similar to the initial fine-tuning process, except that we use the already fine-tuned model as the starting point. You may use new data, but the dataset format must be the same (same set of classes).

Use JumpStart programmatically with the SageMaker SDK

In the preceding sections, we showed how you can use the JumpStart UI to fine-tune, deploy, and incrementally train a model interactively in a matter of a few clicks. You can also use JumpStart models and fine-tuning programmatically through APIs that are integrated into the SageMaker SDK. We now go over a quick example of how you can replicate the preceding process. All the steps in this demo are available in the accompanying notebook Introduction to JumpStart – Image Classification.

Fine-tune the pre-trained model

To fine-tune a selected model, we need to get that model’s URI, as well as that of the training script and the container image used for training. Thankfully, these three inputs depend solely on the model name, version (for a list of available models, see JumpStart Available Model Table), and type of instance you want to train on. This is demonstrated in the following code snippet:

from sagemaker import image_uris, model_uris, script_uris

model_id, model_version = "pytorch-ic-mobilenet-v2", "1.0.0"
training_instance_type = "ml.p3.2xlarge"

# Retrieve the docker image
train_image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    model_id=model_id,
    model_version=model_version,
    image_scope="training",
    instance_type=training_instance_type,
)

# Retrieve the training script
train_source_uri = script_uris.retrieve(model_id=model_id, model_version=model_version, script_scope="training")

# Retrieve the pre-trained model tarball to further fine-tune
train_model_uri = model_uris.retrieve(model_id=model_id, model_version=model_version, model_scope="training")

We retrieve the model_id corresponding to the same model we used previously. The ic in the identifier corresponds to image classification.
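
If you’re unsure which model IDs are available, you can list them programmatically. The following sketch assumes a recent version of the SageMaker Python SDK that includes the JumpStart notebook utilities:

from sagemaker.jumpstart.notebook_utils import list_jumpstart_models

# List the JumpStart image classification ("ic") model IDs
ic_models = list_jumpstart_models(filter="task == ic")
print(ic_models[:5])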

You can now fine-tune this JumpStart model on your own custom dataset using the SageMaker SDK. We use the same tf_flowers dataset, which is publicly hosted on Amazon S3. Your dataset should be structured for fine-tuning, as explained in the previous section. See the following example code:

from sagemaker.estimator import Estimator
from sagemaker.utils import name_from_base

# URI of your training dataset
training_dataset_s3_path = "s3://jumpstart-cache-prod-us-west-2/training-datasets/tf_flowers/"
training_job_name = name_from_base(f"jumpstart-example-{model_id}-transfer-learning")

# Create SageMaker Estimator instance
# aws_role, hyperparameters, and s3_output_location are defined in the accompanying
# notebook; the default hyperparameters are retrieved as shown after this snippet
ic_estimator = Estimator(
    role=aws_role,
    image_uri=train_image_uri,
    source_dir=train_source_uri,
    model_uri=train_model_uri,
    entry_point="transfer_learning.py",
    instance_count=1,
    instance_type=training_instance_type,
    max_run=360000,
    hyperparameters=hyperparameters,
    output_path=s3_output_location,
)

# Launch a SageMaker Training job by passing the S3 path of the training data
ic_estimator.fit({"training": training_dataset_s3_path}, logs=True)

We obtain the same default hyperparameters for our selected model as the ones we saw in the previous section, using sagemaker.hyperparameters.retrieve_default(). We then instantiate a SageMaker estimator and call the .fit method to start fine-tuning our model, passing it the Amazon S3 URI for our training data. As you can see, the entry_point script provided is named transfer_learning.py (the same for other tasks and models), and the input data channel passed to .fit must be named training.
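
For reference, retrieving and optionally overriding the default hyperparameters looks roughly like the following; the epochs override is just an example, so inspect the returned dictionary for the exact key names:

from sagemaker.hyperparameters import retrieve_default

# Retrieve the default hyperparameters for fine-tuning this model
hyperparameters = retrieve_default(model_id=model_id, model_version=model_version)

# Optionally override a default before passing the dictionary to the Estimator
hyperparameters["epochs"] = "5"  # example override; key names may vary by model
print(hyperparameters)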

Deploy the fine-tuned model

When training is complete, you can deploy your fine-tuned model. To do so, we need to obtain the inference script URI (the code that determines how the model is used for inference once deployed) and the inference container image URI, which includes an appropriate model server to host the model we chose. See the following code:

inference_instance_type = "ml.p2.xlarge"

# Retrieve the inference docker container uri
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    image_scope="inference",
    model_id=model_id,
    model_version=model_version,
    instance_type=inference_instance_type,
)
# Retrieve the inference script uri
deploy_source_uri = script_uris.retrieve(
    model_id=model_id, model_version=model_version, script_scope="inference"
)

endpoint_name = name_from_base(f"jumpstart-example-FT-{model_id}-")

# Use the estimator from the previous step to deploy to a SageMaker endpoint
finetuned_predictor = ic_estimator.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    entry_point="inference.py",
    image_uri=deploy_image_uri,
    source_dir=deploy_source_uri,
    endpoint_name=endpoint_name,
)

After a few minutes, our model is deployed and we can get predictions from it in real time!

Next, we invoke the endpoint to predict what type of flower is shown in an example image. We pass the raw bytes of the image (img, read in the accompanying notebook), parse the JSON response, and display the image alongside its predicted label.

import json

from IPython.display import HTML, display

# img holds the raw bytes of the example image; image_filename is its local path
query_response = finetuned_predictor.predict(
    img, {"ContentType": "application/x-image", "Accept": "application/json;verbose"}
)
model_predictions = json.loads(query_response)
predicted_label = model_predictions["predicted_label"]
display(
    HTML(
        f'<img src={image_filename} alt={image_filename} align="left" style="width: 250px;"/>'
        f"<figcaption>Predicted Label: {predicted_label}</figcaption>"
    )
)

Incrementally train the fine-tuned model and redeploy

We can increase the performance of a fine-tuned model by further training it on new images. You may use any number of new or old images for this; however, the dataset format must remain the same (same set of classes). The incremental training step is similar to the fine-tuning process, with an important difference: in the initial fine-tuning we start with a pre-trained model, whereas in incremental training we start with an existing fine-tuned model. See the following code:

# output_bucket, incremental_output_prefix, last_training_job_name, and
# incremental_training_job_name are defined in the accompanying notebook
last_trained_model_path = f"{s3_output_location}/{last_training_job_name}/output/model.tar.gz"
incremental_s3_output_location = f"s3://{output_bucket}/{incremental_output_prefix}/output"

incremental_train_estimator = Estimator(
    role=aws_role,
    image_uri=train_image_uri,
    source_dir=train_source_uri,
    model_uri=last_trained_model_path,
    entry_point="transfer_learning.py",
    instance_count=1,
    instance_type=training_instance_type,
    max_run=360000,
    hyperparameters=hyperparameters,
    output_path=incremental_s3_output_location,
    base_job_name=incremental_training_job_name,
)

incremental_train_estimator.fit({"training": training_dataset_s3_path}, logs=True)

When training is complete, we can use the same steps as the ones described in the preceding section to deploy the model.
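
For completeness, a minimal sketch of that deployment and of cleaning up afterwards follows; the endpoint name is just an example, and deleting the endpoints stops further hosting charges:

# Deploy the incrementally trained model, reusing the inference image and script
incremental_predictor = incremental_train_estimator.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    entry_point="inference.py",
    image_uri=deploy_image_uri,
    source_dir=deploy_source_uri,
    endpoint_name=name_from_base(f"jumpstart-example-IT-{model_id}-"),
)

# When you're done experimenting, delete the endpoints and models to avoid ongoing costs
finetuned_predictor.delete_model()
finetuned_predictor.delete_endpoint()
incremental_predictor.delete_model()
incremental_predictor.delete_endpoint()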

Conclusion

JumpStart is a capability in SageMaker that allows you to quickly get started with ML. JumpStart uses open-source pre-trained models to solve common ML problems like image classification, object detection, text classification, sentence pair classification, and question answering.

In this post, we showed you how to fine-tune and deploy a pre-trained image classification model. We also showed how to incrementally train a fine-tuned model for image classification. With JumpStart, you can easily perform this process through the Studio UI with no need to write code, or programmatically with the SageMaker Python SDK. Try out the solution on your own and let us know how it goes in the comments. To learn more about JumpStart, check out the AWS re:Invent 2020 video Get started with ML in minutes with Amazon SageMaker JumpStart.

References

  1. The TensorFlow Team, 2019

About the Authors

Dr. Vivek Madan is an Applied Scientist with the Amazon SageMaker JumpStart team. He received his PhD from the University of Illinois at Urbana-Champaign and was a postdoctoral researcher at Georgia Tech. He is an active researcher in machine learning and algorithm design, and has published papers at EMNLP, ICLR, COLT, FOCS, and SODA.

João Moura is an AI/ML Specialist Solutions Architect at Amazon Web Services. He is mostly focused on NLP use cases and helping customers optimize deep learning model training and deployment. He is also an active proponent of low-code ML solutions and ML-specialized hardware.

Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker JumpStart and Amazon SageMaker built-in algorithms and helps develop machine learning algorithms. He is an active researcher in machine learning and statistical inference, and has published many papers at NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP.