Implementing MLOps practices with Amazon SageMaker JumpStart pre-trained models

Amazon SageMaker JumpStart is the machine learning (ML) hub of SageMaker that offers over 350 built-in algorithms, pre-trained models, and pre-built solution templates to help you get started with ML fast. JumpStart provides one-click access to a wide variety of pre-trained models for common ML tasks such as object detection, text classification, summarization, and text generation. JumpStart also provides pre-trained foundation models such as Stability AI’s Stable Diffusion text-to-image model, BLOOM, Cohere’s Generate, and Amazon’s AlexaTM. You can fine-tune and deploy JumpStart models using the UI in Amazon SageMaker Studio or using the SageMaker Python SDK extension for JumpStart APIs. JumpStart APIs let you use JumpStart capabilities in your own workflows and integrate with tools such as the model registry that are part of MLOps pipelines, as well as anywhere else you interact with SageMaker through the SDK.

This post focuses on how to implement MLOps with JumpStart models using JumpStart APIs, Amazon SageMaker Pipelines, and Amazon SageMaker Projects. We show how to build an end-to-end CI/CD pipeline for data preprocessing and fine-tuning ML models, registering model artifacts to the SageMaker model registry, and automating model deployment to staging and production with a manual approval. We demonstrate a customer churn classification example using the LightGBM model from JumpStart.

MLOps pattern with JumpStart

As companies adopt machine learning across their organizations, building, training, and deploying ML models manually becomes a bottleneck for innovation. Establishing MLOps patterns allows you to create repeatable workflows for all stages of the ML lifecycle and is key to transitioning from the manual experimentation phase to production. MLOps helps companies innovate faster by boosting the productivity of data science and ML teams in creating and deploying models with high accuracy.

Real-world data and business use cases change rapidly, and setting up MLOps patterns with JumpStart allows you to retrain, evaluate, version, and deploy models across environments quickly. In the initial phases of experimenting with JumpStart models, you can use Studio notebooks to retrieve, fine-tune, deploy, and test models. Once you determine that the model, dataset, and hyperparameters are the right fit for the business use case, the next step is to create an automated workflow that preprocesses the data, fine-tunes the model, registers it with the model registry, and deploys it to staging and production. In the next section, we demonstrate how you can use SageMaker Pipelines and SageMaker Projects to set up MLOps.

Integrate JumpStart with SageMaker Pipelines and SageMaker Projects

JumpStart models can be integrated with SageMaker Pipelines and SageMaker Projects to create the CI/CD infrastructure and automate all the steps involved in the model development lifecycle. SageMaker Pipelines is a native workflow orchestration tool for building ML pipelines that take advantage of direct SageMaker integration. SageMaker Projects provides MLOps templates that automatically provision the underlying resources needed to enable CI/CD capabilities for your ML development lifecycle.

Building, training, tuning, and deploying JumpStart models with SageMaker Pipelines and SageMaker Projects allows you to iterate faster and build repeatable mechanisms. Each step in the pipeline can keep track of the lineage, and intermediate steps can be cached for quickly rerunning the pipeline. With projects, dependency management, code repository management, build reproducibility, and artifact sharing are simple to set up. You can use a number of built-in templates or create your own custom template. SageMaker projects are provisioned using AWS Service Catalog products.

Solution overview

In this section, we first create a pipeline.yaml file for the customer churn example with all the steps to preprocess the data and retrieve, fine-tune, and register the model to the model registry. We then use a pre-built MLOps template to bootstrap the ML workflow and provision a CI/CD pipeline with sample code. After we create the template, we modify the sample code created from the template to use the pipeline.yaml created for our use case. The code samples for this example are available on GitHub.

The following diagram illustrates the solution architecture.

The pipeline includes the following steps:

Preprocess the datasets in the format required by JumpStart based on the type of ML problem and split data into train and validation datasets.
Perform the training step to fine-tune the pre-trained model using transfer learning.
Create the model.
Register the model.

The next sections walk through creating each step of the pipeline and running the entire pipeline. The complete pipeline and sample code are available on GitHub.

Prerequisites

To implement this solution, you must have an AWS Identity and Access Management (IAM) role that allows connection to SageMaker and Amazon S3. For more information about IAM role permissions, see Policies and permissions in IAM.

Import libraries and declare parameters and constants

In this step, we import the required libraries, create the SageMaker session and the SageMaker and S3 client objects, and declare constants. We then download the dataset from a public S3 bucket and upload it to the private S3 bucket that the pipeline reads from. The complete import statements and code are available on GitHub.

import os

import boto3
import sagemaker

# SageMaker and S3 clients, session, and default bucket
sm_client = boto3.client("sagemaker")
sess = sagemaker.Session()
region = boto3.Session().region_name
bucket = sess.default_bucket()
BASE_DIR = os.path.dirname(os.path.realpath(__file__))

# Download the public sample dataset, then upload it to our own bucket
local_path = "churn.txt"
s3 = boto3.client("s3")
s3.download_file("sagemaker-sample-files", "datasets/tabular/synthetic/churn.txt", local_path)
base_uri = f"s3://{bucket}/churn"
input_data_uri = sagemaker.s3.S3Uploader.upload(
    local_path=local_path,
    desired_s3_uri=base_uri,
)

Define the data processing script and processing step

Here, we provide a Python script that processes the custom dataset and creates the training, validation, and test splits used for model fine-tuning. The preprocessing.py file used for our example is located on GitHub.
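The split logic inside such a script can be sketched as follows. This is a dependency-free illustration, not the preprocessing.py from the GitHub repository. The 70/20/10 split, the fixed seed, the data.csv file name, and the headerless layout are all assumptions made for the example; check the JumpStart documentation for the exact input format your model expects.

```python
import csv
import os
import random


def split_rows(rows, train_frac=0.7, validation_frac=0.2, seed=42):
    """Shuffle rows reproducibly and cut them into train/validation/test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_validation = int(len(shuffled) * validation_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]  # whatever remains
    return train, validation, test


def write_split(rows, directory, filename="data.csv"):
    """Write rows as a headerless CSV file inside `directory` (assumed layout)."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, filename)
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return path
```

Inside a processing job, each split would be written under the matching output directory, for example write_split(train, "/opt/ml/processing/train").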

In this step, we instantiate the processor. Because the processing script uses pandas, we use a SKLearnProcessor. The Pipelines ProcessingStep takes the following arguments: the processor, the input S3 locations for raw datasets, and the output S3 locations to save processed datasets. See the following code:

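from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.steps import ProcessingStep

# Processing step for feature engineering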
framework_version = "0.23-1"
sklearn_processor = SKLearnProcessor(
    framework_version=framework_version,
    instance_type="ml.m5.xlarge",
    instance_count=1,
    base_job_name="sklearn-churn-process",
    role=role,
    sagemaker_session=sagemaker_session,
)

step_process = ProcessingStep(
    name="JumpstartDataProcessing",  # choose any name
    processor=sklearn_processor,
    inputs=[
        ProcessingInput(source=input_data_uri, destination="/opt/ml/processing/input"),
    ],
    outputs=[
        ProcessingOutput(output_name="train", source="/opt/ml/processing/train", destination=f"s3://{bucket}/output/train"),
        ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation", destination=f"s3://{bucket}/output/validation"),
        ProcessingOutput(output_name="test", source="/opt/ml/processing/test", destination=f"s3://{bucket}/output/test"),
    ],
    code=os.path.join(BASE_DIR, "preprocessing.py"),
)

Define the pipeline step for fine-tuning

Next, we provide the pipeline steps to retrieve the model and the training script used to fine-tune it. Model artifacts for JumpStart are stored as tarballs in an Amazon Simple Storage Service (Amazon S3) bucket. Each model is versioned and contains a unique ID that can be used to retrieve the model URI. You need the following to retrieve the URI:

model_id – A unique identifier for the JumpStart model.
model_version – The version of the specifications for the model. To use the latest version, enter *. This is a required parameter.

Select a model_id and version from the pre-trained models table, as well as a model scope. In this case, you begin by using “training” as the model scope. Use the utility functions to retrieve the URI of each of the three components you need to continue. Select the instance type; for this model we can use a GPU or a non-GPU instance. The model in this example uses an ml.m5.4xlarge instance type. See the following code:

from sagemaker import hyperparameters, image_uris, model_uris, script_uris

# Estimator instance count and instance type
instance_count = 1
instance_type = "ml.m5.4xlarge"
model_id, model_version = "lightgbm-classification-model", "*"
training_instance_type = "ml.m5.4xlarge"
# Retrieve the docker image
train_image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    model_id=model_id,
    model_version=model_version,
    image_scope="training",
    instance_type=training_instance_type,
)
# Retrieve the training script
train_source_uri = script_uris.retrieve(model_id=model_id, model_version=model_version, script_scope="training")
# Retrieve the pre-trained model tarball to further fine-tune
train_model_uri = model_uris.retrieve(model_id=model_id, model_version=model_version, model_scope="training")

# Set S3 URIs
training_dataset_s3_path = f"s3://{bucket}/output/"
output_prefix = "jumpstart-example-tabular-training"
s3_output_location = f"s3://{bucket}/{output_prefix}/output"

# Get the default JumpStart hyperparameters
default_hyperparameters = hyperparameters.retrieve_default(
    model_id=model_id,
    model_version=model_version,
)
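The defaults retrieved above can be adjusted before they are passed to the estimator. The following is a minimal sketch of one way to do that safely. The helper name override_hyperparameters is hypothetical, and the hyperparameter name num_boost_round in the usage note is an assumption about what the LightGBM defaults contain; inspect default_hyperparameters to see the real keys.

```python
def override_hyperparameters(defaults, overrides):
    """Return a copy of `defaults` with `overrides` applied.

    Values are stored as strings, on the assumption that the defaults
    are a dict of string values. Unknown names are rejected so that a
    typo does not silently create a new hyperparameter.
    """
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"Not a known hyperparameter for this model: {sorted(unknown)}")
    merged = dict(defaults)  # leave the caller's dict untouched
    merged.update({name: str(value) for name, value in overrides.items()})
    return merged
```

For example, override_hyperparameters(default_hyperparameters, {"num_boost_round": 500}) would return a new dictionary that can be passed as the hyperparameters argument of the estimator.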

Next, use the model resource URIs to create an Estimator and train it on a custom training dataset. You must specify the S3 path of your custom training dataset. The Estimator class requires an entry_point parameter; JumpStart uses transfer_learning.py, and the training job fails to run if this value is not set. While the model is fitting to your training dataset, the console output reflects the progress of the training job, including output from the transfer_learning.py script. Finally, we instantiate the fine-tuning step using the LightGBM classification estimator and the Pipelines TrainingStep.

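from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.steps import TrainingStep

ic_estimator = Estimator(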
    role=role,
    image_uri=train_image_uri,
    source_dir=train_source_uri,
    model_uri=train_model_uri,
    entry_point="transfer_learning.py",
    instance_count=1,
    instance_type=training_instance_type,
    max_run=360000,
    hyperparameters=default_hyperparameters,
    output_path=s3_output_location,
    sagemaker_session=sagemaker_session,
)

training_step = TrainingStep(
    name="JumpStartFineTraining",
    estimator=ic_estimator,
    inputs={
        "train": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri,
            content_type="text/csv",
        ),
        "validation": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs["validation"].S3Output.S3Uri,
            content_type="text/csv",
        ),
    }
)

Define the pipeline step to retrieve the inference container and script for the model

To deploy the fine-tuned model artifacts to a SageMaker endpoint, we need an inference script and an inference container. We then initialize a SageMaker Model that can be deployed to an Endpoint. We pass the inference script as the entry point for our model.

from sagemaker.model import Model

# Retrieve the inference container image
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    image_scope="inference",
    model_id=model_id,
    model_version=model_version,
    instance_type=inference_instance_type,
)
model = Model(
    image_uri=deploy_image_uri,
    entry_point="inference.py",
    source_dir=Inference_dir,
    model_data=training_step.properties.ModelArtifacts.S3ModelArtifacts,
    sagemaker_session=sagemaker_session,
    name="JumpStartRegisterModel",
    role=role,
)

Define the pipeline steps for the model registry

The following code registers the model within the SageMaker model registry using the Pipelines model step. You can set the approval status to Approved or PendingManualApproval. PendingManualApproval requires a manual approval in the Studio IDE.

from sagemaker.workflow.step_collections import RegisterModel

approval_status = "Approved"
step_register = RegisterModel(
    name="JumpStartRegisterModel",
    model=model,
    content_types=["text/csv"],
    response_types=["text/csv"],
    inference_instances=["ml.t2.medium", "ml.m5.4xlarge"],
    transform_instances=["ml.m5.4xlarge"],
    model_package_group_name=model_package_group_name,
    approval_status=approval_status,
    model_metrics=model_metrics,
)
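When a model is registered as PendingManualApproval, the approval can also be given programmatically instead of in the Studio IDE. The following is a minimal sketch that wraps the SageMaker UpdateModelPackage API; the helper name set_approval_status is hypothetical, and sm_client is assumed to be a boto3 SageMaker client such as the one created earlier.

```python
VALID_STATUSES = {"Approved", "Rejected", "PendingManualApproval"}


def set_approval_status(sm_client, model_package_arn, status, description=None):
    """Change the approval status of one model package version."""
    if status not in VALID_STATUSES:
        raise ValueError(f"status must be one of {sorted(VALID_STATUSES)}")
    request = {
        "ModelPackageArn": model_package_arn,
        "ModelApprovalStatus": status,
    }
    if description:
        # Optional free-text note stored with the approval decision
        request["ApprovalDescription"] = description
    return sm_client.update_model_package(**request)
```

The model package ARN is returned when the model is registered and is also shown in the model registry for each model version.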

Define the pipeline

After defining all of the component steps, you can assemble them into a Pipeline object. You don’t need to specify the order of the steps because Pipelines automatically infers the sequence from the dependencies between the steps. See the following code:

from sagemaker.workflow.pipeline import Pipeline

# Name the pipeline
pipeline_name = "sm-jumpstart-churn-prediction-pipeline"

# Combine pipeline steps
pipeline_steps = [step_process, training_step, step_register]

pipeline = Pipeline(
    name=pipeline_name,
    parameters=[processing_instance_count, instance_type, instance_count, input_data],
    steps=pipeline_steps,
    sagemaker_session=sess,
)
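The assembled pipeline still has to be created in SageMaker and started. The following is a minimal sketch, assuming pipeline is the object defined above and role_arn is the ARN of the execution role; run_pipeline is a hypothetical helper name. It relies only on the upsert, start, and wait methods of the SageMaker Python SDK.

```python
def run_pipeline(pipeline, role_arn, wait=True):
    """Create or update the pipeline definition, start a run, and optionally block."""
    # upsert creates the pipeline if it does not exist, otherwise updates it
    pipeline.upsert(role_arn=role_arn)
    execution = pipeline.start()
    if wait:
        # Blocks until the execution finishes
        execution.wait()
    return execution
```

After the run, execution.list_steps() returns the status of each step, which helps when checking which step failed.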

Launch a deployment template with SageMaker Projects

After we create the pipeline steps, we can launch an MLOps project template from the Studio console, as shown in the following screenshot.

On the projects page, you can launch a preconfigured SageMaker MLOps template. For this example, we choose MLOps template for model building, training, and deployment.

This template creates the following architecture.

The following AWS services and resources are created:

Two repositories are added to AWS CodeCommit:
- The first repository provides the code to create a multi-step model building pipeline along with a build specification file, used by AWS CodePipeline and AWS CodeBuild to run the pipeline automatically.
- The second repository contains code and configuration files for model deployment. This repo also uses CodePipeline and CodeBuild, which run an AWS CloudFormation template to create model endpoints for staging and production.
Two CodePipeline pipelines:
- The ModelBuild pipeline automatically triggers and runs the pipeline from end to end whenever a new commit is made to the ModelBuild CodeCommit repository.
- The ModelDeploy pipeline automatically triggers whenever a new model version is added to the SageMaker model registry and the status is marked as Approved. Models that are registered with PendingManualApproval or Rejected statuses aren’t deployed.
An S3 bucket is created for output model artifacts generated from the pipeline.
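Because the deployment side acts only on approved model versions, it can be useful to check which version is currently the latest approved one. The following is a minimal sketch using the SageMaker ListModelPackages API. The helper name is hypothetical, sm_client is assumed to be a boto3 SageMaker client, and this is not the code the template itself runs.

```python
def latest_approved_model_package(sm_client, model_package_group_name):
    """Return the ARN of the newest Approved model package, or None if there is none."""
    response = sm_client.list_model_packages(
        ModelPackageGroupName=model_package_group_name,
        ModelApprovalStatus="Approved",
        SortBy="CreationTime",
        SortOrder="Descending",
        MaxResults=1,
    )
    summaries = response["ModelPackageSummaryList"]
    return summaries[0]["ModelPackageArn"] if summaries else None
```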

Modify the sample code for a custom use case

To modify the sample code from the launched template, we first need to clone the CodeCommit repositories to our local Studio instance. From the list of projects, choose the one that was just created. On the Repositories tab, you can choose the hyperlinks to locally clone the CodeCommit repos.

After you clone the repositories in the previous step, you can modify the seed code that was created from the template. You can create a customized pipeline.yaml file with the required steps. For this example, we can customize the pipeline by navigating to the pipelines folder in the ModelBuild repository. In the pipelines directory, you can find the abalone folder that contains the seed pipeline code. Replace the contents of the abalone directory with the scripts present in the GitHub folder. Rename the abalone directory to customer_churn.

We also have to modify the path inside codebuild-buildspec.yml, as shown in the sample repository:

run-pipeline --module-name pipelines.customer_churn.pipeline \

The ModelDeploy folder has the CloudFormation templates for the deployment pipeline. When a new model becomes available in the model registry, it’s deployed to the staging endpoint. After a manual approval, the model is then deployed to production. Committing the changes to CodeCommit triggers a new pipeline run. You can commit directly from the Studio IDE.

The build phase registers a model to the model registry. When a new model is available, the staging deployment process is triggered. After staging is successfully deployed, a manual approval is required to deploy the model to a production endpoint. The following screenshot shows the pipeline steps.

After a manual approval is provided, we can see that the production endpoint has been successfully created. At this point, the production endpoint is ready for inference.
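A quick way to test the endpoint is to send it a few rows of features. The following is a minimal sketch, assuming the endpoint accepts headerless CSV rows (the model was registered with the text/csv content type) and that runtime_client is a boto3 sagemaker-runtime client. The helper names and the endpoint name are placeholders, and the response is returned as raw text because its exact layout depends on the inference script.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize feature rows to headerless CSV text, one line per row."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def predict(runtime_client, endpoint_name, rows):
    """Send feature rows to a SageMaker endpoint and return the raw response text."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=rows_to_csv(rows).encode("utf-8"),
    )
    # The response body is a stream; read it fully and decode
    return response["Body"].read().decode("utf-8")
```

With boto3, this could be called as predict(boto3.client("sagemaker-runtime"), "my-production-endpoint", feature_rows), where the rows contain the feature columns only, without the target.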

Clean up

To avoid ongoing charges, delete the inference endpoints and endpoint configurations via the SageMaker console. You can also clean up the resources by deleting the CloudFormation stack.
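The same cleanup can be scripted. The following is a minimal sketch that deletes one endpoint together with its endpoint configuration, assuming sm_client is a boto3 SageMaker client; the helper name is hypothetical. It does not delete the model, the model registry entries, or the resources created by the project template.

```python
def delete_endpoint_and_config(sm_client, endpoint_name):
    """Delete an endpoint and the endpoint configuration it currently uses."""
    # Look up the configuration name before the endpoint is gone
    description = sm_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = description["EndpointConfigName"]
    sm_client.delete_endpoint(EndpointName=endpoint_name)
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    return config_name
```

Run it once for the staging endpoint and once for the production endpoint.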

Conclusion

JumpStart provides hundreds of pre-trained models for common ML tasks, including computer vision and natural language processing use cases. In this post, we showed how you can productionize JumpStart’s features with end-to-end CI/CD using SageMaker Pipelines and SageMaker Projects. We showed how to create a pipeline with steps for data preprocessing, training, and registering a model. We also demonstrated how changes to the source code can trigger an entire model building and deployment process with the necessary approval process. This pattern can be extended to any other JumpStart models and solutions.

About the authors

Vivek Gangasani is a Senior Machine Learning Solutions Architect at Amazon Web Services. He works with Machine Learning Startups to build and deploy AI/ML applications on AWS. He is currently focused on delivering solutions for MLOps, ML Inference and low-code ML. He has worked on projects in different domains, including Natural Language Processing and Computer Vision.

Rahul Sureka is an Enterprise Solution Architect at AWS based out of India. Rahul has more than 22 years of experience in architecting and leading large business transformation programs across multiple industry segments. His areas of interest are data and analytics, streaming, and AI/ML applications.

Davide Gallitelli is a Specialist Solutions Architect for AI/ML in the EMEA region. He is based in Brussels and works closely with customers throughout Benelux. He has been a developer since he was very young, starting to code at the age of 7. He started learning AI/ML at university, and has fallen in love with it since then.