In this tutorial, learn how to create and automate end-to-end machine learning (ML) workflows using Amazon SageMaker Pipelines, Amazon SageMaker Model Registry, and Amazon SageMaker Clarify.
SageMaker Pipelines is the first purpose-built continuous integration and continuous delivery (CI/CD) service for ML. With SageMaker Pipelines, you can automate the steps of the ML workflow, including data loading, data transformation, training, tuning, evaluation, and deployment. SageMaker Model Registry lets you track model versions, their metadata (such as use case grouping), and baselines for model performance metrics in a central repository, which makes it easier to choose the right model for deployment based on your business requirements. SageMaker Clarify provides greater visibility into your training data and models so you can identify and limit bias and explain predictions.
In this tutorial, you will implement a SageMaker pipeline to build, train, and deploy an XGBoost binary classification model that predicts the likelihood of an auto insurance claim being fraudulent. You will use a synthetically generated auto insurance claims dataset. The raw inputs are two tables of insurance data: a claims table and a customers table. The claims table has a column named fraud that indicates whether each claim was fraudulent. Your pipeline will process the raw data; create training, validation, and test datasets; and build and evaluate a binary classification model. It will then use SageMaker Clarify to test the model for bias and explainability, and finally deploy the model for inference.
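To make the data preparation stage more concrete, here is a minimal plain-Python sketch of joining the two raw tables and splitting the result into training, validation, and test sets. It is an illustration only and not the tutorial's processing code: the join key `policy_id`, the columns `claim_amount` and `customer_age`, the 70/15/15 split, and both helper functions are assumptions made for this example. Only the `fraud` column comes from the dataset description above.

```python
import random


def join_claims_customers(claims, customers, key="policy_id"):
    """Inner-join claim rows with customer rows that share the same key.

    `key` is a hypothetical column name chosen for this sketch.
    """
    customers_by_key = {row[key]: row for row in customers}
    joined = []
    for claim in claims:
        customer = customers_by_key.get(claim[key])
        if customer is not None:
            # Merge the customer and claim columns into one record.
            joined.append({**customer, **claim})
    return joined


def split_dataset(rows, train_frac=0.7, validation_frac=0.15, seed=42):
    """Shuffle rows reproducibly and split them into train/validation/test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_validation = round(len(shuffled) * validation_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]  # whatever remains
    return train, validation, test


# Tiny made-up tables standing in for the synthetic claims dataset.
claims = [
    {"policy_id": i, "claim_amount": 1000 + 50 * i, "fraud": int(i % 10 == 0)}
    for i in range(100)
]
customers = [{"policy_id": i, "customer_age": 20 + i % 50} for i in range(100)]

dataset = join_claims_customers(claims, customers)
train, validation, test = split_dataset(dataset)
print(len(train), len(validation), len(test))
```

In the actual pipeline this work would run as a managed processing step rather than as local code, but the shape of the logic (join, shuffle, split) is the same idea.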
What you will accomplish
In this guide, you will:
- Build and run a SageMaker pipeline to automate the end-to-end ML lifecycle
- Generate predictions using the deployed model
Before starting this guide, you will need:
- An AWS account: If you don't already have an account, follow the Setting Up Your AWS Environment getting started guide for a quick overview.
Congratulations! You have finished the Automate Machine Learning Workflows tutorial.
You have successfully used Amazon SageMaker Pipelines to automate the end-to-end ML workflow, from data processing through model training, model evaluation, bias and explainability checking, conditional model registration, and deployment. Lastly, you used the SageMaker SDK to deploy the model to a real-time inference endpoint and tested it with a sample payload.
You can continue your machine learning journey with Amazon SageMaker by following the next steps section below.