In this tutorial, you'll learn how to train, tune, and evaluate a machine learning (ML) model using Amazon SageMaker Studio and Amazon SageMaker Clarify.
Amazon SageMaker Studio is an integrated development environment (IDE) for ML that provides a fully managed Jupyter notebook interface in which you can perform end-to-end ML lifecycle tasks. Using SageMaker Studio, you can create and explore datasets; prepare training data; build, train, and tune models; and deploy trained models for inference—all in one place. Amazon SageMaker Clarify gives you greater visibility into your training data and models, so you can identify and limit bias and explain predictions.
For this tutorial, you'll use a synthetically generated auto insurance claims dataset. The inputs are the training, validation, and test datasets, each containing details and extracted features about claims and customers, along with a fraud column indicating whether or not each claim was fraudulent. You'll use the open source XGBoost framework to build a binary classification model on this synthetic dataset to predict the likelihood of a claim being fraudulent. You'll also evaluate the trained model by running bias and feature importance reports, deploy the model for testing, and run sample inference to evaluate model performance and explain predictions.
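To make the modeling task concrete, here is a minimal local sketch of binary fraud classification on synthetic data. It uses scikit-learn's GradientBoostingClassifier as a stand-in for XGBoost so the example runs without the xgboost package or any AWS resources; the dataset, class weights, and hyperparameters are illustrative assumptions, not the tutorial's actual data or settings.

```python
# Minimal local sketch of the task: binary classification of fraudulent claims.
# GradientBoostingClassifier stands in for XGBoost; the synthetic dataset is
# illustrative, not the tutorial's auto insurance claims data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the claims dataset: numeric features plus a binary
# "fraud" label, with fraud as the rare positive class.
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = GradientBoostingClassifier(
    n_estimators=100, max_depth=3, learning_rate=0.1, random_state=42
)
model.fit(X_train, y_train)

# Predicted probability of the positive (fraud) class for each test claim.
fraud_probs = model.predict_proba(X_test)[:, 1]
print(f"Test AUC: {roc_auc_score(y_test, fraud_probs):.3f}")
```

In the tutorial itself, the equivalent XGBoost training runs remotely in a SageMaker training job rather than in the notebook process.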
What you will accomplish
In this guide, you will:
- Build, train, and tune a model using script mode
- Detect bias in ML models and understand model predictions
- Deploy the trained model to a real-time inference endpoint for testing
- Evaluate the model by generating sample predictions and understanding feature impact
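On the bias-detection step above: among the pretraining bias metrics SageMaker Clarify reports is DPL (Difference in Proportions of Labels), the gap in positive-label rates between two facet groups. A hand-rolled sketch on toy records makes the metric concrete; the facet values and labels below are illustrative, not the tutorial's data.

```python
# Hand-rolled sketch of one pretraining bias metric SageMaker Clarify reports:
# DPL (Difference in Proportions of Labels) between two facet groups.
# The records below are illustrative toy data.
records = [
    # (facet value, fraud label)
    ("group_a", 1), ("group_a", 0), ("group_a", 0), ("group_a", 0),
    ("group_b", 1), ("group_b", 1), ("group_b", 0), ("group_b", 0),
]

def positive_proportion(records, facet):
    """Fraction of records in the given facet with a positive (fraud=1) label."""
    labels = [label for f, label in records if f == facet]
    return sum(labels) / len(labels)

# DPL = q_a - q_d: a value near 0 suggests labels are balanced across facets.
dpl = positive_proportion(records, "group_a") - positive_proportion(records, "group_b")
print(f"DPL: {dpl:+.2f}")  # 0.25 - 0.50 = -0.25
```

In the tutorial, Clarify computes this and related metrics for you as part of the generated bias report.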
Before starting this guide, you will need:
- An AWS account: If you don't already have an account, follow the Setting Up Your AWS Environment getting started guide for a quick overview.
Congratulations! You have finished the Train a Machine Learning Model tutorial.
In this tutorial, you used Amazon SageMaker Studio to train a binary classification model in script mode. You used the open source XGBoost library with the AWS managed XGBoost container to train and tune the model using SageMaker hyperparameter tuning jobs. You also analyzed bias and model explainability using SageMaker Clarify and used the reports to assess the feature impact on individual predictions. Finally, you used the SageMaker SDK to deploy the model to a real-time inference endpoint and tested it with a sample payload.
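As a reminder of how the sample-payload test works: the AWS managed XGBoost container accepts CSV request bodies, so invoking the endpoint amounts to serializing one record's features into a CSV line and sending it. The feature values and endpoint name below are hypothetical, and the actual invocation (shown in comments) requires AWS credentials and the running endpoint the tutorial created.

```python
# Sketch of building a sample payload for a real-time XGBoost endpoint.
# The managed XGBoost container accepts CSV request bodies.
def to_csv_payload(features):
    """Serialize one record's features into the CSV body the endpoint expects."""
    return ",".join(str(v) for v in features)

sample_claim = [3, 2000.0, 1, 0, 5]  # hypothetical extracted claim features
payload = to_csv_payload(sample_claim)
print(payload)  # → "3,2000.0,1,0,5"

# With the SageMaker Python SDK, the deployed predictor would be invoked
# roughly like this (hypothetical endpoint name; requires AWS credentials):
#
#   from sagemaker.predictor import Predictor
#   from sagemaker.serializers import CSVSerializer
#   predictor = Predictor("fraud-detect-xgb-endpoint", serializer=CSVSerializer())
#   fraud_probability = predictor.predict(payload)
```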
You can continue your data scientist journey with Amazon SageMaker by following the next steps section below.