Build and Train a Machine Learning Model Locally
In this tutorial, you learn how to build and train a machine learning (ML) model locally within your Amazon SageMaker Studio notebook.
Amazon SageMaker Studio is an integrated development environment (IDE) for ML that provides a fully managed Jupyter notebook interface in which you can perform end-to-end ML lifecycle tasks. Using SageMaker Studio, you can create and explore datasets, prepare training data, build and train models, and deploy trained models for inference—all in one place.
Exploring a sample of the dataset and iterating over multiple model and parameter configurations before training with the full dataset is a common practice in ML development. For this exploration stage, Amazon SageMaker provides the local mode, which allows you to test your training logic, try different modeling approaches, and gauge model performance before running full-scale training jobs.
For this tutorial, you use a synthetically generated auto insurance claims dataset. The inputs are the training and test datasets, each containing details and extracted features about claims and customers along with a fraud column indicating whether a claim was fraudulent or otherwise. You will use the open source XGBoost framework to prototype a binary classification model on this synthetic dataset to predict the likelihood of a claim being fraudulent.
What you will accomplish
In this guide, you will:
- Ingest training data from Amazon S3 into Amazon SageMaker
- Build and train an XGBoost model locally
- Save the trained model and artifacts to Amazon S3
Before starting this guide, you will need:
- An AWS account: If you don't already have an account, follow the Setting Up Your AWS Environment getting started guide for a quick overview.
With Amazon SageMaker, you can deploy a model visually using the console or programmatically using either SageMaker Studio or SageMaker notebooks. In this tutorial, you deploy the model programmatically using a SageMaker Studio notebook, which requires a SageMaker Studio domain.
Amazon SageMaker supports the creation of multiple Amazon SageMaker Domains in a single AWS Region for each account. If you already have a SageMaker Studio domain in the US East (N. Virginia) Region, follow the SageMaker Studio setup guide to attach the required AWS IAM policies to your SageMaker Studio account, then skip Step 1, and proceed directly to Step 2.
If you don't have an existing SageMaker Studio domain, continue with Step 1 to run an AWS CloudFormation template that creates a SageMaker Studio domain and adds the permissions required for the rest of this tutorial.
Choose the AWS CloudFormation stack link. This link opens the AWS CloudFormation console and creates your SageMaker Studio domain and a user named studio-user. It also adds the required permissions to your SageMaker Studio account. In the CloudFormation console, confirm that US East (N. Virginia) is the Region displayed in the upper right corner. Stack name should be CFN-SM-IM-Lambda-catalog, and should not be changed. This stack takes about 10 minutes to create all the resources.
This stack assumes that you already have a public VPC set up in your account. If you do not have a public VPC, see VPC with a single public subnet to learn how to create a public VPC.
Select I acknowledge that AWS CloudFormation might create IAM resources, and then choose Create stack.
On the CloudFormation pane, choose Stacks. It takes about 10 minutes for the stack to be created. When the stack is created, the status of the stack changes from CREATE_IN_PROGRESS to CREATE_COMPLETE.
Congratulations! You have finished the Build and Train a Machine Learning Model Locally tutorial.
In this tutorial, you used Amazon SageMaker Studio to locally build a binary classification model using the XGBoost open source library and saved the model artifacts and outputs to Amazon S3. As shown in this tutorial, with quick prototyping in SageMaker Studio, you can assess model performance and potential overfitting issues before production model training using the full dataset.
You can continue your data scientist journey with Amazon SageMaker by following the next steps section below.