AWS Solutions Library

Guidance for Training an AWS DeepRacer Model using Amazon SageMaker

Overview

This Guidance demonstrates how software developers can use an Amazon SageMaker Notebook instance to train and evaluate AWS DeepRacer models directly, with full control over the process. That control includes augmenting the simulation environment, manipulating inputs to the neural network, modifying the neural network architecture, running distributed rollouts, and debugging the model. By contrast, the AWS DeepRacer console is optimized to provide a user-friendly introduction to reinforcement learning for developers new to machine learning.

How it works

This architecture diagram shows software developers how to use an Amazon SageMaker Notebook instance to train and evaluate AWS DeepRacer models directly, with full control. This includes augmenting the simulation environment, manipulating inputs to the neural network, modifying the neural network architecture, running distributed rollouts, and debugging the model.

Download the architecture diagram

Well-Architected Pillars

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

Operational Excellence

The services you can configure in this Guidance enhance your operational excellence in three ways. First, SageMaker streamlines the process of training reinforcement learning (RL) models, while RoboMaker automates the creation of simulation environments and data generation. Second, Kinesis Video Streams allows real-time monitoring of model training and evaluation. Third, CloudWatch provides centralized logging and monitoring for all services involved, enabling efficient operations management.

Read the Operational Excellence whitepaper

Security

CloudWatch offers centralized security monitoring and alerts to support efficient threat detection. In addition, Amazon Virtual Private Cloud (Amazon VPC) with VPC endpoints keeps communication between SageMaker, RoboMaker, and Amazon S3 private and secure, preventing exposure to the public internet.
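As a rough illustration of the VPC point above, the following Python sketch builds the `VpcConfig` fragment that a SageMaker `CreateTrainingJob` request accepts (`Subnets` and `SecurityGroupIds`). The helper name `build_vpc_config` and the subnet and security group IDs are hypothetical placeholders, not values taken from this Guidance or its sample code.

```python
def build_vpc_config(subnet_ids, security_group_ids):
    """Build the VpcConfig fragment of a SageMaker CreateTrainingJob request.

    Hypothetical helper for illustration; the IDs passed in are placeholders.
    """
    if not subnet_ids or not security_group_ids:
        # A training job attached to a VPC needs at least one of each.
        raise ValueError("at least one subnet and one security group are required")
    return {
        "Subnets": list(subnet_ids),
        "SecurityGroupIds": list(security_group_ids),
    }


# Placeholder IDs; substitute the ones from your own VPC.
vpc_config = build_vpc_config(
    subnet_ids=["subnet-0123456789abcdef0"],
    security_group_ids=["sg-0123456789abcdef0"],
)
print(vpc_config)
```

The resulting dictionary would be passed as the `VpcConfig` argument of the training job request, so that the job's network traffic stays inside the VPC.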

Read the Security whitepaper

Reliability

SageMaker supports the reliability of model training by managing the execution of training jobs, including fault tolerance and recovery. Amazon S3 offers reliable storage for training data and model images, ensuring data availability and durability. RoboMaker contributes to reliability by creating and managing the simulation environment, enabling robust data generation for training. Kinesis Video Streams streams live training and evaluation, allowing real-time monitoring for reliability assessment, and it provides capabilities in multiple Availability Zones. Finally, CloudWatch provides comprehensive logs, metrics, and operational insights, helping you identify and mitigate reliability issues promptly.

Read the Reliability whitepaper

Performance Efficiency

The management capabilities of SageMaker streamline model training by using compute resources efficiently and right-sizing the instance on which training runs. In this Guidance, the SageMaker Notebook uses an ml.t3.2xlarge instance and SageMaker training uses ml.c4.2xlarge instances, which optimizes the performance of SageMaker for this workload. Additionally, RoboMaker enhances performance efficiency by creating and managing a simulation environment optimized for AWS DeepRacer training.
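The instance sizing described above could be captured in configuration along these lines. This is a minimal sketch, not the Guidance's own code: the two instance types come from the text, `ResourceConfig` and `StoppingCondition` are fields of the SageMaker `CreateTrainingJob` request, and the helper name, instance count, volume size, and runtime limit are assumed values chosen only for illustration.

```python
# Instance types named in this Guidance.
NOTEBOOK_INSTANCE_TYPE = "ml.t3.2xlarge"
TRAINING_INSTANCE_TYPE = "ml.c4.2xlarge"


def build_training_resources(instance_count=1, volume_size_gb=32,
                             max_runtime_seconds=3600):
    """Build the sizing fragments of a CreateTrainingJob request.

    Hypothetical helper; the default count, volume size, and runtime limit
    are assumptions, not values from the Guidance.
    """
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return {
        "ResourceConfig": {
            "InstanceType": TRAINING_INSTANCE_TYPE,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": volume_size_gb,
        },
        # Caps how long the job may run, so it cannot bill indefinitely.
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_seconds},
    }


resources = build_training_resources()
print(resources["ResourceConfig"]["InstanceType"])
```

Keeping the instance type in one constant makes it straightforward to try a different size later without touching the rest of the request.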

Read the Performance Efficiency whitepaper

Cost Optimization

SageMaker training jobs are sized to the workload and shut down when training is complete, helping you avoid unnecessary costs. The cleanup code for the SageMaker Notebook further reduces waste by removing components that are no longer needed. RoboMaker also shuts down automatically, which reduces idle resource costs, and it includes cleanup code to delete resources, minimizing residual costs.

Read the Cost Optimization whitepaper

Sustainability

SageMaker supports sustainability by helping you manage resources efficiently during RL model training, reducing energy consumption and your environmental impact. Right-sizing the underlying instance helps to optimize compute resources sustainably. Furthermore, Kinesis Video Streams enables real-time monitoring, helping you make informed decisions to optimize resource usage and minimize energy waste.

Read the Sustainability whitepaper

Deploy with confidence

Ready to deploy? Review the sample code on GitHub for detailed deployment instructions, then deploy it as-is or customize it to fit your needs.

Go to sample code

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
