Guidance for Training an AWS DeepRacer Model using Amazon SageMaker

This Guidance demonstrates how software developers can use an Amazon SageMaker Notebook instance to train and evaluate AWS DeepRacer models directly, with full control over the process. This includes augmenting the simulation environment, manipulating inputs to the neural network, modifying the neural network architecture, running distributed rollouts, and debugging the model. By contrast, the AWS DeepRacer console is optimized to provide a user-friendly introduction to reinforcement learning for developers new to machine learning.

Please note: [Disclaimer]

Architecture Diagram

[Architecture diagram description]


Step 1
The user logs in to their AWS account and creates an Amazon SageMaker Notebook to train a reinforcement learning (RL) model.

Step 2
The SageMaker Notebook stores all files required for the training and evaluation jobs in Amazon Simple Storage Service (Amazon S3), and stores the container images in Amazon Elastic Container Registry (Amazon ECR).

Step 3
Amazon SageMaker downloads images from Amazon ECR and starts a model training job.

Step 4
AWS RoboMaker downloads images from Amazon ECR and creates a racing simulation environment for AWS DeepRacer.

Step 5
RoboMaker starts data generation for model training. The training data generated by RoboMaker is a collection of tuples, each comprising the agent's initial state, the action taken, the new state, and the reward.
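As a minimal sketch of what one such tuple might look like in Python, the snippet below defines a record type and collects a few transitions from a toy environment. The names Experience, toy_step, collect_rollout, and steer_to_center, as well as the one-dimensional state, are illustrative assumptions and are not taken from the Guidance's sample code.

```python
from typing import Callable, List, NamedTuple, Tuple


class Experience(NamedTuple):
    """One training sample: (initial state, action, new state, reward)."""
    state: float
    action: int
    next_state: float
    reward: float


def toy_step(state: float, action: int) -> Tuple[float, float]:
    """Hypothetical stand-in for the simulator: move left (-1) or right (+1).

    The reward is higher the closer the new state is to 0 (the "center line").
    """
    next_state = state + action
    reward = -abs(next_state)
    return next_state, reward


def collect_rollout(
    start: float, policy: Callable[[float], int], steps: int
) -> List[Experience]:
    """Run the policy for a fixed number of steps and record every transition."""
    experiences = []
    state = start
    for _ in range(steps):
        action = policy(state)
        next_state, reward = toy_step(state, action)
        experiences.append(Experience(state, action, next_state, reward))
        state = next_state
    return experiences


def steer_to_center(state: float) -> int:
    """Trivial policy: always step toward 0."""
    return -1 if state > 0 else 1


rollout = collect_rollout(start=3.0, policy=steer_to_center, steps=4)
```

In the real simulation the state would be camera or sensor input and the action a steering and speed choice, but the shape of each record is the same four fields.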

Step 6
RoboMaker sends preconfigured batches of these tuples, called iterations, to Amazon S3 and writes a key to a SageMaker container.
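The batching and key handoff can be sketched as follows. This is an offline illustration only: a plain dictionary stands in for the Amazon S3 bucket, and the function name publish_iterations, the JSON encoding, and the key layout are all assumptions made for the example, not the format the sample code uses.

```python
import json
from typing import Dict, List, Sequence, Tuple

# (state, action, next_state, reward), mirroring the tuple described in Step 5.
Transition = Tuple[float, int, float, float]


def publish_iterations(
    transitions: Sequence[Transition],
    batch_size: int,
    store: Dict[str, str],
) -> List[str]:
    """Split transitions into fixed-size batches and write each one to `store`.

    `store` is a plain dict standing in for an S3 bucket; the returned keys
    play the role of the keys handed to the training container.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    keys = []
    for start in range(0, len(transitions), batch_size):
        batch = transitions[start:start + batch_size]
        key = f"iterations/{len(keys):04d}.json"  # hypothetical key layout
        store[key] = json.dumps(batch)
        keys.append(key)
    return keys


fake_bucket: Dict[str, str] = {}
data = [(3.0, -1, 2.0, -2.0), (2.0, -1, 1.0, -1.0), (1.0, -1, 0.0, 0.0)]
keys = publish_iterations(data, batch_size=2, store=fake_bucket)
```

Passing only the key, rather than the data itself, keeps the message to the trainer small; the trainer then reads the batch from storage when it is ready. In this sketch a final partial batch is still published, which is a design choice of the example.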

Step 7
SageMaker training picks up the data file from Amazon S3 based on the key and trains the model on this dataset. SageMaker then uploads the next version of the model to Amazon S3, where RoboMaker picks it up.

Steps 5 to 7 repeat until the training job duration elapses. The training job duration can be configured in the SageMaker Notebook code using the parameter job_duration_in_seconds.
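The iteration over Steps 5 to 7 can be pictured as a loop bounded by a time budget. The sketch below is a simplified, single-process illustration under stated assumptions: only the parameter name job_duration_in_seconds comes from this Guidance, while run_training_loop, generate_batch, train_on_batch, and FakeClock are hypothetical names. In the actual architecture the two sides run as separate managed jobs, and how the services enforce the limit is not shown here.

```python
import time
from typing import Callable, List


def run_training_loop(
    job_duration_in_seconds: float,
    generate_batch: Callable[[int], List[float]],
    train_on_batch: Callable[[int, List[float]], int],
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Alternate data generation and training until the time budget is spent.

    Returns the version number of the last model produced.
    """
    deadline = clock() + job_duration_in_seconds
    model_version = 0
    while clock() < deadline:
        # Steps 5-6: the simulator produces one iteration with the current model.
        batch = generate_batch(model_version)
        # Step 7: the trainer consumes it and publishes the next model version.
        model_version = train_on_batch(model_version, batch)
    return model_version


class FakeClock:
    """Deterministic clock for the demo: advances one second per reading."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        value = self.now
        self.now += 1.0
        return value


final_version = run_training_loop(
    job_duration_in_seconds=3,
    generate_batch=lambda version: [float(version)],  # placeholder rollout data
    train_on_batch=lambda version, batch: version + 1,  # placeholder training step
    clock=FakeClock(),
)
```

The injectable clock exists only so the example runs instantly and repeatably; with the default time.monotonic the loop would run for the real duration.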

Step 8
RoboMaker streams live training and evaluation jobs to Amazon Kinesis Video Streams. Users can view model training and evaluation in real time on Kinesis Video Streams.

Step 9
All training logs, evaluation logs, service calls, and operational metrics can be viewed in the Amazon CloudWatch console.
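Besides the console, the same logs can be read programmatically. The following is a minimal sketch, assuming a client that behaves like the boto3 CloudWatch Logs client and its filter_log_events call. The helper name iter_log_messages, the StubLogsClient class, the sample messages, and the log group name passed in the demo are assumptions for illustration; substitute the log group your own jobs write to.

```python
from typing import Any, Dict, Iterator


def iter_log_messages(
    logs_client: Any, log_group: str, pattern: str = ""
) -> Iterator[str]:
    """Yield log messages from one CloudWatch Logs group, following pagination.

    `logs_client` is expected to behave like boto3.client("logs").
    """
    kwargs: Dict[str, Any] = {"logGroupName": log_group}
    if pattern:
        kwargs["filterPattern"] = pattern
    while True:
        page = logs_client.filter_log_events(**kwargs)
        for event in page.get("events", []):
            yield event["message"]
        token = page.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token


class StubLogsClient:
    """Offline stand-in returning two pages, so the sketch runs without AWS access."""

    def __init__(self) -> None:
        self._pages = {
            None: {"events": [{"message": "iteration 1 done"}], "nextToken": "t1"},
            "t1": {"events": [{"message": "iteration 2 done"}]},
        }

    def filter_log_events(self, **kwargs: Any) -> Dict[str, Any]:
        return self._pages[kwargs.get("nextToken")]


# The log group name below is an assumed example value.
messages = list(iter_log_messages(StubLogsClient(), "/aws/sagemaker/TrainingJobs"))
```

With real credentials, the stub would be replaced by a client created with boto3.client("logs"); the helper itself does not change.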

Well-Architected Pillars

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

Operational Excellence

The services you can configure in this Guidance enhance your operational excellence in a number of ways. First, SageMaker streamlines the process of training Reinforcement Learning (RL) models, while RoboMaker automates the creation of simulation environments and data generation. Second, Kinesis Video Streams allows real-time monitoring of model training and evaluation. And third, CloudWatch provides centralized logging and monitoring for all services involved, enabling efficient operations management.

Read the Operational Excellence whitepaper

Security

CloudWatch offers centralized security monitoring and alerts to support efficient threat detection. Also, Amazon Virtual Private Cloud (Amazon VPC) with VPC endpoints ensures private and secure communication between SageMaker, RoboMaker, and Amazon S3, preventing exposure to the public internet.

Read the Security whitepaper

Reliability

SageMaker supports reliable model training by managing the execution of training jobs, including fault tolerance and recovery. Amazon S3 offers reliable storage for training data and model images, ensuring data availability and durability. RoboMaker contributes to reliability by creating and managing the simulation environment, enabling robust data generation for training. Kinesis Video Streams streams live training and evaluation, allowing real-time monitoring for reliability assessment, and provides capabilities in multiple Availability Zones. Finally, CloudWatch provides comprehensive logs, metrics, and operational insights that help you identify and mitigate reliability issues promptly.

Read the Reliability whitepaper

Performance Efficiency

The management capabilities of SageMaker streamline model training, using compute resources efficiently and right-sizing the instance on which training runs. In this Guidance, the SageMaker Notebook uses an ml.t3.2xlarge instance and SageMaker training uses ml.c4.2xlarge instances, which optimizes the performance of SageMaker for this workload. Additionally, RoboMaker enhances performance efficiency by creating and managing a simulation environment optimized for AWS DeepRacer training.

Read the Performance Efficiency whitepaper

Cost Optimization

SageMaker training jobs are sized to the workload and shut down when training completes, which helps you avoid unnecessary costs. The cleanup code in the SageMaker Notebook further supports efficient resource use by removing unnecessary components. RoboMaker likewise shuts down automatically, which reduces idle resource costs, and includes cleanup code to delete resources and minimize residual costs.

Read the Cost Optimization whitepaper

Sustainability

SageMaker supports sustainability by helping you efficiently manage resources during RL model training, reducing energy consumption and your environmental impact. Right-sizing the underlying instance helps optimize compute resources sustainably. Furthermore, Kinesis Video Streams enables real-time monitoring, helping you make informed decisions to optimize resource usage and minimize energy waste.

Read the Sustainability whitepaper

Implementation Resources

The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.

Open sample code on GitHub
