Guidance for Training an AWS DeepRacer Model using Amazon SageMaker
Overview
How it works
This architecture diagram shows software developers how to use an Amazon SageMaker Notebook instance to train and evaluate AWS DeepRacer models directly, with full control over the process. That control includes augmenting the simulation environment, manipulating inputs to the neural network, modifying the neural network architecture, running distributed rollouts, and debugging the model.
Well-Architected Pillars
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Operational Excellence
The services you can configure in this Guidance enhance your operational excellence in three ways. First, SageMaker streamlines the process of training reinforcement learning (RL) models, while RoboMaker automates the creation of simulation environments and data generation. Second, Kinesis Video Streams allows real-time monitoring of model training and evaluation. Third, CloudWatch provides centralized logging and monitoring for all services involved, enabling efficient operations management.
Security
CloudWatch offers centralized security monitoring and alerts to support efficient threat detection. Also, Amazon Virtual Private Cloud (Amazon VPC) with VPC endpoints ensures private and secure communication between SageMaker, RoboMaker, and Amazon S3, preventing exposure to the public internet.
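As a rough illustration of the private connectivity described above, the following Python sketch builds the parameters for an Amazon S3 gateway VPC endpoint. It is a minimal sketch under stated assumptions, not the Guidance's actual deployment code: the helper name `s3_gateway_endpoint_params` and the VPC and route table IDs are hypothetical placeholders, and the choice of a gateway endpoint for S3 is an assumption. The dictionary keys follow the Amazon EC2 `CreateVpcEndpoint` request shape.

```python
def s3_gateway_endpoint_params(region, vpc_id, route_table_ids):
    """Build keyword arguments for an EC2 CreateVpcEndpoint call (S3 gateway endpoint).

    Hypothetical helper for illustration only; it makes no AWS calls.
    """
    # This sketch requires a route table, because a gateway endpoint only
    # carries traffic for route tables it is associated with.
    if not route_table_ids:
        raise ValueError("provide at least one route table ID")
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        # Regional service name for Amazon S3.
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": list(route_table_ids),
    }


# Placeholder identifiers (assumptions, not values from the Guidance).
params = s3_gateway_endpoint_params(
    "us-east-1",
    "vpc-0123456789abcdef0",
    ["rtb-0123456789abcdef0"],
)
```

With boto3 installed and credentials configured, these parameters could be passed as `boto3.client("ec2").create_vpc_endpoint(**params)`. The sample code on GitHub may provision its endpoints differently.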
Reliability
SageMaker supports reliable model training by managing the execution of training jobs, including fault tolerance and recovery. Amazon S3 offers reliable storage for training data and model images, ensuring data availability and durability. RoboMaker contributes to reliability by creating and managing the simulation environment, enabling robust data generation for training. Kinesis Video Streams streams live training and evaluation, allowing real-time monitoring for reliability assessment, and provides its capabilities across multiple Availability Zones. Finally, CloudWatch provides comprehensive logs, metrics, and operational insights that help you identify and mitigate reliability issues promptly.
Performance Efficiency
The management capabilities of SageMaker streamline model training, using compute resources efficiently and right-sizing the instance on which training runs. In this Guidance, the SageMaker Notebook instance uses ml.t3.2xlarge and SageMaker training uses ml.c4.2xlarge instances, a pairing chosen to optimize SageMaker performance for this workload. Additionally, RoboMaker enhances performance efficiency by creating and managing a simulation environment optimized for AWS DeepRacer training.
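To make the instance sizing concrete, the following Python sketch assembles a request in the shape of the SageMaker `CreateTrainingJob` API with the ml.c4.2xlarge training instance type named above. It is a minimal sketch under stated assumptions, not the Guidance's actual notebook code: the helper name `build_training_job_request`, the job name, image URI, role ARN, S3 path, volume size, and runtime limit are all hypothetical placeholders.

```python
def build_training_job_request(
    job_name,
    image_uri,
    role_arn,
    output_s3_uri,
    instance_type="ml.c4.2xlarge",  # training instance type named in this Guidance
    max_runtime_seconds=3600,       # assumed limit, not from the Guidance
):
    """Build a CreateTrainingJob-shaped request dict. Makes no AWS calls."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": 1,
            "VolumeSizeInGB": 30,  # assumed volume size
        },
        # Caps how long the job may run, so a stalled job does not keep
        # consuming compute indefinitely.
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_seconds},
    }


# Placeholder values (assumptions for illustration only).
request = build_training_job_request(
    job_name="deepracer-rl-example",
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/example-image:latest",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    output_s3_uri="s3://example-bucket/output/",
)
```

With boto3 installed and credentials configured, the request could be submitted with `boto3.client("sagemaker").create_training_job(**request)`. Keeping the instance type as a parameter makes it straightforward to try a different size if the default does not fit a particular training run.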
Cost Optimization
SageMaker training jobs are sized to the workload and shut down when training is complete, helping you avoid unnecessary costs. The cleanup code for the SageMaker Notebook further reduces waste by removing components that are no longer needed. Likewise, RoboMaker shuts down automatically to reduce idle resource costs and includes cleanup code that deletes resources, minimizing residual costs.
Sustainability
SageMaker supports sustainability by helping you manage resources efficiently during RL model training, reducing energy consumption and your environmental impact. Right-sizing the underlying instance helps optimize compute resources sustainably. Furthermore, Kinesis Video Streams enables real-time monitoring, helping you make informed decisions to optimize resource usage and minimize energy waste.
Deploy with confidence
Ready to deploy? Review the sample code on GitHub for detailed deployment instructions to deploy as-is or customize to fit your needs.