Toyota Research Institute accelerates safe automated driving with deep learning at a global scale on AWS
Vehicles with self-driving technology can bring many benefits to society. One of the top priorities at Toyota Research Institute (TRI) is to apply the latest advancements in artificial intelligence (AI) to help Toyota produce cars that are safer, more accessible, and more environmentally friendly. To achieve these goals, TRI turned to deep learning on Amazon Web Services (AWS).
Using Amazon EC2 P3 instances, TRI trains its models 4X faster than on the P2 instances they had used previously, reducing training time from days to hours. This gives them the agility to optimize and retrain their models quickly and to deploy them in their test cars or simulation environments for further testing. In addition, the performance improvement of P3 instances over P2 instances, coupled with the AWS pay-as-you-go model, translates to lower operating costs for TRI.
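The 4X speedup and the 75% reduction in training time quoted later in this post describe the same improvement. A minimal sketch of the arithmetic follows; the function name and the 48-hour baseline are hypothetical values chosen for illustration, not figures reported by TRI.

```python
def training_time_reduction(speedup: float) -> float:
    """Return the fraction of training time saved when training is `speedup` times faster."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


# Hypothetical baseline: a training run that took 48 hours before the speedup.
baseline_hours = 48.0
speedup = 4.0

reduction = training_time_reduction(speedup)
new_hours = baseline_hours / speedup

print(f"{reduction:.0%} less time: {baseline_hours:.0f} h -> {new_hours:.0f} h")
```

With a speedup of 4, the new run takes one quarter of the original time, which is the same as saving three quarters of it.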
Creating deep learning models for automated driving
TRI is developing a single technology stack for their automated driving technology with two modes: Guardian and Chauffeur. Guardian mode requires the driver to keep hands on the wheel and eyes on the road at all times. It constantly monitors the driving environment, inside and out, and intervenes only when it perceives a potential crash. Chauffeur mode uses the same technology, but is always in control, and vehicle occupants are strictly passengers.
Developing and deploying autonomous vehicles requires the ability to collect, store, and manage massive amounts of data. It also requires high performance computing capacity, advanced deep learning techniques, and the capability to do real-time processing in the vehicle.
Using the PyTorch deep learning framework, TRI created deep learning computer vision models to automatically provide monitoring and control in both driving modes. To gather data used in their deep learning models, TRI has a fleet of test cars equipped with various types of data acquisition sensors such as cameras, radar, and LIDAR (a technique used in control and navigation to generate object representations in 3D space). These test vehicles drive through various Operational Design Domains (ODD), collecting and recording data, which amounts to terabytes of data per day per car. This data needs to be quickly retrieved, prepped, and made available for analysis and retraining of machine learning models and simulations.
TRI believes that accurately training models requires trillions of miles of testing. With over 100 million Toyotas on the road today, drivers experience a range of driving conditions. To complement their vehicle testing, TRI uses simulations to model a variety of rare conditions and scenarios. These simulations generate photo-real data streams that test how their machine learning models react to demanding cases such as rainstorms, snowstorms, and sharp glare at different times of the day and night, with different road surfaces and surroundings.
As new test data becomes available, TRI rapidly explores research ideas and trains their models quickly so they can deploy updated versions on their test cars and rerun tests.
“Using Amazon EC2 P3 instances, we reduced the time to train our models by 75%. This significantly accelerates our research and development velocity as we can quickly incorporate new data and retrain models, explore ideas, increase model accuracy, and introduce new features faster,” says Adrien Gaidon, PhD, Machine Learning Lead, Toyota Research Institute.
The following figure shows the flow of data collection from TRI test vehicles. When cars return from test runs, solid state drives (SSDs) are pulled from the car and put into an ingest machine that uploads the data to local network-attached storage (NAS). The data is then immediately uploaded to an Amazon S3 data lake. Other sites listen to an Amazon SQS queue populated by Amazon S3 events, so that data is synchronized across the different sites.
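To make the synchronization step concrete, here is a minimal sketch of how a listening site might interpret a message from the queue. It assumes the queue receives standard S3 event notifications in JSON form; the function name, bucket name, and object key are hypothetical, and the post does not describe TRI's actual implementation. Receiving messages from Amazon SQS and downloading the objects are left out so that the example runs without AWS credentials.

```python
import json
from typing import List, Tuple
from urllib.parse import unquote_plus


def extract_created_objects(message_body: str) -> List[Tuple[str, str]]:
    """Return (bucket, key) pairs for newly created objects in an S3 event notification."""
    event = json.loads(message_body)
    created = []
    # Messages without a "Records" list (such as the s3:TestEvent sent when
    # notifications are first configured) yield no objects.
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        created.append((bucket, key))
    return created


# Hypothetical message body, shaped like an S3 "ObjectCreated" notification.
sample_body = json.dumps({
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-drive-logs"},
                "object": {"key": "site-a/run+01.bag", "size": 1024},
            },
        }
    ]
})

for bucket, key in extract_created_objects(sample_body):
    print(f"New object to sync: s3://{bucket}/{key}")
```

In a real deployment, each site would poll the queue, pass every message body to a function like this one, copy the listed objects, and then delete the message from the queue.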
Snippets of this uploaded data that contain interesting data points are used to retrain TRI’s deep learning models. This data is also fed into their simulation environment to update all the test scenarios.
Cloud-based deep learning
TRI needed an IT platform that could handle large amounts of data, had the processing power to train machine learning models quickly, and could scale to meet their requirements. Using AWS, they gained the ability to spin up compute and storage resources on demand and couple them with higher-level management and orchestration services. This gives TRI's development teams the agility to sustain a rapid R&D cycle and run experiments on massive amounts of data.
TRI uses Amazon S3 to store and retrieve any amount of data from anywhere and Amazon SQS to coordinate data transfer to and from remote data collection sites. The core compute capability TRI needs to accelerate the training of their machine learning models is provided by multiple Amazon EC2 P3 instances, which feature NVIDIA's latest Tesla V100 GPUs. P3 instances are some of the fastest GPU instances available in the cloud. They help reduce model training times to a few hours or even minutes, enabling data scientists and machine learning engineers to iterate faster, train more models, and build a competitive edge into their applications.
“Using the AWS Cloud and specifically Amazon EC2 P3 instances, we’re able to build a scalable and highly performant applications stack to efficiently handle and process the huge amount of data that we collect,” says Mike Garrison, Technical Lead, Infrastructure Engineering, Toyota Research Institute.
Using deep learning on Amazon EC2 P3 instances, Amazon S3, Amazon SQS, and AWS networking services, TRI built a scalable solution to enable their development teams to make rapid progress and deliver on their grand vision of applying AI to help Toyota produce cars that are safer, and get closer to realizing a future without traffic injuries or fatalities.
About the Author
Geoff Murase is a Senior Product Marketing Manager for AWS EC2 accelerated computing instances, helping customers meet their compute needs by providing access to hardware-based compute accelerators such as Graphics Processing Units (GPUs) or Field Programmable Gate Arrays (FPGAs). In his spare time, he enjoys playing basketball and biking with his family.