How Slalom Uses AWS DeepRacer to Upskill its Workforce in Reinforcement Learning

By Sean Dougherty, Shrey Patel, Sean Hussain, and Steve Luu, Technology Leads at Slalom
By Ryan Escalona, Contributing Author
By Lara Wagner, Contributing Author

AWS DeepRacer allows developers of all skill levels to get started with reinforcement learning (RL).

Reinforcement learning is an advanced machine learning (ML) technique that takes a very different approach to training models than other ML methods. It learns very complex behaviors without requiring any labeled training data, and can make short-term decisions while optimizing for a longer term goal.
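
To make the idea of trading short-term decisions against a longer-term goal more concrete, here is a minimal sketch of a discounted return, the quantity many RL algorithms try to maximize. The function name `discounted_return` and the two sample episodes are illustrative assumptions and do not come from AWS DeepRacer itself.

```python
def discounted_return(rewards, discount_factor):
    """Sum of rewards where a reward k steps ahead is weighted by discount_factor ** k."""
    total = 0.0
    # Walk backwards so each step folds in the discounted value of everything after it
    for reward in reversed(rewards):
        total = reward + discount_factor * total
    return total


# Two hypothetical five-step episodes (values invented for illustration)
greedy_now = [1.0, 0.0, 0.0, 0.0, 0.0]  # small payoff immediately
patient = [0.0, 0.0, 0.0, 0.0, 2.0]     # larger payoff at the end

for gamma in (0.5, 0.99):
    print(gamma, discounted_return(greedy_now, gamma), discounted_return(patient, gamma))
```

With a discount factor of 0.5 the immediate payoff scores higher, while with 0.99 the delayed payoff does. This is why a discount factor close to 1 pushes an agent toward longer-term outcomes.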

At Slalom, an AWS Partner Network (APN) Premier Consulting Partner, our enthusiasm for and experience with AWS DeepRacer have grown rapidly since we first participated in the AWS DeepRacer activities at AWS re:Invent 2018.

While at re:Invent, Slalom developers attended workshops and sessions to dive deeper into training and evaluating reinforcement learning models.

Using these insights, Slalom deployed RL models onto AWS DeepRacer cars and got on the track to compete in the AWS DeepRacer League.

After experiencing the thrill of racing at AWS re:Invent, Slalom wanted to create AWS DeepRacer experiences for its own workforce. The cars and tracks now regularly appear in our 30 locations across the United States, United Kingdom, and Canada as valuable internal learning events.

This post shares lessons learned from Slalom’s internal AWS DeepRacer events, as we upskilled our workforce in reinforcement learning. We will focus on the specific details your team and organization may need to successfully organize and run your own AWS DeepRacer events end-to-end.

To get you started, we’ll provide the reward function code, configured action spaces, and hyperparameters from a successful model. These are the same ones Slalom developers used as a baseline as they trained models in search of the fastest lap times.

Driving Excitement for Reinforcement Learning

Excited about AWS DeepRacer’s potential, Slalom began delivering regular learning events across its offices to drive interest in reinforcement learning.

The presentations covered:

AWS DeepRacer origins.
Reinforcement learning in the broader context of artificial intelligence (AI).
AWS DeepRacer simulator and software architecture.

Slalom worked closely with local APN and machine learning specialists to help provide further expertise on specific topics, such as:

Reinforcement learning algorithm families.
AWS DeepRacer neural network architecture.
Hyperparameter tuning.

Investing in the track and cars allows developers within Slalom to test their code before participating in local competitions. This is particularly helpful because a model that performs well in simulations on the virtual track may not produce the same results in the real world.

Figure 1 – AWS DeepRacer model winds around the track at a recent Slalom event.

Additionally, the tracks have become both formal and informal learning opportunities, as well as opportunities to share in company culture.

Many Slalom offices regularly set up display areas with the AWS DeepRacer vehicles in their spaces, giving everyone from ML enthusiasts to curious onlookers an opportunity to engage with the AWS DeepRacer technology.

For racing teams, it’s invaluable to have our colleagues physically inspect the models and share feedback to enhance performance. For colleagues who aren’t well-versed in reinforcement learning, it’s an opportunity to get hands-on with a complex topic in a tangible way.

How to Rev Up for an AWS DeepRacer Day

While building a physical AWS DeepRacer track is possible, Slalom decided to purchase the track with the following specifications:

Track template: AWS re:Invent 2018
Surface material: 13-ounce Scrim Vinyl
Total dimensions: 26 ft long by 17 ft wide (two pieces at 26 ft x 8.5 ft)
Barriers: Baby blue foam boards 30 inches high with stand (purchased at hardware and office supply stores)

Since the track is in two pieces, it’s important to properly align the track pieces and ensure a smooth and wrinkle-free surface.

For the best results, use double-sided tape on concrete or hardwood flooring. Be sure to grab 3-5 coworkers to help with track setup and taping it to the floor. Work carefully to avoid trapping air pockets under the track, and smooth out any that form.

Figure 2 – Charging models at a recent event.

Lighting both above and around the track area plays a critical role in the success of the event. If the lighting distorts the AWS DeepRacer camera’s view of the track, the car will not perform as expected. For example, sunlight coming through office windows creates glare on the track that can confuse the AWS DeepRacer.

While there was no silver bullet for creating optimal lighting, the key for us was to adjust lights and window coverings to minimize any possible interference for the AWS DeepRacer’s camera view.

Limit the race to at most two cars that can each run the same model successfully around the track. Keep an additional car as a backup in case one of the two fails. Using only two cars minimizes race time variations and gives each racer a consistent race experience.

Charge multiple batteries and switch the cars and batteries out every four races. This allows for the motor to cool and keeps the cars running optimally. Designate one person to keep track of when to replace car batteries and switch race cars. Most importantly, load all race models on both cars prior to the race event.

Pro Tip: Use a compressed air can to spray down the motor in between races to help keep the motor cool.

Figure 3 – Slalom teams prepare the track before a recent internal learning event.

Before the day of the event, the coordinators configured the cars to connect to the desired Wi-Fi network and loaded a demo model on a USB stick. Setting up a dedicated Wi-Fi network just for the cars helps improve the driving experience by isolating the AWS DeepRacer traffic.

Slalom provided its Innovation Lab AWS account, which all participants used to train their models. By default, there is a soft limit on how many models can be trained concurrently. Slalom had to request an increase to that limit so all participants could train their models virtually before taking them to the physical track.

Rolling with AWS DeepRacer

Slalom has held multiple AWS DeepRacer events in different offices in Chicago, Seattle, Houston, and more. Depending on the venue, it takes about an hour to set up the track with five people. We strongly recommend doing a test run in any new space before your event day.

One of the benefits Slalom discovered when hosting these events is that we attracted people with technical and non-technical backgrounds. AWS DeepRacer makes concrete what might otherwise be a difficult concept to understand and embrace (and does so in a fun, gamified way).

Additionally, people with non-technical backgrounds can learn more about AWS and its services.

As interest in AWS DeepRacer has grown, different offices have competed in local AWS Summits and virtual circuits to prove their model is the fastest. Several Slalom employees have secured spots on the AWS DeepRacer Leaderboard.

Figure 4 – Slalom consultants prepare a car for track time.

Tune, Optimize, and Iterate

Out of the box, AWS supplies a few examples of reward functions to help you get started. You can see in the examples below how they use information about the current state of the system to determine a reward value based on how successful the vehicle is at the various approaches (for example, following the center line).

You are highly encouraged to enhance them or create your own from scratch to get the best possible performance.

The local Slalom Houston Technology Enablement team modified the standard out-of-the-box version and clocked a 10.14-second run time during a client race event.

Figure 5 – Slalom consultants lead a recent learning event in Chicago.

Using the center line function as a starting place, the reward function was focused on staying on track close to the center, while limiting steering extremes. As the car makes its way around the track, it’s rewarded for keeping a faster pace and will lose its reward if the car leaves the track.

Although it was not a model focused on full speed, it successfully made it around the track without intervention and could theoretically keep going around until its batteries ran out.

With some tuning, the success of this model could be improved to drive higher rewards for faster speeds and tested in the AWS training simulator.
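
One possible direction for that tuning, sketched here under assumptions, is to scale the reward by how close the car is to its top speed. `scale_reward_by_speed` and its `max_speed` argument are hypothetical names, the 50 percent floor is an arbitrary choice, and `max_speed` would need to match the top speed in your own action space.

```python
def scale_reward_by_speed(base_reward, speed, max_speed):
    """Scale a base reward linearly from 50% at a standstill to 100% at max_speed."""
    if max_speed <= 0:
        raise ValueError("max_speed must be positive")
    # Clamp so an unexpected speed reading cannot push the factor outside [0.5, 1.0]
    speed_fraction = min(max(speed / max_speed, 0.0), 1.0)
    return base_reward * (0.5 + 0.5 * speed_fraction)


# Half of top speed keeps 75% of the base reward
print(scale_reward_by_speed(1.0, 4.0, 8.0))
```

A continuous factor like this gives the agent a small gain for every increase in speed, instead of a single jump at a fixed threshold.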

Slalom Tuned

Begin the function and set up the function parameters to be used within the code.

def reward_function(params):
    '''
    Example of rewarding the agent to follow center line
    '''
    # Read input parameters
    distance_from_center = params['distance_from_center']
    track_width = params['track_width']
    steering = abs(params['steering_angle']) # Only need the absolute steering angle
    all_wheels_on_track = params['all_wheels_on_track']
    speed = params['speed']
    progress = params['progress']
    steps = params['steps']
    ##############

The largest incentive for the model is for completing the track. The reward is initialized first so that the completion bonus can be added to it.

    # Start from zero so every later step can add to or scale the reward
    reward = 0.0

    # Large bonus for finishing the lap
    if progress == 100:
        reward += 100
    ##############

Define center markers of the track, and then reward the car with higher points for staying closer to the center. So, as the car makes its way around the track, it tries to stay as close to the center as it can.

    # Calculate 3 markers that are at varying distances away from the center line
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    ###############
    # Give higher reward if the agent is closer to center line and vice versa.
    # Add to the reward (rather than assign) so the completion bonus above is kept.
    if distance_from_center <= marker_1:
        reward += 1
    elif distance_from_center <= marker_2:
        reward += 0.5
    elif distance_from_center <= marker_3:
        reward += 0.1
    else:
        reward += 1e-3  # likely crashed/ close to off track
    ###############

This block leaves the reward unchanged while the car keeps a safe margin from the track border, and cuts the existing reward to a minimum if the car gets too close to the border or goes off the track.

    # Calculate the distance from the nearest border
    distance_from_border = 0.5 * track_width - distance_from_center

    # Keep the reward if the car stays inside the track borders
    if distance_from_border >= 0.05:
        reward *= 1.0
    else:
        reward = 1e-3  # Low reward if too close to the border or goes off the track
    ###############

This block penalizes abrupt steering at speed. If the steering angle exceeds the threshold while the car is moving quickly, the reward is reduced by 20 percent. Smooth steering at speed while on the track leaves the reward unchanged in this version; raising that multiplier above 1 is one way to reward it further.

    # Steering penalty threshold, change the number based on your action space setting
    ABS_STEERING_THRESHOLD = 15

    # Penalize reward if the agent is steering too much
    if steering > ABS_STEERING_THRESHOLD and speed > 4:
        reward *= 0.8

    # Multiplier for smooth, fast, on-track driving (1 leaves the reward unchanged)
    if steering < 5 and speed > 4 and all_wheels_on_track:
        reward *= 1
    ###############

Finally, if the car is on the track, moving at a high rate of speed, and staying close to the center, the reward is multiplied by five.

    # Emphasize staying on track, while moving quickly and staying center aligned
    # (marker_2 is wider than marker_1, so checking marker_2 covers both)
    if all_wheels_on_track and speed > 6 and distance_from_center <= marker_2 and progress > 0:
        reward *= 5
    ###############
    return float(reward)
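
Before spending training hours on a new reward function, it can help to call it locally with hand-built inputs. The sketch below is one assumed way to do that: `make_params` and `check_reward_function` are hypothetical helpers, the sample values are invented, and `center_line_reward` is a simplified stand-in so the block runs on its own. The dictionary keys are the same ones the function above reads from `params`.

```python
def make_params(**overrides):
    """Build a sample input dictionary; every value here is invented for testing."""
    params = {
        "distance_from_center": 0.0,
        "track_width": 0.6,
        "steering_angle": 0.0,
        "all_wheels_on_track": True,
        "speed": 5.0,
        "progress": 50.0,
        "steps": 100,
    }
    params.update(overrides)
    return params


def center_line_reward(params):
    """Simplified stand-in: full reward near the center line, almost none otherwise."""
    if params["distance_from_center"] <= 0.1 * params["track_width"]:
        return 1.0
    return 1e-3


def check_reward_function(reward_function):
    """Call the function on a few edge cases and confirm it returns a float each time."""
    cases = {
        "centered": make_params(),
        "near_border": make_params(distance_from_center=0.29),
        "off_track": make_params(distance_from_center=0.4, all_wheels_on_track=False),
        "lap_complete": make_params(progress=100.0),
    }
    results = {}
    for name, params in cases.items():
        reward = reward_function(params)
        assert isinstance(reward, float), name
        results[name] = reward
    return results


print(check_reward_function(center_line_reward))
```

Swapping `center_line_reward` for your own `reward_function` surfaces runtime errors, such as a variable used before it is assigned, in seconds instead of partway through a training job.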

The action space and hyperparameter settings give finer control over the car’s steering and speed, as well as over how the model trains within the virtual training environment.
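
As a rough illustration of what a configured action space contains, the sketch below builds a discrete grid of steering and speed combinations. It assumes evenly spaced values; `build_action_space` and its argument names are hypothetical, and the numbers in the usage line are examples, not the settings Slalom used.

```python
def build_action_space(max_steering_angle, steering_levels, max_speed, speed_levels):
    """Return a list of discrete actions as dicts with an index, steering_angle, and speed."""
    if steering_levels < 1 or speed_levels < 1:
        raise ValueError("levels must be at least 1")

    # Steering angles spread evenly from -max to +max (a single level means straight ahead)
    if steering_levels == 1:
        angles = [0.0]
    else:
        step = 2 * max_steering_angle / (steering_levels - 1)
        angles = [-max_steering_angle + i * step for i in range(steering_levels)]

    # Speeds spread evenly from max_speed / speed_levels up to max_speed
    speeds = [max_speed * (i + 1) / speed_levels for i in range(speed_levels)]

    actions = []
    for angle in angles:
        for speed in speeds:
            actions.append({"index": len(actions), "steering_angle": angle, "speed": speed})
    return actions


# Example: 5 steering levels up to 30 degrees and 2 speed levels give 10 actions
for action in build_action_space(30, 5, 8, 2):
    print(action)
```

More levels give the car finer control, but they also enlarge the set of actions the model has to learn to choose between.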

Learn more in the AWS DeepRacer documentation about the action space and reward function.

Dive deeper into systematically tuning hyperparameters in the AWS DeepRacer documentation.
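
One simple way to be systematic is to vary a small number of hyperparameters at a time and record each combination you train. The sketch below enumerates such a grid; the hyperparameter names and candidate values are placeholders chosen for illustration, not recommendations or the values Slalom used.

```python
from itertools import product


def hyperparameter_grid(search_space):
    """Yield one dict per combination of the candidate values in search_space."""
    names = list(search_space)
    for values in product(*(search_space[name] for name in names)):
        yield dict(zip(names, values))


# Placeholder candidates; substitute the hyperparameters you want to compare
search_space = {
    "learning_rate": [0.0001, 0.0003, 0.001],
    "discount_factor": [0.99, 0.999],
}

for run_number, config in enumerate(hyperparameter_grid(search_space), start=1):
    print(run_number, config)
```

Keeping the run number next to each configuration makes it easier to match lap times back to the settings that produced them.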

See You on the Track

When most people hear words like machine learning, reinforcement learning, and artificial intelligence they say something like, “Wow, that’s some complex stuff!”

Hosting AWS DeepRacer events is an interactive and enjoyable way to demystify some of these harder concepts. We hope you can take the information from this post and successfully run your own AWS DeepRacer event. At a minimum, we hope you’ll train a model and compete in the AWS DeepRacer League.

In the meantime, Slalom will continue to upskill its teams and our clients through these kinds of learning events. If you’re curious to hear more about how we train our new models, prepare for competition, and share our enthusiasm for AWS DeepRacer with our clients, please reach out.

We’ll see you on the track!

The content and opinions in this blog are those of the third party author and AWS is not responsible for the content or accuracy of this post.


Slalom – APN Partner Spotlight

Slalom is an AWS Premier Consulting Partner. A modern consulting firm focused on strategy, technology, and business transformation, Slalom’s teams are backed by regional innovation hubs, a global culture of collaboration, and partnerships with the world’s top technology providers.
