Navigating robots on Mars: Results of the AWS JPL Open Source Rover Challenge
Earlier this year, we held the AWS JPL Open Source Rover Challenge, a four-month competition where participants from around the world used deep reinforcement learning to drive digital robot models on a virtual Mars landscape. Participants created autonomous navigation models for the robot and trained them in simulation with AWS RoboMaker. The virtual robot used in this competition was based on the NASA-Jet Propulsion Laboratory (JPL) Open Source Rover, a build-it-yourself, scaled-down version of the six-wheeled rover that NASA uses on Mars.
Over 2,900 people from 85 countries registered for the challenge and produced some amazing, and sometimes comical, results. Some rovers went the wrong way, others crashed into boulders, and a small number successfully made it to the checkpoint. The grand prize of $15,000 was awarded to Jordan Gleeson from Australia. Runner-up Balazs Szabo from Hungary received a $5,000 prize.
If you want to hear more about the real experience of driving rovers on Mars, how Jordan built his winning reinforcement learning model, and how we designed the challenge, join us for a webinar on July 15. I’ll have special guests including the IT Chief Innovation Officer of JPL, three JPL Rover flight software developers/operators, and of course Jordan, the winner of the challenge. Register here!
About the Competition
Robots are used for difficult and dangerous tasks such as inspecting deep sea cables, collecting samples in the extremely high temperatures of active volcanoes, and checking gauges amid toxic radiation levels at nuclear power plants. There is no better example of a difficult and dangerous task than exploring another planet like Mars. Gravity on Mars is a mere 3.711 m/s² as opposed to our comfortable 9.81 m/s² on Earth. The thin atmosphere (made up of 96% carbon dioxide) provides little protection from meteoroid impacts, which can make the terrain particularly treacherous. NASA estimates that a new crater forms on Mars every one to two days. In addition, our two planets are far apart. When they are aligned, it takes a spacecraft roughly 260 days to travel the 33.9 million miles to Mars from Earth. Since the two planets orbit the sun at different rates, the distance between them changes, resulting in high-latency communication between Earth and a robot on Mars, with long blackout periods. This means sending a test robot to Mars for active development is out of the question. So, unless you want to request an interplanetary workplace relocation to live on Mars, it is prohibitively difficult to develop and test your application with physical robots under these conditions. Finally, robots built for space travel are incredibly expensive and require teams of people to work together on a single robot. A coding mistake can lead to costly damage and major delays if shared physical infrastructure is impacted.
Simulation tools like the open source Gazebo simulation in AWS RoboMaker make it possible for developers to collaboratively build and test new robot application features without risking expensive hardware or trying to physically recreate these unsafe conditions. But why would someone want to do this in the first place? NASA’s Mars Exploration Program has four main science goals, which are pretty awesome:
- To determine if life exists or has ever existed on Mars
- To characterize the climate on Mars
- To characterize the geology on Mars
- To prepare for human exploration
To help achieve these goals, the JPL rover must be able to safely and efficiently navigate the unpredictable Martian terrain. After a waypoint destination is received, the rover must be able to successfully move around hazards (such as large rocks, cliffs, and crevices) while conserving its very limited power supply. This fundamental task of autonomous navigation on Mars was the main objective of the AWS JPL Open Source Rover Challenge.
Deep Reinforcement Learning
Deep reinforcement learning, the combination of reinforcement learning (RL) and deep learning, is a common technique for autonomous navigation: local software agents determine the optimal action to take in a given scenario based on the output of a trained deep learning model. There are no pre-defined routes, maps, or human operators required. A well-trained model leads the software agent to make better decisions and navigate the robot more efficiently. In this competition, participants trained deep reinforcement learning models in a simulated Martian world using AWS RoboMaker Simulation. Cloud-based simulation such as RoboMaker’s is beneficial for reinforcement learning because you can easily parameterize and automate your training jobs at scale. This lets developers experiment and retrain their models many times to achieve optimal results before committing to expensive physical equipment.
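To make the explore-and-exploit cycle concrete, here is a minimal, self-contained sketch of reinforcement learning on a toy navigation task. It uses tabular Q-learning on a 1-D track rather than a deep network on Martian terrain; every name and number below is illustrative, not taken from the competition environment.

```python
import random

random.seed(0)  # deterministic toy run

# Toy stand-in for the navigation task: a 1-D track where the agent
# must reach a checkpoint at position GOAL.
GOAL = 5
ACTIONS = [-1, +1]  # step left / step right

def step(state, action):
    """One RL time step: apply the action, return (next_state, reward, done)."""
    next_state = max(0, min(GOAL, state + action))
    done = next_state == GOAL
    reward = 10.0 if done else -1.0  # per-step penalty rewards efficiency
    return next_state, reward, done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning with epsilon-greedy exploration."""
    q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            if random.random() < epsilon:  # explore
                action = random.choice(ACTIONS)
            else:                          # exploit learned values
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next
                                           - q[(state, action)])
            state = next_state
    return q

q = train()
# After training, the greedy policy should point toward the checkpoint
# from every state.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(policy)
```

In the actual challenge the lookup table is replaced by a deep neural network that maps high-dimensional sensor observations to actions, but the learning loop — observe, act, receive a reward, update — is the same.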
A key component of training a reinforcement learning model is the reward function: a programmatic way of rewarding the agent for making good choices and penalizing it for making poor ones. For the competition, submissions were scored on the following criteria:
- Power consumption – The distance traveled determines how much energy is consumed. Energy is scarce on Mars, so the rover must learn to eliminate non-essential actions and maximize power efficiency.
- Time – There can be long windows of time when the rover cannot communicate with Earth. If the rover misses that window, you may not get your data for a long time. The Martian surface can also be treacherous. Therefore, the quicker the rover can reach its destination, the better.
- Damage – With a host of precision sensors and onboard science experiments, the rover will never reach its destination if it sustains irrecoverable physical damage. Therefore, the risk of damage is factored into the final score.
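As an illustration of how these three criteria could be combined, here is a sketch of a per-step reward function. The sensor names, weights, and thresholds are assumptions for the sake of the example, not the competition's actual interface.

```python
def reward_function(dist_to_checkpoint, prev_dist_to_checkpoint,
                    power_used, imu_z):
    """Return (reward, done) for one time step.

    Illustrative shaping of the three scoring criteria:
    damage (IMU spike), power/time (per-step cost), and progress.
    """
    # Damage: a large spike on the Z-axis IMU means a catastrophic fall.
    if imu_z > 8.0:                      # illustrative threshold
        return -100.0, True              # heavy penalty, end the episode

    # Progress: reward moving toward the checkpoint, penalize moving away.
    progress = prev_dist_to_checkpoint - dist_to_checkpoint
    reward = 10.0 * progress

    # Power/time: a small per-step cost discourages wasted actions.
    reward -= 0.1 * power_used

    # Success: arriving close to the checkpoint ends the episode.
    if dist_to_checkpoint < 1.0:
        return reward + 100.0, True
    return reward, False
```

Participants tuned exactly these kinds of trade-offs: make the damage penalty too small and the rover learns to barrel over rocks; make the per-step cost too large and it learns to do nothing at all.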
Participants crafted their reward functions with these criteria in mind, using multiple onboard sensors provided by the simulated Open Source Rover. The full-sized rovers built by NASA JPL are equipped with an inertial measurement unit (IMU), a device that measures changes in orientation, angular rate, and the force applied to the rover. The virtual Open Source Rover used by participants included simulated IMU sensors on each axis (in 3D space) to determine the rate and force of directional change when navigating over challenging rocks or hills, and to detect damage. A sudden spike in the IMU sensor on the Z axis meant the rover had incurred catastrophic fall damage, and the training episode ended. Additionally, the Open Source Rover’s internal battery had only enough power for 265 time steps. A time step in RL is one iteration through the phases of observing the current environment, deciding which action to take, and then taking that action. After 265 time steps, the Open Source Rover would need to recharge its onboard batteries, ending the episode. Participants also had access to a number of additional sensors and telemetry to inform the design of their reward functions: distance travelled, distance from the checkpoint, orientation to the checkpoint, proximity to obstacles (including detection of collisions, which would also end the episode), a depth camera to generate 3D point clouds, and a standard camera for navigation.
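The episode mechanics described above (observe, decide, act; end on a Z-axis IMU spike, on reaching the checkpoint, or when the 265-step battery budget runs out) can be sketched as a loop. The `observe`, `choose_action`, and `act` callables here are stand-ins for the real simulation interface, and the thresholds are illustrative:

```python
MAX_TIME_STEPS = 265   # battery budget per episode, per the rules
IMU_Z_SPIKE = 8.0      # illustrative fall-damage threshold
CHECKPOINT_RADIUS = 1.0  # illustrative "arrived" distance in meters

def run_episode(observe, choose_action, act):
    """Run one training episode; return (steps_used, outcome)."""
    for t in range(1, MAX_TIME_STEPS + 1):
        obs = observe()                # 1. observe the current environment
        action = choose_action(obs)    # 2. decide which action to take
        obs = act(action)              # 3. take that action
        if obs["imu_z"] > IMU_Z_SPIKE:
            return t, "fall_damage"        # catastrophic fall ends the episode
        if obs["dist_to_checkpoint"] < CHECKPOINT_RADIUS:
            return t, "checkpoint"         # mission accomplished
    return MAX_TIME_STEPS, "battery_depleted"  # out of power
```

Each call to `run_episode` is one training episode; the agent's policy improves across many such episodes as the reward function steers it away from the two failure outcomes.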
The Winning Submission
The winning submission completed the mission in an efficient 212 time steps. It also came closest to the checkpoint, which existed as a single point on the Martian surface only 1/100 of a meter in size. By passing within a mere 0.69 meters after autonomously travelling 44.25 meters to reach the checkpoint, the rover achieved a navigation accuracy of over 98%. It incurred a moderate amount of force, with the highest IMU measurement of the journey clocking in at 8.95.
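As a quick sanity check on those numbers, treating the 0.69-meter miss as a fraction of the 44.25 meters travelled:

```python
# Figures from the winning run above.
distance_travelled = 44.25  # meters driven to reach the checkpoint
miss_distance = 0.69        # meters from the checkpoint at closest approach

accuracy = 1.0 - miss_distance / distance_travelled
print(f"{accuracy:.1%}")  # → 98.4%
```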
In a timely testament to the name of the newest Mars rover, “Perseverance”, Jordan Gleeson credited persistence for his winning submission. In the language of deep reinforcement learning, this means the model trained for long periods of time, over many episodes. This allows the RL agent to maximize its cumulative future reward by repeatedly exploring the environment and then exploiting the knowledge it has gained, producing a larger cumulative reward. In RL it is important to know whether an action taken in the present will negatively affect potential rewards in the future. Allowing the simulation to run for long durations is a textbook example of harnessing the power of simulation in RL. Attempting to train a physical robot this way in the real world would quickly become untenable.
A special thanks to our partners at JPL and AngelHack who helped design, build and host this competition and to all of those who participated. The next step is to try out your AWS cloud-trained reinforcement learning models on a real Open Source Rover!
Please join us for the conversation with the winner, Jordan, and the real JPL team. You can register here. And let us know if you’re working on any reinforcement learning or robot simulation projects and would like some help.