AWS Public Sector Blog

TUM researcher finds new approach to safety-critical systems using parallelized algorithms on AWS

pFaces targets heterogeneous hardware configurations (HWCs) combining compute nodes (CNs) of CPUs, GPUs, and hardware accelerators (HWAs). A web-based interface helps developers design parallel algorithms and run them on targeted HWCs.

Mahmoud Khaled, a PhD student at Technische Universität München (TUM) and a research assistant at Ludwig-Maximilians-Universität (LMU) in Munich, researches how to improve safety-critical systems that require large amounts of compute power. Using Amazon Web Services (AWS), Khaled’s research project, pFaces, accelerates parallelized algorithms and controls computational complexity to speed the time to science.

His project findings introduce a new way to design and deploy verified control software for safety-critical systems, such as robotic surgical machines, air traffic control, shipping and warehousing, rail networks, and autonomous vehicles. Khaled uses techniques with mathematical foundations (formal methods in control) to algorithmically generate correct-by-construction control software for safety-critical applications.

For example, in the symbolic control technique, the system under consideration is abstracted as a finite-state model, and a controller is then synthesized automatically. The designed controllers are guaranteed to enforce given formal specifications, such as safety or reachability. Because these techniques cannot run in parallel, they require long computing times, so their results arrive too late to be usable in modern applications that require real-time operation. Khaled redesigns them as parallelized algorithms using AWS.
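To make the idea of synthesizing a controller from a finite-state model concrete, here is a minimal Python sketch of a safety synthesis loop. It is not pFaces code and not necessarily the algorithm Khaled uses. The cart model, the state grid, the `post` function, and every name below are hypothetical, chosen only to illustrate the standard fixed-point idea: keep removing states until every remaining state has at least one input whose successors all remain in the set.

```python
from itertools import product

# Hypothetical finite abstraction of a cart on a track (an assumption for
# illustration only): an abstract state is (position cell, velocity level).
POSITIONS = range(0, 7)
VELOCITIES = range(-2, 3)
STATES = set(product(POSITIONS, VELOCITIES))
INPUTS = (-1, 0, 1)  # acceleration commands

# Safety specification: stay away from the walls at cells 0 and 6.
SAFE = {(p, v) for (p, v) in STATES if 1 <= p <= 5}


def post(state, accel):
    """Return every abstract successor of `state` under input `accel`.

    The transition is nondeterministic: the position may slip by one
    extra cell, and the controller cannot choose which outcome occurs.
    """
    p, v = state
    v_next = max(-2, min(2, v + accel))
    return {(p + v_next + slip, v_next) for slip in (0, 1)}


def synthesize_safety_controller(safe, inputs, post):
    """Compute the largest set of safe states from which safety can be
    enforced forever, together with the inputs allowed in each state."""
    winning = set(safe)
    while True:
        # An input is allowed only if all of its successors stay winning.
        controller = {
            s: {u for u in inputs if post(s, u) <= winning} for s in winning
        }
        shrunk = {s for s, allowed in controller.items() if allowed}
        if shrunk == winning:  # fixed point reached
            return controller
        winning = shrunk


controller = synthesize_safety_controller(SAFE, INPUTS, post)
print(len(controller), "of", len(SAFE), "safe states are controllable")
print("allowed inputs at (5, 0):", sorted(controller.get((5, 0), ())))
```

Inside one iteration, each state is checked independently of the others. That independence is the kind of structure a parallel implementation can exploit.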

How the cloud speeds the time to science

Self-driving cars are one example of how Khaled’s methods can be applied to speed the time to science. The underlying technology, pFaces, is a general acceleration ecosystem that helps design and deploy parallel algorithms, regardless of application type and target hardware.

Khaled says, “pFaces takes requests from users about how tasks can be done in parallel, and it automatically uses all available hardware—CPUs, GPUs, and hardware accelerators—in parallel as efficiently as possible.” The tool can run workloads locally or on the AWS Cloud and has a Web-IDE for developing the parallel algorithms and running them remotely on multiple Amazon EC2 instances.

 

An example deployment to control a platoon of trucks. A version of the parallelized symbolic control approach runs on top of pFaces and receives control requests from trucks asking for optimized navigation and low-level control decisions. It then responds with controllers synthesized to serve the requests and ensure the safety of the platoon.

Khaled says, “We started with AWS as a testing platform. However, developing the techniques locally and then deploying them in the cloud opened our minds to other possibilities. In a future when safety-critical systems have reliable and fast connections to the cloud, the control software may be deployed remotely to provide the control as a service. They may also benefit from a ‘collective mind,’ in the cloud, who orchestrates decisions to help not only one system, but a group of systems, in favor of an optimized cooperative operation.”

Making self-driving cars safer

Khaled wanted to create a reliable, safe way for autonomous vehicles and other safety-critical systems to make correct decisions in real time. In the case of autonomous vehicles, he focuses on the layers of autopilot software, from path planning down to low-level control. Trying millions of scenarios and deciding on the safest possible maneuver with formal methods in control requires large bursts of computing power. Parallelizing these techniques allows the autonomous vehicle to find the maneuver needed to avoid a crash in real time.

Khaled says, “Given an accurate model of the vehicle, we can discover all possible scenarios and pick the best one that will avoid the crash and still deliver a pleasant driving experience. This is computationally complex, so we redesigned and parallelized these techniques. To do this, we turned to AWS. The new techniques were developed and tested in AWS and can be deployed in vehicles to run on their modern hardware like many-core GPUs.”
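The pattern Khaled describes, checking many scenarios independently and then picking the best safe one, can be sketched in a few lines of Python. This is a toy simulation, not a formal method and not pFaces code. The vehicle model, the obstacle, the discomfort cost, and all names and constants are assumptions made for illustration. A thread pool is used only to show the structure, since threads do not speed up CPU-bound Python. A real deployment would spread the evaluations across CPU cores, GPUs, or hardware accelerators.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from itertools import product
import math

# Hypothetical scenario constants (assumptions, not values from the article).
DT, STEPS = 0.05, 100            # 5-second horizon in 50 ms steps
V0 = 15.0                        # initial speed in m/s
OBSTACLE_X = 30.0                # obstacle centre, metres ahead
HALF_LENGTH, HALF_WIDTH = 2.0, 1.5
LATERAL_RATE = 1.5               # lateral speed while changing offset, m/s


@dataclass(frozen=True)
class Maneuver:
    brake: float    # constant deceleration in m/s^2
    offset: float   # target lateral offset in metres


@dataclass(frozen=True)
class Outcome:
    maneuver: Maneuver
    safe: bool
    discomfort: float


def evaluate(m):
    """Simulate one candidate maneuver; independent of all other candidates."""
    x, y, v = 0.0, 0.0, V0
    safe = True
    for _ in range(STEPS):
        v = max(0.0, v - m.brake * DT)
        x += v * DT
        step, delta = LATERAL_RATE * DT, m.offset - y
        y = m.offset if abs(delta) <= step else y + math.copysign(step, delta)
        if abs(x - OBSTACLE_X) < HALF_LENGTH and abs(y) < HALF_WIDTH:
            safe = False  # the vehicle overlaps the obstacle
            break
    return Outcome(m, safe, m.brake ** 2 + m.offset ** 2)


def pick_best(outcomes):
    """Reduction step: the least uncomfortable maneuver among the safe ones."""
    safe = [o for o in outcomes if o.safe]
    return min(safe, key=lambda o: o.discomfort) if safe else None


CANDIDATES = [
    Maneuver(b, o)
    for b, o in product((0.0, 2.0, 4.0, 6.0, 8.0), (-3.0, 0.0, 3.0))
]

# Each candidate is evaluated independently, so the map can run in parallel.
with ThreadPoolExecutor() as pool:
    outcomes = list(pool.map(evaluate, CANDIDATES))

best = pick_best(outcomes)
print("chosen maneuver:", best.maneuver if best else "none is safe")
```

Because no evaluation depends on another, the work grows with the number of candidates but splits cleanly across whatever hardware is available. Only the final selection has to see all the results.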

The parallelized algorithms are compatible with modern high-performance computing (HPC) platforms. In one of his simulations, Khaled used Amazon Elastic Compute Cloud (Amazon EC2) to reduce computation time from 52 seconds on a dual-core CPU to 40 milliseconds on a p3.16xlarge EC2 instance, a 1,300-fold speedup (52 s / 0.040 s = 1,300).

pFaces is available for download on GitHub for any developers or researchers interested in experimenting with the tool.

Listen to Fix This podcast episodes to hear how other organizations like Fred Hutch and Emory University use AWS to speed the time to science. Episodes are available on Apple Podcasts, Google Play, Spotify, Stitcher, TuneIn, Overcast, iHeartRadio, and via RSS.

Read more stories from universities around the globe about how they use AWS to further their research, enrich their campuses, and more, including stories from The University of Manchester, The University of Nottingham, and University of British Columbia.