AWS Compute Blog

Orchestrating GPU-Accelerated Workloads on Amazon ECS

My colleagues Brandon Chavis, Chad Schmutzer, and Pierre Steckmeyer sent a nice guest post that describes how to run GPU workloads on Amazon ECS.

It’s interesting to note that many workloads on Amazon ECS fit into three primary categories, each of which has an obvious synergy with containers:

  • PaaS
  • Batch workloads
  • Long-running services

While these are the most common workloads, we also see ECS used for a wide variety of applications. One new and interesting class is GPU-accelerated workloads or, more specifically, workloads that need to use large numbers of GPUs across many nodes.

In this post, we take a look at how ECS enables GPU workloads. For example, at Amazon.com, the Amazon Personalization team runs significant machine learning workloads that leverage many GPUs on ECS.

Amazon ECS overview

ECS is a highly scalable, high-performance service for running containers on AWS. ECS provides customers with a state management engine that offloads the heavy lifting of running your own orchestration software, while providing a number of integrations with other AWS services. For example, you can assign IAM roles to individual tasks, use Auto Scaling to scale your containers in response to load across your services, and use the new Application Load Balancer to distribute load across your application, automatically registering new tasks with the load balancer when Auto Scaling actions occur.
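As a concrete illustration of the service Auto Scaling integration, here is a minimal sketch with boto3 that registers a service’s desired task count as a scalable target (the cluster name, service name, and role ARN are hypothetical, and configured AWS credentials and a default region are assumed):

    import boto3

    autoscaling = boto3.client("application-autoscaling")

    # Register the service's desired task count as a scalable target,
    # so scaling policies can adjust it in response to load.
    autoscaling.register_scalable_target(
        ServiceNamespace="ecs",
        ResourceId="service/my-cluster/my-service",  # hypothetical names
        ScalableDimension="ecs:service:DesiredCount",
        MinCapacity=2,
        MaxCapacity=10,
        RoleARN="arn:aws:iam::123456789012:role/ecsAutoscaleRole",
    )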

Today, customers run a wide variety of applications on Amazon ECS, and you can read about some of these use cases here:

  • Edmunds
  • Remind
  • Coursera

ECS and GPU workloads

When you log into Amazon.com, it’s important that you see recommendations for a product you might actually be interested in. The Amazon Personalization team generates personalized product recommendations for Amazon customers. In order to do this, they need to digest very large data sets containing information about the hundreds of millions of products (and just as many customers) that Amazon.com has.

The only way to handle this work in a reasonable amount of time is to ensure it is distributed across a very large number of machines. Amazon Personalization uses machine-learning software that leverages GPUs to train neural networks, but it is challenging to orchestrate this work across a very large number of GPU cores.

To overcome this challenge, Amazon Personalization uses ECS to manage a cluster of Amazon EC2 GPU instances. The team uses P2 instances, which include NVIDIA Tesla K80 GPUs. The cluster of P2 instances functions as a single pool of resources—aggregating CPU, memory, and GPUs—onto which machine learning work can be scheduled.
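You can see this pooling directly through the ECS API. Here is a rough sketch with boto3 that totals the CPU and memory registered across a cluster’s instances (the cluster name is hypothetical, and the cluster is assumed to have at least one registered instance):

    import boto3

    ecs = boto3.client("ecs")

    # Gather every container instance registered to the cluster.
    arns = ecs.list_container_instances(cluster="gpu-cluster")["containerInstanceArns"]
    instances = ecs.describe_container_instances(
        cluster="gpu-cluster", containerInstances=arns
    )["containerInstances"]

    # Sum the resources each instance registered with the scheduler.
    for name in ("CPU", "MEMORY"):
        total = sum(
            resource["integerValue"]
            for instance in instances
            for resource in instance["registeredResources"]
            if resource["name"] == name
        )
        print(name, total)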

To run this work on an ECS cluster, the team builds a Docker image configured with the NVIDIA CUDA drivers that allow the container to communicate with the GPU hardware, and stores it in Amazon EC2 Container Registry (Amazon ECR).
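As a rough sketch of that workflow, you could create the repository and fetch push credentials with boto3 like this (the repository name is hypothetical; the docker build and push steps themselves happen outside the SDK):

    import base64
    import boto3

    ecr = boto3.client("ecr")

    # Create a repository to hold the CUDA-enabled image.
    repo = ecr.create_repository(repositoryName="dsstne-demo")
    print(repo["repository"]["repositoryUri"])

    # Retrieve a temporary token for docker login; it decodes to "AWS:<password>".
    auth = ecr.get_authorization_token()["authorizationData"][0]
    user, password = base64.b64decode(auth["authorizationToken"]).decode().split(":")
    # Use these credentials with docker login, then docker tag and
    # docker push the image to the repository URI printed above.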

An ECS task definition points to the container image in ECR and specifies the configuration for the container at runtime: how much CPU and memory each container should use, the command to run inside the container, whether a data volume should be mounted in the container, where the source data set lives in Amazon S3, and so on.
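A minimal task definition registered through boto3 might look like the following sketch (the task family, image URI, command, and volume names are all hypothetical; the container runs privileged here as one common way to reach the GPU devices on the host):

    import boto3

    ecs = boto3.client("ecs")

    ecs.register_task_definition(
        family="dsstne-train",  # hypothetical task family
        containerDefinitions=[
            {
                "name": "dsstne",
                "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/dsstne-demo:latest",
                "cpu": 4096,         # CPU units to reserve
                "memory": 30720,     # MiB of memory to reserve
                "privileged": True,  # lets the container access the GPU devices
                # Hypothetical command; the source data set lives in S3.
                "command": ["train", "-i", "s3://my-bucket/dataset"],
                "mountPoints": [
                    {"sourceVolume": "scratch", "containerPath": "/scratch"}
                ],
            }
        ],
        volumes=[{"name": "scratch", "host": {"sourcePath": "/mnt/scratch"}}],
    )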

After ECS is asked to run a task, the ECS scheduler finds a suitable place to run the containers by identifying an instance in the cluster with available resources. As shown in the architecture diagram, ECS places the containers onto the cluster of EC2 GPU instances (the “GPU slaves” in the diagram).
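Asking the scheduler to place a task is a single API call. Here is a sketch, assuming the hypothetical cluster and task definition names used above:

    import boto3

    ecs = boto3.client("ecs")

    # The scheduler picks an instance with enough free CPU and memory
    # and starts the containers there.
    response = ecs.run_task(
        cluster="gpu-cluster",  # hypothetical cluster name
        taskDefinition="dsstne-train",
        count=1,
    )
    print(response["tasks"][0]["containerInstanceArn"])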

Give GPUs on ECS a try

To make it easier to try GPUs on ECS, we’ve built an AWS CloudFormation template that takes care of much of the heavy lifting. This demo architecture is built around DSSTNE, the open source machine learning library that the Amazon Personalization team uses to generate recommendations. Go to the GitHub repository to see the CloudFormation template.
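You can launch the stack from the console, or programmatically; here is a sketch with boto3 (the stack name is hypothetical and the template URL is a placeholder for the template location given in the repository):

    import boto3

    cloudformation = boto3.client("cloudformation")

    cloudformation.create_stack(
        StackName="ecs-dsstne-demo",  # hypothetical stack name
        TemplateURL="https://s3.amazonaws.com/example-bucket/gpu-ecs.template",  # placeholder
        Capabilities=["CAPABILITY_IAM"],  # the template creates IAM resources
    )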

The template spins up an ECS cluster with a single EC2 GPU instance in an Auto Scaling group. You can adjust the desired group capacity to run a larger cluster, if you’d like.
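For example, here is a sketch of raising the group’s desired capacity with boto3 (the group name is hypothetical; you can find the real one among the stack’s resources):

    import boto3

    autoscaling = boto3.client("autoscaling")

    # Grow the GPU cluster from one instance to four.
    autoscaling.set_desired_capacity(
        AutoScalingGroupName="ecs-gpu-demo-asg",  # hypothetical group name
        DesiredCapacity=4,
        HonorCooldown=False,
    )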

The instance is configured with all of the software that DSSTNE requires to interact with the underlying GPU hardware, such as the NVIDIA drivers. The template also installs development tools and libraries, like GCC, HDF5, and Open MPI, so that the DSSTNE library can be compiled at boot time. It then builds a Docker container with the DSSTNE library packaged up, uploads it to ECR, takes the URI of the resulting container image, and builds an ECS task definition that points to the container.

After the CloudFormation template completes, view the Outputs tab to get an idea of where to look for your new resources.
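You can also read the same outputs with the SDK; here is a sketch, assuming the hypothetical stack name from earlier:

    import boto3

    cloudformation = boto3.client("cloudformation")

    # Print each output the template exported, such as resource names and URLs.
    stack = cloudformation.describe_stacks(StackName="ecs-dsstne-demo")["Stacks"][0]
    for output in stack.get("Outputs", []):
        print(output["OutputKey"], "=", output["OutputValue"])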

Conclusion

In this post, we explained how you can use ECS to run GPU-accelerated workloads, and shared the CloudFormation template that makes it easy to get started with ECS and DSSTNE.

Unfortunately, it would take far too much page space to cover the machine learning specifics in this post, but you can read the Generating Recommendations at Amazon Scale with Apache Spark and Amazon DSSTNE post on the AWS Big Data Blog to learn more about how DSSTNE interacts with Apache Spark, trains models, generates predictions, and other fun machine learning concepts.

If you have questions or suggestions, please comment below.