AWS Robotics Blog

Build headless robotic simulations with AWS Batch

Introduction

Headless robotic simulations with AWS Batch allow robot developers to increase their velocity by running their code through thousands of scenarios and iterating before moving on to physical device testing. The real-world environments and situations a robot can find itself in are nearly endless, and it is time consuming and costly to deploy and test every scenario on physical robots. AWS Batch is a service that gives robot developers an easy way to run batch robotics simulations at massive scale, with custom control over which compute types to use.

Note that AWS Batch is best used for running headless batch simulations at scale. If you are looking for interactive simulations with a GUI, we recommend AWS RoboMaker simulation.

Overview

In this blog, you will create an AWS Batch compute environment and job to run your containerized robot and simulation applications. This blog takes the Amazon CloudWatch robot monitoring sample originally built with AWS RoboMaker and updates it to show how you can accomplish the task with AWS Batch instead. The sample runs a robot navigation test and sends data to Amazon CloudWatch to monitor the robot’s position and speed.

We will go through the following steps:

  1. Prepare robot and simulation containers.
  2. Create a Dockerfile to install docker-compose and the AWS Command Line Interface (AWS CLI).
  3. Build and push the container image to Amazon Elastic Container Registry (Amazon ECR).
  4. Create a docker-compose.yaml file and upload it to Amazon Simple Storage Service (Amazon S3).
  5. Set up permissions for your AWS Batch jobs and robot and simulation applications.
  6. Create an AWS Batch compute environment, job queue, job definition, and job using the AWS Batch Wizard.
  7. View the logs in Amazon CloudWatch.

Prerequisites:

The following are requirements to follow along with this blog:

Prepare your robot and simulation containers

This section builds on the container preparation steps from a previous blog and documents the changes required so the containers will work with AWS Batch.

  1. Clone the Amazon CloudWatch robot monitoring sample repository
    Note: In the AWS Robotics sample applications, the code is already structured with ROS workspace directories. Therefore, you don’t need to create a workspace and source code directory. However, for most open-source ROS packages and likely for your code, first create your workspace directory and clone the source code into <workspace>/src.

    git clone https://github.com/aws-robotics/aws-robomaker-sample-application-cloudwatch.git cloudwatchsample && cd cloudwatchsample
  2. Containerize the robot and simulation applications by following the steps described in Preparing ROS application and simulation containers for AWS RoboMaker with some minimal name changes. Start from Build a docker image from a ROS workspace for AWS RoboMaker step 2.
    1. In step 3, place the Dockerfile in the cloudwatchsample directory.
    2. In steps 6 and 7, you may choose to rename your applications to cloudwatch-robot-app and cloudwatch-sim-app. You may also choose to rename your application tags to batch-cloudwatch-robot-app and batch-cloudwatch-sim-app, respectively.
    3. Under step 1 of Publish docker images to Amazon ECR, ensure you use the same robotapp and simapp variables as your application tag names from steps 6 and 7.
  3. Stop when you have pushed your ROS-based robot and simulation docker images to Amazon ECR (just before you get to the section titled: Create and run robot and simulation applications with containers in AWS RoboMaker).

Create a Dockerfile to install docker-compose and the AWS CLI

We will launch both the robot and simulation containers in AWS Batch. This allows you to run both containers at the same time and have them communicate with each other. In order to have this process run at scale, we will have a special Docker container that AWS Batch uses to run the docker-compose file. The Docker container must have the AWS CLI installed to download the docker-compose file from Amazon S3, and also have Docker Compose installed in order to run the docker-compose up command for creating and starting the robot and simulation containers.

  1. Inside the cloudwatchsample folder, create a new folder for storing our AWS Batch Docker files.
    mkdir batch-docker && cd batch-docker
  2. Create a new file named Dockerfile and copy the contents below into the file. This Dockerfile installs Docker Compose and the AWS CLI. With both tools in the image, the AWS Batch job can copy your docker-compose.yaml object from Amazon S3 and then start the containers.
    Dockerfile:

    FROM ubuntu:focal
    ARG DEBIAN_FRONTEND=noninteractive
    # Install prerequisites: download tools, the AWS CLI, the ECR credential helper, and jq.
    # Every install passes -y so the non-interactive build never stops at a prompt.
    RUN apt-get update && apt-get install -y curl wget awscli amazon-ecr-credential-helper jq
    # Configure Docker to authenticate to Amazon ECR through the credential helper
    RUN mkdir -p ~/.docker && echo "{\"credsStore\": \"ecr-login\"}" | jq . > ~/.docker/config.json
    # Install Docker Compose 1.29.2 and make it available on the PATH
    RUN curl -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
    RUN chmod +x /usr/local/bin/docker-compose
    RUN ln -s /usr/local/bin/docker-compose /usr/bin/docker-compose

Build and push the container image to Amazon ECR

We are now going to use the console to create an Amazon ECR repository and then build and push the Docker image to the repository.

  1. Create a repository in Amazon ECR. In the Amazon ECR console, from the navigation menu, choose Repositories, then Create repository. Keep your repository private and give it a name such as batch-dockercompose-awscli. Choose Create repository.
  2. Select your new repository and choose View push commands. This opens a window that provides you the AWS CLI commands to execute to build and push your container image to Amazon ECR.
    Note: If you are running the push commands from a non-Linux platform, you will need to update the docker build command to include --platform linux/amd64 at the end.
    The full command will be:
    docker build -t batch-dockercompose-awscli . --platform linux/amd64
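
The image URI that the push commands tag and push to follows a fixed pattern. As a minimal sketch, the helper below assembles it in Python. The function name ecr_image_uri, the account ID, and the Region are hypothetical placeholders and not part of the original walkthrough.

```python
def ecr_image_uri(account_id: str, region: str, repository: str, tag: str = "latest") -> str:
    """Build the image URI used by `docker tag` and `docker push` for a private ECR repository."""
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("expected a 12-digit AWS account ID")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Placeholder account ID and Region; substitute your own values.
uri = ecr_image_uri("123456789012", "us-west-2", "batch-dockercompose-awscli")
print(uri)
```

The same helper can produce the URIs for the robot and simulation repositories that you reference later in docker-compose.yaml.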

Create a docker-compose.yaml file and upload it to Amazon S3

  1. Create a docker-compose.yaml file and copy the contents below into the file. The file will run both the robot and simulation application and pass through the proper permissions.
    version: "3"
    services:
      robot:
        image: <account-number>.dkr.ecr.<region>.amazonaws.com/batch-cloudwatch-robot-app
        network_mode: host
        command: bash -c "sudo apt-get -y upgrade && roslaunch cloudwatch_robot rotate.launch"
        environment:
          - AWS_CONTAINER_CREDENTIALS_RELATIVE_URI
          - TURTLEBOT3_MODEL
      sim:
        image: <account-number>.dkr.ecr.<region>.amazonaws.com/batch-cloudwatch-sim-app
        network_mode: host
        command: roslaunch cloudwatch_simulation bookstore_turtlebot_navigation.launch
        environment:
          - AWS_CONTAINER_CREDENTIALS_RELATIVE_URI
          - TURTLEBOT3_MODEL
    
  2. Update the image entries in the docker-compose.yaml with the appropriate image URI for both the robot and simulation applications from Amazon ECR that you created earlier. You can find the image URIs in the Amazon ECR console next to your repository names. The docker-compose.yaml file sets network_mode to host for both containers so the containers can communicate with each other and provides the appropriate commands to launch the applications. It also sets the environment variables necessary for your applications. Because this application needs AWS permissions (to access Amazon CloudWatch logs and metrics), use the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI variable to pass your AWS permissions to your containers. All your containers will have the same permissions. This application also requires the TURTLEBOT3_MODEL variable to be set, so it is passed in as an environment variable in your docker-compose file as well.
  3. In the Amazon S3 console, choose an existing bucket, or create a new bucket to upload your docker-compose.yaml file to. Select the bucket, choose Upload, drag and drop your docker-compose.yaml file or choose Add files, and navigate to your file. Choose Upload. Once the object is uploaded, choose Close, then select your docker-compose.yaml object, and choose Copy S3 URI to copy your URI for future use.
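
If you prefer to script the steps above rather than edit the file by hand, the following sketch fills in the image URIs and uploads the result. It assumes the repository names used in this blog. The function names render_compose and upload_compose are hypothetical, and s3_client is expected to be an object such as boto3.client("s3") that you create yourself.

```python
from string import Template

# Same content as the docker-compose.yaml above, with the registry left as a placeholder.
COMPOSE_TEMPLATE = Template('''\
version: "3"
services:
  robot:
    image: ${registry}/batch-cloudwatch-robot-app
    network_mode: host
    command: bash -c "sudo apt-get -y upgrade && roslaunch cloudwatch_robot rotate.launch"
    environment:
      - AWS_CONTAINER_CREDENTIALS_RELATIVE_URI
      - TURTLEBOT3_MODEL
  sim:
    image: ${registry}/batch-cloudwatch-sim-app
    network_mode: host
    command: roslaunch cloudwatch_simulation bookstore_turtlebot_navigation.launch
    environment:
      - AWS_CONTAINER_CREDENTIALS_RELATIVE_URI
      - TURTLEBOT3_MODEL
''')


def render_compose(account_number: str, region: str) -> str:
    """Return the compose file text with both image URIs filled in."""
    registry = f"{account_number}.dkr.ecr.{region}.amazonaws.com"
    return COMPOSE_TEMPLATE.substitute(registry=registry)


def upload_compose(s3_client, bucket: str, body: str, key: str = "docker-compose.yaml") -> str:
    """Upload the rendered file and return its S3 URI for use in the job command."""
    s3_client.put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))
    return f"s3://{bucket}/{key}"
```

Passing the client in as an argument keeps the rendering logic usable without AWS credentials, which makes it easy to inspect the generated file before uploading it.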

Set up permissions for your AWS Batch jobs and robot and simulation applications

First, create an Amazon Elastic Container Service (Amazon ECS) Task Execution role.

  1. Navigate to the AWS IAM Console. In the navigation menu, choose Roles, then choose Create role.
  2. On Step 1, select the AWS service Trusted entity type. For Use case, choose Elastic Container Service from the drop down, then select Elastic Container Service Task. Choose Next.
  3. On Step 2, Add Permissions, you grant Amazon ECS agents permission to call AWS APIs on your behalf. AWS has a managed policy already created for this task. For the Permissions policies, search for and then choose the check box to the left of AmazonECSTaskExecutionRolePolicy. Choose Next.
  4. For Role Name, enter ecsTaskExecutionRole, and then choose Create role.

Next, you need to create a job role that gives the containers permissions to access the docker-compose.yaml file in Amazon S3, retrieve the Docker images from Amazon ECR, and use Amazon CloudWatch logs and metrics so the CloudWatch sample application can run.

  1. Navigate to the AWS IAM Console. In the navigation menu, choose Roles, then choose Create role.
  2. Select the AWS service Trusted entity type. For Use case, choose Elastic Container Service from the drop down, then select Elastic Container Service Task. Choose Next.
  3. Choose Create policy to pop open a new tab for IAM policy creation.
  4. On the Create Policy page, choose JSON, and copy paste the following in the text box.
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "cloudwatch:PutMetricData",
                "Resource": "*"
            },
            {
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "logs:DescribeLogGroups",
                    "logs:DescribeLogStreams",
                    "logs:CreateLogGroup"
                ],
                "Resource": [
                    "arn:aws:s3:::<bucketname>/docker-compose.yaml",
                    "arn:aws:logs:us-west-2:<accountnumber>:log-group:robomaker_cloudwatch_monitoring_example:log-stream:*"
                ]
            }
        ]
    }
  5. Update the policy to have the proper resources. If you have been following along with these directions, you just need to update the <bucketname> and <accountnumber>. You can find the bucket name by going to the Amazon S3 console and finding the bucket you created above.
  6. Choose Next: Tags, then choose Next: Review.
  7. Name the policy ecspolicy-batch-cloudwatchsample. Choose Create policy.
  8. Close the IAM policy tab to go back to the IAM role creation tab.
  9. For Permissions policies, search for and then choose the check box to the left of AmazonECSTaskExecutionRolePolicy (this is needed so your robot and simulation containers can be pulled from Amazon ECR) and your newly created ecspolicy-batch-cloudwatchsample (Note: You may need to choose the policy refresh button for it to update with your new policy).
  10. Choose Next.
  11. For Role Name, enter ecsJobRole-cloudwatchsample, give it a Description, and then choose Create role.
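
To avoid hand-editing the placeholders in the policy, you could generate the JSON instead. The sketch below reproduces the policy from step 4 with the bucket name and account number substituted. The function name build_job_role_policy is hypothetical, and the default Region and log group are taken from the policy shown above.

```python
import json


def build_job_role_policy(
    bucket_name: str,
    account_number: str,
    region: str = "us-west-2",
    log_group: str = "robomaker_cloudwatch_monitoring_example",
) -> dict:
    """Return the IAM policy document from step 4 with its placeholders filled in."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "cloudwatch:PutMetricData",
                "Resource": "*",
            },
            {
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "logs:DescribeLogGroups",
                    "logs:DescribeLogStreams",
                    "logs:CreateLogGroup",
                ],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}/docker-compose.yaml",
                    f"arn:aws:logs:{region}:{account_number}:log-group:{log_group}:log-stream:*",
                ],
            },
        ],
    }


# Placeholder values; paste the printed JSON into the policy editor.
policy = build_job_role_policy("my-simulation-bucket", "123456789012")
print(json.dumps(policy, indent=4))
```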

Create an AWS Batch compute environment, job queue, job definition, and job using the AWS Batch Wizard

Now you are going to set up AWS Batch to execute the Docker containers we created above to run your simulation. To start, navigate to the AWS Batch Console in AWS and choose Wizard in the left menu. This will guide you through creating the required resources and running the job.

Step 1: Create a compute environment.

  1. Give your compute environment a name.
  2. Leave the default Service role as Batch service-linked role.
    Screenshot of console Step 1: Create a compute environment
  3. Set the Instance Configuration to On-demand. This will create Amazon Elastic Compute Cloud (Amazon EC2) instances for you when you run your jobs. You could also select Spot here to save some money with Amazon EC2 Spot Instances.
  4. Leave the Minimum, Maximum, and Desired vCPUs as the default. With the minimum vCPUs set to zero, AWS Batch does not keep any idle EC2 instances running. This means the first job execution is slower, but you don’t incur any costs until you run the job and resources are consumed. Increasing the minimum and desired number of vCPUs keeps a warm pool of EC2 instances, which reduces the time to start jobs but consumes more resources.
  5. Under Allowed instance types, you can select optimal to best-fit the instance type to your job definition. If you prefer, you can also manually select the instance type you want to use.
  6. Leave the Allocation strategy as the default, BEST_FIT.
    Console screenshot of provisioning model, vCPU, and allocation strategy settings.
  7. The default Networking settings can be left as is.
  8. Choose Next.
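
For readers who want to create the compute environment programmatically, the wizard choices above map roughly onto the parameters of the AWS Batch CreateComputeEnvironment API. The sketch below only builds the request. The function name, the max_vcpus default of 16, and all subnet, security group, and instance role values are hypothetical and must be replaced with values from your own account.

```python
def compute_environment_kwargs(
    name: str,
    subnets: list,
    security_group_ids: list,
    instance_role: str,
    max_vcpus: int = 16,  # illustrative value, not the wizard default
) -> dict:
    """Build keyword arguments for the Batch create_compute_environment call."""
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",
        "state": "ENABLED",
        "computeResources": {
            "type": "EC2",  # On-Demand instances
            "allocationStrategy": "BEST_FIT",
            "minvCpus": 0,  # no idle instances between jobs
            "desiredvCpus": 0,
            "maxvCpus": max_vcpus,
            "instanceTypes": ["optimal"],
            "subnets": list(subnets),
            "securityGroupIds": list(security_group_ids),
            "instanceRole": instance_role,
        },
    }


# Hypothetical identifiers for illustration only.
kwargs = compute_environment_kwargs(
    "headless-sim-env",
    subnets=["subnet-0123456789abcdef0"],
    security_group_ids=["sg-0123456789abcdef0"],
    instance_role="ecsInstanceRole",
)
```

With boto3 installed and credentials configured, the dictionary could be passed as boto3.client("batch").create_compute_environment(**kwargs).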

Step 2: Create a job queue. Job queues can be tied to different compute environments and help with prioritization and orchestration. You can leave this as the default and choose Next.

Step 3: Create a job definition. This defines the container image and configuration to use in the job. Here you will need to set the following:

  1. A Name for your job definition.
  2. An Execution Timeout. You can set this to 3600 (1 hour).
  3. For execution role, choose the IAM role you created in the previous section, ecsTaskExecutionRole.
    Console screenshot creating a job definition.
  4. Under Job Configuration, enter the Amazon ECR URI of the container image that installs docker-compose and the AWS CLI. If you were following along, it should be named:
    <account-id>.dkr.ecr.<region>.amazonaws.com/batch-dockercompose-awscli
  5. Replace <account-id> and <region> with the appropriate values.
  6. For Command, enter the following command which downloads your docker-compose.yaml file from Amazon S3, sets the TURTLEBOT3_MODEL variable, and then calls docker-compose up to start your robot and simulation containers.
    bash -c 'aws s3 cp <S3 URI> . && export TURTLEBOT3_MODEL=waffle_pi && docker-compose up'
    
  7. Make sure to replace <S3 URI> with your S3 URI, which will look similar to:
    s3://<bucketname>/docker-compose.yaml
    Console screenshot of job configuration setup.
  8. Leave the rest as default for now and choose Next.

Step 4: Create a job.

  1. Set the Execution Timeout to 3600 (1 hour). The rest of the wizard page should be pre-filled from your job definition.
  2. Choose Next: Review, and choose Create.

Run an AWS Batch Job

First, create a new revision of the job definition that adds the job role, elevated privileges, and the Docker socket mount.

  1. Navigate to the AWS Batch Console in AWS and open Job definitions from the navigation menu. Select the radio button next to the job definition you just created, then press Create revision.
  2. Leave Step 1: Job definition configuration as-is and choose Next page.
  3. In Step 2: Container configuration, set Job role configuration to ecsJobRole-cloudwatchsample, the role you created earlier that gives the container in your job permissions to use the AWS APIs. You will mount the Docker socket from the underlying host in the steps that follow. Choose Next page.
  4. In Step 3: Linux and logging settings, set the Linux configuration to User: root and turn Privileged on to give the container elevated privileges on the host container instance.
  5. In the Filesystem configuration settings, expand Additional configuration and update Volumes configuration:
    1. Choose Add volume.
    2. Name: dockersock, Source path: /var/run/docker.sock
  6. Next, set the Mount points configuration:
    1. Choose Add mount points configuration.
    2. Source volume: dockersock, Container path: /var/run/docker.sock
  7. Choose Next page.
  8. In Step 4: Job definition review, check that your configuration has the proper settings:
    Console screenshot of Linux configuration.
  9. Choose Create job definition.
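
Taken together, the wizard settings and the revision above correspond roughly to a single job definition request. The following sketch builds that request for the AWS Batch RegisterJobDefinition API. The function name and the vcpus and memory_mib defaults are assumptions for illustration, and the ARNs and URIs in the usage example are placeholders.

```python
def job_definition_kwargs(
    name: str,
    image_uri: str,
    compose_s3_uri: str,
    execution_role_arn: str,
    job_role_arn: str,
    vcpus: int = 2,          # assumed size, tune for your simulation
    memory_mib: int = 4096,  # assumed size, tune for your simulation
) -> dict:
    """Build keyword arguments for the Batch register_job_definition call."""
    command = [
        "bash",
        "-c",
        f"aws s3 cp {compose_s3_uri} . && export TURTLEBOT3_MODEL=waffle_pi && docker-compose up",
    ]
    return {
        "jobDefinitionName": name,
        "type": "container",
        "timeout": {"attemptDurationSeconds": 3600},  # 1 hour
        "containerProperties": {
            "image": image_uri,
            "command": command,
            "executionRoleArn": execution_role_arn,
            "jobRoleArn": job_role_arn,
            "user": "root",
            "privileged": True,
            "resourceRequirements": [
                {"type": "VCPU", "value": str(vcpus)},
                {"type": "MEMORY", "value": str(memory_mib)},
            ],
            # Expose the host Docker daemon so docker-compose can start sibling containers.
            "volumes": [{"name": "dockersock", "host": {"sourcePath": "/var/run/docker.sock"}}],
            "mountPoints": [
                {"sourceVolume": "dockersock", "containerPath": "/var/run/docker.sock", "readOnly": False}
            ],
        },
    }


# Placeholder account ID, bucket, and Region.
definition = job_definition_kwargs(
    "headless-sim-job",
    "123456789012.dkr.ecr.us-west-2.amazonaws.com/batch-dockercompose-awscli",
    "s3://my-simulation-bucket/docker-compose.yaml",
    "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
    "arn:aws:iam::123456789012:role/ecsJobRole-cloudwatchsample",
)
```

With boto3, the result could be registered using boto3.client("batch").register_job_definition(**definition).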

Now you can execute the job definition you just created.

  1. In the navigation menu, choose Jobs. Choose Submit new job.
  2. Give your job a name and then choose the latest revision of your Job definition. Choose your Job queue that you created with the wizard.
  3. Choose Next page, and then choose Next page again.
  4. Choose Create job.

Note: It will take a few minutes for your job to be in the STARTING state. This is because we did not set up a warm pool of instances so AWS Batch needs to start up instances to run the job.
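
Submitting from the console works for a single run. To fan out over many scenarios, you could submit jobs in a loop. The sketch below is one possible approach and not part of the original walkthrough: it overrides the job command per TurtleBot3 model and polls until every job finishes. The function names are hypothetical, batch_client is expected to be an object such as boto3.client("batch"), and whether the sample behaves sensibly for every model is an assumption.

```python
import time

TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def submit_scenarios(batch_client, job_queue, job_definition, compose_s3_uri, models):
    """Submit one job per model and return a mapping of model name to job ID."""
    job_ids = {}
    for model in models:
        command = [
            "bash",
            "-c",
            f"aws s3 cp {compose_s3_uri} . && export TURTLEBOT3_MODEL={model} && docker-compose up",
        ]
        response = batch_client.submit_job(
            jobName=f"sim-{model}",
            jobQueue=job_queue,
            jobDefinition=job_definition,
            containerOverrides={"command": command},
        )
        job_ids[model] = response["jobId"]
    return job_ids


def wait_for_jobs(batch_client, job_ids, poll_seconds=30, sleep=time.sleep):
    """Poll describe_jobs until every job is SUCCEEDED or FAILED.

    describe_jobs accepts up to 100 job IDs per call, so larger fan-outs need
    chunking. This sketch assumes every ID is known to AWS Batch.
    """
    pending = set(job_ids)
    final = {}
    while pending:
        response = batch_client.describe_jobs(jobs=sorted(pending))
        for job in response["jobs"]:
            if job["status"] in TERMINAL_STATES:
                final[job["jobId"]] = job["status"]
                pending.discard(job["jobId"])
        if pending:
            sleep(poll_seconds)
    return final
```

A typical call would pass the job queue and job definition names created above, for example submit_scenarios(client, "my-queue", "my-job-definition", "s3://<bucketname>/docker-compose.yaml", ["burger", "waffle_pi"]), followed by wait_for_jobs(client, job_ids.values()).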

View the logs

  1. Once your job is in the Running state, scroll down to Job information and choose the Log stream name.
  2. A new tab will open to Amazon CloudWatch where you can see logs from your AWS Batch job.
  3. If you are following along with the CloudWatch Monitoring example, you can also see logs coming through from your robot application.
  4. From the Amazon CloudWatch console, choose Log groups and search for a log group named robomaker_cloudwatch_monitoring_example. If you do not see the log group, choose your region from the navigation bar and choose US West (Oregon) us-west-2 (this is the region set in the configuration file of the robot application running inside the container).
  5. Choose the turtlebot3 log stream.

Here you should see logs coming in from your robot application. You can also see the metrics from the robot in Amazon CloudWatch by choosing Metrics, All metrics from the navigation menu. Choose Custom namespaces, robomaker-cloudwatch-monitoring-example, category, robotid. Set the period of the metrics to 1 second so you can easily visualize them on the graph.

Console screenshot of CloudWatch metrics.
Console screenshot of CloudWatch graphed metrics.
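
The robot logs can also be read outside the console. As a minimal sketch, the function below pages through the log stream using the CloudWatch Logs FilterLogEvents API. The function name fetch_robot_logs is hypothetical, logs_client is expected to be an object such as boto3.client("logs", region_name="us-west-2"), and the log group and stream names are the ones mentioned in the steps above.

```python
def fetch_robot_logs(
    logs_client,
    log_group: str = "robomaker_cloudwatch_monitoring_example",
    log_stream: str = "turtlebot3",
    max_events: int = 100,
) -> list:
    """Return up to max_events log messages from the robot's log stream."""
    messages = []
    kwargs = {"logGroupName": log_group, "logStreamNames": [log_stream]}
    while len(messages) < max_events:
        response = logs_client.filter_log_events(**kwargs)
        messages.extend(event["message"] for event in response.get("events", []))
        token = response.get("nextToken")
        if not token:
            break  # no more pages
        kwargs["nextToken"] = token
    return messages[:max_events]
```

Capping the result with max_events keeps the loop bounded while a simulation is still writing new log events.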

Congratulations! You have successfully run a headless robotics simulation with AWS Batch.

Clean-up

To remove resources from AWS Batch:

  1. In the AWS Batch console, select Jobs from the navigation menu.
  2. Choose your Job queue, then choose Search.
  3. Select any jobs you have created and choose Terminate job.
  4. Select Job definition from the navigation menu.
  5. Select your job definitions, and then choose Deregister, Deregister job definition.
  6. Select Job queues from the navigation menu.
  7. Select your job queue, choose
  8. Select Compute environments from the navigation menu.
  9. Select your compute environment, choose
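
If you script the AWS Batch clean-up instead of using the console, the order of operations matters: in the AWS Batch API, a job queue has to be disabled before it can be deleted, and the same applies to a compute environment, which must also no longer be attached to a queue. The sketch below only builds the ordered list of API operations. The function name batch_teardown_plan and the resource names in the usage example are hypothetical.

```python
def batch_teardown_plan(job_ids, job_definition, job_queue, compute_environment):
    """Return (operation, kwargs) pairs in the order the Batch API expects them.

    The update and delete operations are asynchronous, so real code should wait
    for each change to take effect (for example by polling describe_job_queues
    or describe_compute_environments) before issuing the next call.
    """
    plan = [
        ("terminate_job", {"jobId": job_id, "reason": "Tutorial clean-up"})
        for job_id in job_ids
    ]
    plan += [
        ("deregister_job_definition", {"jobDefinition": job_definition}),
        ("update_job_queue", {"jobQueue": job_queue, "state": "DISABLED"}),
        ("delete_job_queue", {"jobQueue": job_queue}),
        ("update_compute_environment", {"computeEnvironment": compute_environment, "state": "DISABLED"}),
        ("delete_compute_environment", {"computeEnvironment": compute_environment}),
    ]
    return plan


# Hypothetical resource names for illustration.
for operation, kwargs in batch_teardown_plan(
    ["example-job-id"], "headless-sim-job:2", "headless-sim-queue", "headless-sim-env"
):
    print(operation, kwargs)
```

Each operation name matches a method on the boto3 Batch client, so a pair could be executed with getattr(client, operation)(**kwargs) once the previous step has completed.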

To remove resources from Amazon ECR:

  1. In the Amazon ECR console, choose Repositories.
  2. Select each of the three repositories and choose Delete. Type delete to confirm deletion.

To remove resources from Amazon S3:

  1. In the Amazon S3 console, choose Buckets.
  2. Select your bucket, and choose Empty. Type permanently delete and choose Empty to confirm deletion.
  3. Choose Exit.
  4. Select your bucket again, and choose Delete. Type the bucket name and choose Delete bucket to confirm deletion.

To remove resources from Amazon CloudWatch:

  1. In the Amazon CloudWatch console, choose Logs, Log groups.
  2. Select the log groups used for this blog and choose Delete, Delete. (Note: Some of your logs may be in your original Region and some will be in us-west-2.)

Summary

In this blog, you learned how you can use AWS Batch to run headless robotic simulations at scale. Using AWS Batch for simulations at scale gives you custom control and cost savings for your simulation workloads. We look forward to hearing more about your simulation workloads running at scale.

Erica Goldberger

Erica Goldberger is a Solutions Architect specializing in Robotics at Amazon Web Services (AWS). Prior to being a Solutions Architect, Erica was a Technical Curriculum Developer building training courses for AWS on topics such as containers and IoT. Erica has a Master’s in Robotics from the University of Pennsylvania.