AWS for Industries

How to scale robot learning training on AWS using reward functions generated by Large Language Models

In the robotics industry, Reinforcement Learning (RL) is extensively utilized to address complex problems that traditional path planning algorithms cannot manage, especially those involving intricate manipulations. The reward function in RL is a critical component that establishes the objective and directs the agent’s learning process. Crafting an effective reward function is often challenging but essential for the RL agent’s success.

Eureka is an impressive project from Nvidia researchers that creates sophisticated reward functions by using Large Language Models (LLMs) instead of crafting them manually. In the reinforcement learning field, designing reward functions remains a trial-and-error process, so robotics engineers are seeking an automated way to generate and improve reward functions. Eureka, a breakthrough on this subject, provides that kind of automation to help solve tough problems. According to the Eureka research paper, the rewards generated by LLMs outperform expert-crafted reward functions on more than 80% of the tasks, with a 50% overall performance improvement.

In this blog post, we will describe how to run Nvidia's Eureka on Amazon Web Services (AWS) and use Amazon Bedrock for LLMs. Engineers face several challenges when migrating robot learning processes from on-premises training instances to AWS:

  • Distribute and scale simulation/training processes across multiple nodes to accelerate training and save cost.
  • Visualize robot simulation/training processes in the cloud so that engineers can engage with them, improving task efficiency.
  • Use LLMs from Amazon Bedrock to generate reward functions.

Overview of Solutions

This blog post describes a solution from AWS to help customers in the robotics industry address the challenges above. The proposed solution uses the following tools:

  1. Autonomous Driving Data Framework (ADDF) is an open-source project designed to provide reusable and modular code artifacts for automotive teams. ADDF provides modules for data processing, signal extraction, scene detection, simulation, and data visualization. We will show how this project can also be used in the robotics industry.
  2. Amazon Elastic Kubernetes Service (Amazon EKS) is a managed Kubernetes service in which AWS manages the availability and scalability of the Kubernetes control plane nodes responsible for scheduling containers, managing application availability, and storing cluster data. This is where the robot training and simulation will run in this blog post.
  3. NICE DCV is a high-performance remote display protocol that provides customers with a secure way to deliver remote desktops and application streaming. This blog post demonstrates using DCV to stream the robot simulation and learning processes running in Amazon EKS.

This solution will be architected as follows:

Figure 1: High Level Design of the Solution

This image presents the architecture of the design. First, you deploy code and training artifacts to an Amazon S3 bucket. AWS DataSync then syncs the code and artifacts to Amazon FSx for Lustre, which is mounted to pods in the Amazon EKS cluster.

On the Amazon EKS side, we adopt an event-driven architecture for training and simulation. The simulation/training workloads run in stateless worker pods. The task controller pods are responsible for generating training/simulation workload definitions for a given task. We use Amazon Simple Queue Service (Amazon SQS) to schedule workloads to workers. Each task controller controls the simulation and training for one training job.

Task controllers also call Amazon Bedrock, which provides the service to sample different reward functions from the chosen LLM. Meanwhile, users can visualize and interact with simulations through the DCV modules.

With ADDF, you are able to set up the infrastructure in a simple manner. ADDF will deploy the necessary modules, such as Amazon EKS, DCV components, and other AWS resources specifically for this project.

The prerequisites on your local development environment are listed below (a quick way to verify them follows the list):

  • Python version 3.8 or higher
  • git CLI
  • AWS Credentials
  • AWS CLI
  • aws-cdk CLI version 2.20.0 or higher
  • kubectl CLI version 1.29 or higher
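You can quickly confirm that the required tools are installed and at suitable versions, for example:

python3 --version
git --version
aws --version
cdk --version
kubectl version --client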
Amazon EKS Cluster Setup

In this section, you will focus on setting up the infrastructure. If you have previously used ADDF for other projects, skip the steps as applicable.

Step 1: Clone the ADDF repository to a local directory. There are multiple releases; we suggest using the latest, which at the time of this blog post is release/3.5.0.

git clone --origin upstream --branch release/3.5.0 https://github.com/awslabs/autonomous-driving-data-framework.git

Step 2: Create a Python Virtual Environment and install the dependencies

cd autonomous-driving-data-framework
python3 -m venv .venv && source .venv/bin/activate
pip3 install -r requirements.txt

Now, ensure you have valid AWS credentials for your account. You will need them to bootstrap the account. Replace <REGION>, <ACCOUNT_ID>, and <ROLE_NAME> in the following commands; <ROLE_NAME> is the name of the role that you are currently using to access the AWS account.

export AWS_DEFAULT_REGION=<REGION>
export PRIMARY_ACCOUNT=<ACCOUNT_ID>
cdk bootstrap aws://<ACCOUNT_ID>/<REGION>
seedfarmer bootstrap toolchain \
  --project addf \
  -t arn:aws:iam::<ACCOUNT_ID>:role/<ROLE_NAME> \
  --as-target
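If you want to confirm that the CDK bootstrap completed before continuing, you can check the status of the default CDK bootstrap stack, which is named CDKToolkit:

aws cloudformation describe-stacks --stack-name CDKToolkit \
  --query 'Stacks[0].StackStatus' --output text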

Step 3: Deploy the Amazon EKS cluster and DCV modules. In this step, you will deploy the necessary modules for the setup, which include the following:

  • Amazon FSx for Lustre: This file system will be mounted to pods in Amazon EKS. You will also set up data syncing from the S3 bucket to Amazon FSx. To deploy the file system and data syncing, you need the module definitions in manifests/robotic-training-on-eks/storage.yaml.
    • An Amazon FSx storage class will also be deployed as a K8S resource in the eureka module. The dependency modules create an Amazon FSx file system, and you will deploy additional Amazon FSx K8S resources so that the file system can be mounted to your pods.
  • Amazon ECR: This registry will store your robotic training/simulation images.
  • Amazon S3 bucket: This bucket will be used as your datastore. You will store training data, training output, and code in this bucket.
  • Amazon EKS: This is where you will schedule training/simulation pods. In this example, you will deploy at least two g5.2xlarge instances.
  • NICE DCV images and Kubernetes (K8S) resources: There are three modules related to DCV. One creates an Amazon ECR repository that stores the DCV server images, another builds the DCV images, and the last one creates the K8S resources (DaemonSet, ClusterRoles, ClusterRoleBindings, etc.). You can leave these modules as they are; no configuration updates are required. Be sure to check the README in the dcv-eks module, as you will need to set up secrets for the DCV servers.

Now, you can deploy all the ADDF modules by running the following command:

seedfarmer apply manifests/robotic-training-on-eks/deployment.yaml

You will see output similar to the following for successful deployment:

Figure 2: ADDF Modules to be deployed

Step 4: Set up kubectl and credentials. In Step 3, you deployed an Amazon EKS cluster. For the scope of this blog post, you will use kubectl commands to run applications in Amazon EKS. Please follow this link to install the kubectl client with version 1.29.

After the modules have been deployed, all the resources exist as stacks in AWS CloudFormation. Now, navigate to the AWS CloudFormation service in your AWS console. Find the stack named addf-robotic-training-on-eks-core-eks and select Outputs. You will see the clusterConfigCommand as illustrated in the following figure:

Figure 3: Amazon EKS CloudFormation Stack Output

Now, copy the value of clusterConfigCommand into your terminal and execute it. Your kubeconfig and credentials will be updated accordingly. You can validate the Kubernetes client by running the following command:

kubectl get svc

The output should look like the following:

NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE

kubernetes   ClusterIP   172.20.0.1   <none>        443/TCP   177m

Step 5: Install the GPU operator. From Step 3, you should have an Amazon EKS cluster with GPU-enabled nodes. The ADDF EKS module spins up instances that have the GPU driver pre-installed. However, in order to schedule K8S workloads that require GPU resources, you have to install the GPU operator in the cluster. The GPU operator handles all Nvidia GPU resources in K8S automatically. This step installs the Nvidia GPU Operator for any GPU-related workloads. Step 5.1 is optional; if you are looking for that add-on feature, you will find the necessary files under this link.

Step 5.1 (Optional): Install the DCGM metrics publisher. This is an optional step that lets the GPU operator publish GPU-related metrics, such as hardware utilization:

kubectl create namespace gpu-operator
curl https://raw.githubusercontent.com/NVIDIA/dcgm-exporter/main/etc/dcp-metrics-included.csv > dcgm-metrics.csv
kubectl create configmap metrics-config -n gpu-operator --from-file=dcgm-metrics.csv

Step 5.2 (Required): Install the GPU operator. Note, if you skipped Step 5.1, you don't need to set the DCGM-related flags (bolded in the following command):

# run command in bash
bash -c "helm install --wait --generate-name \
    -n gpu-operator \
    nvidia/gpu-operator \
    --set driver.enabled=false \
    --set toolkit.version=v1.14.4-ubi8 \
    --set dcgmExporter.config.name=metrics-config \
    --set dcgmExporter.env[0].name=DCGM_EXPORTER_COLLECTORS \
    --set dcgmExporter.env[0].value=/etc/dcgm-exporter/dcgm-metrics.csv"
# Validate installation by checking out whether all the pods in namespace gpu-operator are Running or Completed
kubectl get pods -o wide -n gpu-operator

Note, the GPU driver installation has been disabled in the GPU operator because the drivers are already installed in the K8S node group AMI deployed by the ADDF EKS module.
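Also note that the helm install command above assumes the NVIDIA Helm repository has already been added to your local Helm client. If it has not, you can add it first:

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update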

Now, you should have a ready-to-use Amazon EKS cluster.

Robotic Training and Simulation Demonstration using Isaac Gym

Simulation Environment Setup

After the successful deployment of the different modules from the ADDF repository and the GPU operator, you can now start deploying the simulation/training workloads.

For this blog post, you will deploy Eureka controller pods and training pods. All the application pods will be manually deployed using kubectl. For each training task, there will be one controller pod. In the scope of this blog post, the shadow-hand task will be demonstrated; you can deploy different controller pods if you want to test other use cases. The training pods run as a K8S deployment, and the number of training pods is limited by resource limits, such as the allocatable GPU count on each node.
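You can check how many GPUs are allocatable on each node, which bounds the number of training pods, with a command like the following:

kubectl get nodes -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'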

By deploying the modules as described in the section above, the eureka module will produce all the necessary outputs you need. Specifically, it outputs three items:

1. IamRoleArn: This is the role created for users to run simulations/training in the cloud. It contains the following permissions:

a. Amazon FSx read/write

b. Amazon S3 Bucket read/write

c. Amazon SQS read/write

d. Amazon Bedrock permissions.

e. Trust relationship for our Amazon EKS cluster service accounts to assume this role

2. ApplicationImageUri: The module publishes an Amazon ECR image that contains the basic libraries. For this demonstration, you will reuse a single image for everything in this blog post.

3. SqsUrl: The module also creates an Amazon SQS message queue. The controller enqueues messages and workers dequeue them for simulation/training purposes (an example of this message flow follows this list).
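As a minimal illustration of this message flow, the following commands enqueue and dequeue a message using the AWS CLI. The message body shown here is hypothetical; the actual message schema is defined by the Eureka controller implementation:

# Enqueue a (hypothetical) simulation/training task message
aws sqs send-message --queue-url <SqsUrl> \
  --message-body '{"task": "shadow-hand", "reward_candidate": 0}'

# Dequeue a message the way a worker would, using long polling
aws sqs receive-message --queue-url <SqsUrl> \
  --max-number-of-messages 1 --wait-time-seconds 20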

Step 1: Upload artifacts to Amazon S3. In this step, you will upload the following to the S3 bucket you previously created. Note, because AWS DataSync is configured, after you upload the files the data will be synced from Amazon S3 to Amazon FSx automatically. You can directly mount Amazon FSx to your pods so that all pods share a common external drive.

  • Eureka: This is a fork of the original Eureka library. It aims to provide scalability so that simulations can be deployed across multiple nodes.
  • IsaacGym: Please check out this link to get access to Isaac Gym Preview 4. You will get a link to download Isaac Gym. Replace <LINK_TO_DOWNLOAD_ISAACGYM> with that link in the following commands.
  • Fbx fix patch: This is a hot patch to fix an FBX module error. Adobe no longer officially supports the FBX Python SDK, so you have to apply this patch to run FBX.
  • BUCKET is the data bucket you created in the ADDF modules.

mkdir robotic_data && cd robotic_data

# Get Eureka for distributed simulation
git clone https://github.com/sauronalexander/Eureka.git

# Get IsaacGym
wget <LINK_TO_DOWNLOAD_ISAACGYM>
tar -xvf IsaacGym_Preview_4_Package.tar.gz
rm IsaacGym_Preview_4_Package.tar.gz

# Get Fbx fix patch
git clone https://github.com/Shiiho11/FBX-Python-SDK-for-Python3.x.git
mv FBX-Python-SDK-for-Python3.x/2020.0.1_3.8.3_x64/FbxCommon.py ./
mv FBX-Python-SDK-for-Python3.x/2020.0.1_3.8.3_x64 ./fbx
rm -rf FBX-Python-SDK-for-Python3.x

# Upload the files to S3
aws s3 sync . s3://<BUCKET>

From now on, you can edit your local Python scripts and sync your code to S3. The code artifacts will be immediately visible to all pods. A more scalable alternative is to build the code into the image, but for simplicity in this blog post you will not rebuild the image every time you change the code.
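For example, if you only changed code under the Eureka directory, you can push just that prefix (run from the robotic_data directory):

aws s3 sync ./Eureka s3://<BUCKET>/Eureka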

Step 2: Set up the Python Anaconda environment. In this step, you will install the required Python libraries. Installation of dependency packages, such as Isaac Gym, requires the Nvidia drivers to be ready. You will deploy one pod to each node to set up the environment. Each of these pods establishes a local mount that uses a directory on the hosting node. The pod checks the Nvidia GPU driver and preinstalls the Anaconda environment in the shared directory. Once this setup is complete, the worker pods and controller pods reuse the Anaconda environment to run simulation/training workloads, which saves a significant amount of time in environment setup.

Be sure to replace the image URI and role ARN in anaconda_setup.yaml with the image URI and IAM role (see bolded below) created in the previous step.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: aws-s3-access
  annotations:
    eks.amazonaws.com/role-arn: <RoleArn>
automountServiceAccountToken: true
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: anaconda-daemonset
  labels:
    app: anaconda-setup
spec:
  selector:
    matchLabels:
      app: anaconda-setup
  template:
    metadata:
      labels:
        app: anaconda-setup
    spec:
      terminationGracePeriodSeconds: 30 # Set the termination grace period
      serviceAccountName: aws-s3-access
      automountServiceAccountToken: true
      containers:
      - name: anaconda-setup-container
        image: <ApplicationImageUri>
        resources:
          limits:
            nvidia.com/gpu: 1 # requesting 1 GPU for the environment setup pod
…

# Wait for around 10 minutes until the setup is complete. You can verify that the setup is complete by checking whether the conda setup pods are ready
watch -n 5 kubectl get pods

# When they are ready, you can delete the pods
kubectl delete -f anaconda_setup.yaml

Step 3: Set up LLM access. In this project, we demonstrate that you can use LLMs to generate reward functions. The implementation supports two families of LLMs: Claude 3 from Anthropic on Amazon Bedrock, and ChatGPT from OpenAI.

Step 3.1: Set up Claude 3 access

  1. Navigate to Amazon Bedrock in the AWS console (region us-east-1 or us-west-2). Only specific regions provide access to Amazon Bedrock LLMs.
  2. Select Model access in the left panel.
  3. Check the models you want to use. Only Claude 3 models (Claude 3 Haiku, Claude 3 Sonnet, Claude 3 Opus, and Claude 3.5 Sonnet) are supported.
  4. Choose Save. You should now be able to interact with the Claude 3 models using Python (a quick command-line check is shown after this list).
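To quickly verify that model access works before wiring Amazon Bedrock into the controller, you can invoke one of the Claude 3 models from the AWS CLI. The prompt, output file name, and region below are only illustrative; use the region where you enabled model access:

aws bedrock-runtime invoke-model \
  --region us-west-2 \
  --model-id anthropic.claude-3-haiku-20240307-v1:0 \
  --cli-binary-format raw-in-base64-out \
  --body '{"anthropic_version": "bedrock-2023-05-31", "max_tokens": 100, "messages": [{"role": "user", "content": [{"type": "text", "text": "Reply with OK if you can read this."}]}]}' \
  response.json && cat response.json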

Step 3.2: Set up ChatGPT access

  1. Get an OpenAI API key.
  2. Store this key; you will use it as an environment variable when deploying controller pods to the Amazon EKS cluster.

Demonstration – Running Simulation/Training

Now, you can deploy the Eureka applications listed in eureka.yaml. Note, please replace the following as necessary:

  • Image URI: In the previous section, the module you deployed published an Ubuntu image to one of your Amazon ECR repositories. The image URI is the output named <ApplicationImageUri> of the eureka module.
  • EUREKA_TRAINING_QUEUE_URL: Replace with the <SqsUrl> you created in the ADDF modules.
  • OPENAI_API_KEY: Replace with the key you got in Step 3.2. If you use Claude 3, you can skip this.
  • AWS_REGION: Replace with the region you are deploying to, i.e., <REGION>.
  • EUREKA_ENV: The task name for training.
  • EUREKA_SAMPLE: How many sample reward functions to generate from the LLM; they are evaluated in parallel in each iteration.
  • EUREKA_MAX_ITERATIONS: The maximum number of iterations in which rewards generated by the LLM are evaluated.
  • EUREKA_NUM_VAL: The number of final evaluation rounds (20000 iterations each) that can run in parallel.

After you have changed the above configurations, you can deploy the pods using the following command. In this example, you will run the shadow-hand task, and the controller pod is named shadow-hand.

kubectl apply -f eureka.yaml

# You can check the progress using
kubectl logs shadow-hand --follow
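You can also check how many simulation/training messages are currently waiting in the queue:

aws sqs get-queue-attributes --queue-url <SqsUrl> \
  --attribute-names ApproximateNumberOfMessages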

Note, you can deploy only one controller pod per task (defined by EUREKA_ENV), but you can deploy controllers for multiple tasks in parallel.

Demonstration – Visualizing Robot Simulation/Training using DCV

This section demonstrates how users can use DCV servers to stream the simulation and training. The overall architecture for streaming with DCV looks like the following:

Figure 4: Application Streaming Architecture in EKS

The DCV components are deployed as a DaemonSet to the Amazon EKS cluster, which means there is a single DCV pod per node. All application pods (the worker pods in this use case) share a directory with the DCV pod. The DCV pod creates an X11 display socket in that directory so that the worker pods can render applications on the DCV server running in the DCV pod. Isaac Gym running in a worker pod is rendered on the X11 socket, and the DCV server then accepts user connections and displays Isaac Gym on the remote desktop. For public subnets, you can use a NodePort service; for the scope of this blog post, you will use private subnets. The following connection method works in both private and public subnets.

You will use K8S port forwarding to connect. You can use the following command:

# First, determine which worker pod you want to connect to by running:
kubectl get pods

# Next, run the following command to forward the traffic
kubectl port-forward $(kubectl get pods -n dcv --field-selector spec.nodeName=$(kubectl get pod <POD_NAME> -o=jsonpath='{.spec.nodeName}') -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}') -n dcv 9999:8443

The above command finds the node that runs the target worker pod, then finds the name of the DCV pod running on that node, and forwards port 8443, which the DCV server listens on, to localhost port 9999.
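If you prefer, you can run the same lookup step by step:

# Find the node hosting the worker pod
NODE=$(kubectl get pod <POD_NAME> -o jsonpath='{.spec.nodeName}')

# Find the DCV pod running on that node
DCV_POD=$(kubectl get pods -n dcv --field-selector spec.nodeName=$NODE \
  -o jsonpath='{.items[0].metadata.name}')

# Forward local port 9999 to the DCV server port 8443
kubectl port-forward -n dcv $DCV_POD 9999:8443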

You can use your browser to navigate to https://localhost:9999. The DCV server will prompt you to enter a username and password.

Figure 5: DCV Login Page in Web Browser

Now enter the username/password you set up in the DCV module setup. You will then be able to connect to the DCV server, and the simulations should appear on your screen.

Figure 6: Training shadow-hand in multiple pods

Cleanup

To destroy the application pods, you can run the following command:

kubectl delete -f eureka.yaml

To destroy the modules for this deployment demo, you can run the following command:

seedfarmer destroy manifests/robotic-training-on-eks/deployment.yaml

Conclusion

In this blog post, we described how you can use AWS services like Amazon EKS, Amazon FSx for Lustre, Amazon ECR, and NICE DCV to build a scalable robotic training pipeline using Nvidia's Eureka and Isaac Gym. We also showed how you can set up the infrastructure using ADDF, install the GPU operator, create IAM roles, and deploy controller and worker pods. We demonstrated how an LLM like Claude 3 on Amazon Bedrock can be integrated into your use case to generate reward functions for reinforcement learning tasks. Finally, we illustrated how you can visualize the robot simulations by streaming them via NICE DCV remote desktops. This solution is designed to help robotics teams accelerate training, use the latest AI models for reward generation, and collaborate more effectively on AWS. The open-source ADDF project provides reusable infrastructure-as-code for rapidly prototyping such robot learning pipelines.

For future work, we are looking to transfer the policies learned with this framework to real robots. We believe that this framework can also be adapted to train robots to perform sophisticated manipulation work in the real world.

Learn more about AWS offerings at the AWS for automotive page, or contact your AWS team today.

References

Develop and deploy a customized workflow using Autonomous Driving Data Framework (ADDF) on AWS

Alexander Fu

Alexander Fu is a Software Engineer at AWS HyperPod. He focuses on ML infrastructure for LLM training. He is also enthusiastic about state-of-the-art research in robotics. In his spare time, he actively participates in robotics open-source projects.

Derek Graeber

Derek is a Sr. Engineer focusing on Analytics and AI/ML workloads. As an engineer and architect, he focuses on designing, developing, and delivering open-source software solutions for customers, and dabbles in robotics.

Junjie Tang

As a Principal Consultant at AWS Professional Services, Junjie leads the design of data-driven solutions for global clients, bringing over 10 years of experience in cloud computing, big data, AI, and IoT. Junjie heads the Autonomous Driving Data Framework (ADDF) open-source project, which is designed to enable scalable data processing, model training, and simulation for automated driving systems. Junjie is passionate about creating innovative solutions that improve quality of life and the environment.

Srinivas Reddy Cheruku

Srinivas Reddy is a Senior DevOps Consultant working with the Products & Solutions team at AWS ProServe, building open-source data analytics solutions with a specialization in Kubernetes. He is currently focusing on building the core platform that enables automotive customers to run their automated driving systems. He loves to travel during his time off.

Tae Won Ha

Tae Won Ha is an Engagement Manager at AWS with a software developer background. He leads AWS ProServe teams in multiple engagements and has been helping customers deliver tailored solutions on AWS to achieve their business goals. In his free time, Tae Won is an active open-source developer.