Amazon AWS Official Blog

Accelerating ROS 2 Simulation and Data Collection on AWS with NVIDIA Isaac Sim – Use Case: Lerobot SO-101

The Chinese version of this blog post is published simultaneously.

Background

In modern robotics development, traditional workflows rely heavily on physical hardware for data collection and algorithm validation. While effective, this approach incurs high capital and operational costs, introduces risks of hardware wear or damage, and is constrained by environmental and logistical factors. With the rapid development of cloud computing, cloud-native robotic simulation platforms have emerged as a transformative alternative, enabling safe, repeatable, and scalable experimentation without the limitations of physical test environments.

This article details how to leverage AWS’s powerful computing capabilities, combined with the NVIDIA Isaac Sim simulation environment, to achieve remote teleoperation and data collection of the Lerobot SO-101 robotic arm. This cloud-based simulation solution significantly reduces hardware costs, offers scalable compute resources, and supports large-scale parallel simulation experiments — providing an ideal environment for fast iteration and verification of robotics algorithms.

Architecture Design

Workflow Diagram

Architecture Overview

The architecture builds a cloud-based Isaac Sim environment on AWS EC2, integrating local robotic devices with cloud simulation for seamless remote development and data collection.

Core Components

Local Environment

  • Lerobot SO-101 as the physical robot platform
  • ROS2 data collection node for sensor data acquisition
  • Developer workstation for coding and debugging interface
  • Amazon DCV client for visualized remote access

AWS Cloud Environment

  • EC2 instances running NVIDIA Isaac Sim
  • Amazon DCV Server for high-performance remote desktop access
  • rosbridge-suite for ROS2 communication with local devices
  • Amazon S3 for simulation data, models, and artifacts storage

Data Flow

Real-time Data Synchronization

  1. The local SO-101 device collects sensor data through a ROS2 node
  2. Data is transmitted over WebSocket to the cloud rosbridge service
  3. rosbridge relays the incoming data streams into Isaac Sim
  4. Simulation results are processed and fed back to the local device in real time
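The synchronization steps above ride on rosbridge's JSON protocol: a topic is first advertised, then each sample is published as a plain-JSON message. As a rough sketch (the topic name comes from this article; the envelope format is the standard rosbridge v2 protocol), the two envelopes the local node needs look like this:

```python
import json

def advertise(topic: str, msg_type: str) -> str:
    # rosbridge v2 "advertise" op: declare a topic before publishing to it
    return json.dumps({"op": "advertise", "topic": topic, "type": msg_type})

def publish(topic: str, msg: dict) -> str:
    # rosbridge v2 "publish" op: the ROS message body travels as plain JSON
    return json.dumps({"op": "publish", "topic": topic, "msg": msg})

# Example: a joint-state sample the local node sends upstream (step 2)
frame = publish("/joint_states", {
    "name": ["shoulder_pan", "gripper"],
    "position": [0.12, -0.40],
})
print(frame)
```

The full bridge node in Phase 2.4 sends exactly these envelopes over a `websocket-client` connection.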

Remote Development Access

  1. Developers connect to the EC2 workstation via the DCV client
  2. Full graphical access to Isaac Sim is provided
  3. Simulation can be debugged and configured interactively in real time

Data Persistence

  1. Training data and model parameters generated during simulation are automatically captured
  2. A dedicated data collection module ensures centralized management and consistency
  3. All artifacts are stored in Amazon S3, enabling backup, version control, and downstream ML workflows
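A minimal sketch of step 3 follows. The key layout, device/task names, and helper functions are illustrative assumptions, not part of the article's setup; only the boto3 `upload_file` call is the standard S3 API, and it is imported lazily so the layout helper stays usable without AWS credentials:

```python
import datetime

def episode_key(device_id: str, task: str, ts: datetime.datetime,
                suffix: str = "bag") -> str:
    # Deterministic key layout: one prefix per device/task/day, which makes
    # S3 lifecycle and Intelligent-Tiering rules easy to scope later
    return f"raw/{device_id}/{task}/{ts:%Y/%m/%d}/episode-{ts:%H%M%S}.{suffix}"

def upload_episode(path: str, bucket: str, key: str) -> None:
    # Assumes boto3 is installed and AWS credentials are configured
    import boto3
    boto3.client("s3").upload_file(path, bucket, key)

ts = datetime.datetime(2024, 5, 1, 9, 30, 0)
print(episode_key("so101-01", "pick_place", ts))
# raw/so101-01/pick_place/2024/05/01/episode-093000.bag
```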

Implementation Steps

Phase 1: Deploy Isaac Sim on AWS

1.1 Launch NVIDIA Isaac Sim™ Development Workstation

Marketplace link: NVIDIA Isaac Sim™ Development Workstation

Navigate to: AWS Console → EC2 → Launch Instance → AWS Marketplace AMIs → search “NVIDIA Isaac Sim”.

Choose NVIDIA Isaac Sim™ Development Workstation (Linux), then configure security groups and launch the instance.

Recommended configuration:

  • Instance type: g6e.8xlarge (GPU-accelerated)
  • Storage: 500GB EBS gp3
  • Open ports: 22 (SSH), 8443 (DCV Remote Desktop), 9090 (WebSocket communication)
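If you prefer scripting the security group over clicking through the console, a minimal boto3 sketch might look like the following. Only the three ports come from the list above; the `admin_cidr` restriction (safer than opening 0.0.0.0/0) and the function names are assumptions:

```python
def ingress_rules(admin_cidr: str) -> list:
    # Build the three ingress rules from the recommended configuration,
    # restricting each to a known administrator CIDR
    def rule(port, desc):
        return {
            "IpProtocol": "tcp", "FromPort": port, "ToPort": port,
            "IpRanges": [{"CidrIp": admin_cidr, "Description": desc}],
        }
    return [
        rule(22, "SSH"),
        rule(8443, "Amazon DCV remote desktop"),
        rule(9090, "rosbridge WebSocket"),
    ]

def apply_rules(group_id: str, admin_cidr: str) -> None:
    # Assumes boto3 is installed with credentials and region configured
    import boto3
    boto3.client("ec2").authorize_security_group_ingress(
        GroupId=group_id, IpPermissions=ingress_rules(admin_cidr))

print([r["FromPort"] for r in ingress_rules("203.0.113.0/24")])
# [22, 8443, 9090]
```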

Phase 2: Configure ROS2 Data Collection and rosbridge Service

2.1 Setup Local Development Environment

# Create a conda environment for Lerobot
conda create -y -n lerobot python=3.11
conda activate lerobot
conda install ffmpeg -c conda-forge
pip install 'lerobot[all]'

# Configure ROS2 Jazzy
conda config --add channels conda-forge
conda config --add channels robostack-jazzy
conda config --set channel_priority strict
conda install ros-jazzy-desktop -c robostack-jazzy

2.2 Deploy the rosbridge Service on AWS

# Set up a secure SSH connection to the AWS EC2 instance
ssh -i your-key.pem ubuntu@YOUR_ELASTIC_IP

# Configure DCV remote access credentials
sudo passwd ubuntu

# Configure ROS2 Jazzy
sudo curl -sSL https://raw.githubusercontent.com/ros/rosdistro/master/ros.key -o /usr/share/keyrings/ros-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/ros-archive-keyring.gpg] http://packages.ros.org/ros2/ubuntu $(. /etc/os-release && echo $UBUNTU_CODENAME) main" | sudo tee /etc/apt/sources.list.d/ros2.list > /dev/null
sudo apt update && sudo apt install -y ros-jazzy-desktop ros-dev-tools
source /opt/ros/jazzy/setup.bash
echo "source /opt/ros/jazzy/setup.bash" >> ~/.bashrc

# Install and launch the rosbridge WebSocket service
sudo apt install ros-jazzy-rosbridge-suite ros-jazzy-rosbridge-server ros-jazzy-topic-tools
nohup ros2 launch rosbridge_server rosbridge_websocket_launch.xml port:=9090 2>&1 &

# Verify service status
lsof -i :9090

2.3 Setup local robot device connection

On your local developer workstation, connect the Lerobot SO-101 robotic arm and execute the following commands:

# Activate the Lerobot environment
conda activate lerobot

# Identify the USB port of the Lerobot device
python -m lerobot.find_port

# (Optional) Run local teleoperation
python -m lerobot.teleoperate \
    --robot.type=so101_follower \
    --robot.port=/dev/tty.usbmodem234561 \
    --robot.id=R12345 \
    --teleop.type=so101_leader \
    --teleop.port=/dev/tty.usbmodem123456 \
    --teleop.id=R12345

2.4 Establish the ROS2 Communication Bridge

# Local Control Node (lerobot_so101_ros2_bridge_remote.py)
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import JointState
from geometry_msgs.msg import Twist
import websocket
import json
import serial
import time
import struct
import argparse
import sys

class JointStateReader2Remote(Node):
    def __init__(self, usb_port, rosbridge_ws, device_id=""):
        super().__init__('joint_state_reader2remote')
        
        self.usb_port = usb_port
        self.rosbridge_ws = rosbridge_ws
        self.device_id = device_id
        
        # Configure topic names
        prefix = f"/{device_id}" if device_id else ""
        self.joint_states_topic = f"{prefix}/joint_states"
        self.cmd_pose_topic = f"{prefix}/robot/cmd_pose"
        
        # SO100 joint configuration
        self.joint_names = ['shoulder_pan', 'shoulder_lift', 'elbow_flex', 'wrist_flex', 'wrist_roll', 'gripper']
        
        # Initialize connections
        self.serial_port = None
        self.ws = None
        self.connect_to_robot()
        self.connect_to_websocket()
        
        # State tracking
        self.last_positions = [0.0] * len(self.joint_names)
        self.read_errors = [0] * len(self.joint_names)
        self.total_reads = 0
        self.consecutive_errors = 0
        
        # Start reading timer (20Hz)
        self.timer = self.create_timer(0.05, self.read_and_publish)
        
        self.get_logger().info("SO100 Bridge started")
        self.get_logger().info(f"USB: {self.usb_port}")
        self.get_logger().info(f"WebSocket: {self.rosbridge_ws}")
        self.get_logger().info(f"Topics: {self.joint_states_topic}, {self.cmd_pose_topic}")
        
    def connect_to_robot(self):
        """Connect to SO100 robot via USB"""
        try:
            self.serial_port = serial.Serial(self.usb_port, 1000000, timeout=0.1)
            self.serial_port.reset_input_buffer()
            self.serial_port.reset_output_buffer()
            time.sleep(0.1)
            self.get_logger().info("Connected to SO100 robot")
        except Exception as e:
            self.get_logger().error(f"Failed to connect to robot: {e}")
            self.serial_port = None
    
    def connect_to_websocket(self):
        """Connect to remote WebSocket bridge"""
        try:
            self.ws = websocket.WebSocket()
            self.ws.connect(self.rosbridge_ws)
            
            # Advertise topics
            topics = [
                (self.joint_states_topic, "sensor_msgs/JointState"),
                (self.cmd_pose_topic, "geometry_msgs/Twist")
            ]
            
            for topic, msg_type in topics:
                advertise_msg = {"op": "advertise", "topic": topic, "type": msg_type}
                self.ws.send(json.dumps(advertise_msg))
            
            self.get_logger().info("Connected to WebSocket bridge")
        except Exception as e:
            self.get_logger().error(f"Failed to connect to WebSocket: {e}")
            self.ws = None
    
    def read_servo_position(self, servo_id):
        """Read position from STS3215 servo"""
        if not self.serial_port:
            return None
            
        try:
            # Build command packet
            length = 4
            instruction = 0x02
            address = 0x38
            read_length = 0x02
            checksum = (~(servo_id + length + instruction + address + read_length)) & 0xFF
            cmd = bytes([0xFF, 0xFF, servo_id, length, instruction, address, read_length, checksum])
            
            # Send command and read response
            self.serial_port.reset_input_buffer()
            self.serial_port.write(cmd)
            time.sleep(0.002)
            response = self.serial_port.read(8)
            
            # Parse response
            if len(response) >= 7 and response[0] == 0xFF and response[1] == 0xFF and response[2] == servo_id:
                pos = struct.unpack('<H', response[5:7])[0]
                if 0 <= pos <= 4095:
                    return pos
            return None
                
        except Exception:
            self.read_errors[servo_id - 1] += 1
            return None
    
    def ticks_to_radians(self, ticks, joint_idx):
        """Convert servo ticks to radians"""
        if ticks is None:
            return self.last_positions[joint_idx]
        normalized = (ticks - 2048) / 2048.0
        return normalized * 3.14159
    
    def send_to_websocket(self, topic, msg_data):
        """Send message to WebSocket"""
        if not self.ws:
            return
            
        try:
            data = {"op": "publish", "topic": topic, "msg": msg_data}
            self.ws.send(json.dumps(data))
        except Exception as e:
            self.get_logger().error(f"WebSocket send error: {e}")
            self.ws = None
    
    def read_and_publish(self):
        """Main loop: read joint positions and publish to WebSocket"""
        # Reconnect if needed
        if not self.serial_port:
            self.connect_to_robot()
            return
        if not self.ws:
            self.connect_to_websocket()
            return
            
        self.total_reads += 1
        
        # Read all joint positions
        positions = []
        successful_reads = 0
        
        for i in range(len(self.joint_names)):
            ticks = self.read_servo_position(i + 1)
            radians = self.ticks_to_radians(ticks, i)
            positions.append(radians)
            
            if ticks is not None:
                successful_reads += 1
            time.sleep(0.01)
        
        # Handle connection errors
        if successful_reads == 0:
            self.consecutive_errors += 1
            if self.consecutive_errors > 10:
                self.serial_port = None
                self.consecutive_errors = 0
                return
        else:
            self.consecutive_errors = 0
        
        self.last_positions = positions
        
        # Publish joint states
        current_time = self.get_clock().now().to_msg()
        joint_msg = {
            "header": {
                "stamp": {"sec": current_time.sec, "nanosec": current_time.nanosec},
                "frame_id": ""
            },
            "name": self.joint_names,
            "position": positions,
            "velocity": [],
            "effort": []
        }
        self.send_to_websocket(self.joint_states_topic, joint_msg)
        
        # Publish command pose (empty twist)
        twist_msg = {
            "linear": {"x": 0.0, "y": 0.0, "z": 0.0},
            "angular": {"x": 0.0, "y": 0.0, "z": 0.0}
        }
        self.send_to_websocket(self.cmd_pose_topic, twist_msg)
        
        # Status logging
        if self.total_reads % 100 == 0:
            self.get_logger().info(f"Read #{self.total_reads}: {successful_reads}/{len(self.joint_names)} servos OK")

def main():
    parser = argparse.ArgumentParser(description='SO101 ROS2 Bridge Remote')
    parser.add_argument('--usb_port', default='', help='USB port (e.g., /dev/tty.usbmodemxxx)')
    parser.add_argument('--rosbridge_ws', default='', help='WebSocket URL (e.g., ws://ip:port)')
    parser.add_argument('--device_id', default='', help='Device ID for topic prefix')
    
    args = parser.parse_args()
    
    # Validate required parameters
    if not args.usb_port:
        print("Error: --usb_port is required")
        print("Example: --usb_port /dev/tty.usbmodemxxx")
        sys.exit(1)
    
    if not args.rosbridge_ws:
        print("Error: --rosbridge_ws is required")
        print("Example: --rosbridge_ws ws://ip:port")
        sys.exit(1)
    
    rclpy.init()
    
    try:
        reader = JointStateReader2Remote(args.usb_port, args.rosbridge_ws, args.device_id)
        rclpy.spin(reader)
    except KeyboardInterrupt:
        print("\nShutting down...")
    finally:
        rclpy.shutdown()

if __name__ == '__main__':
    main()
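The servo framing inside `read_servo_position` can be exercised without hardware. The sketch below mirrors the packet layout used by the node above (read instruction 0x02 at register 0x38, little-endian position at response bytes 5..6) so the parsing logic can be unit-tested offline; the helper names are illustrative:

```python
import struct

def build_read_cmd(servo_id: int, address: int = 0x38, read_length: int = 2) -> bytes:
    # Mirror the command framing used by read_servo_position()
    length, instruction = 4, 0x02
    checksum = (~(servo_id + length + instruction + address + read_length)) & 0xFF
    return bytes([0xFF, 0xFF, servo_id, length, instruction,
                  address, read_length, checksum])

def parse_position(response: bytes, servo_id: int):
    # Mirror the response parsing: little-endian uint16 at bytes 5..6
    if len(response) >= 7 and response[:2] == b"\xff\xff" and response[2] == servo_id:
        pos = struct.unpack("<H", response[5:7])[0]
        if 0 <= pos <= 4095:
            return pos
    return None

# Fake a reply from servo 3 reporting the mid-travel position (2048 ticks)
fake = bytes([0xFF, 0xFF, 3, 4, 0x00, 0x00, 0x08])
print(parse_position(fake, servo_id=3))  # 2048
print(build_read_cmd(1).hex())           # ffff0104023802be
```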

Run the following commands on your local machine to automatically forward USB data from the Lerobot device to ROS2 topics on the remote EC2 instance:

pip3 install websocket-client
python3 lerobot_so101_ros2_bridge_remote.py --usb_port /dev/tty.usbmodem123456 --rosbridge_ws ws://<ec2_remote_ip>:9090

Then, log in to your AWS EC2 instance to verify the ROS2 topics.

ssh -i your-key.pem ubuntu@YOUR_ELASTIC_IP
ros2 topic list
# Operate the local so101_leader device and observe data updates
ros2 topic echo /joint_states --once

Phase 3: Setup Amazon DCV Visualization Channel

3.1 Connect to Amazon DCV Remote Desktop

  1. On your local computer, install the Amazon DCV client from https://www.amazondcv.com/.
  2. Use the client to connect to the EC2 instance. Configure the connection with the following parameters:
    • Server: <your_ec2_public_ip>:8443
    • Username: ubuntu
    • Password: the Ubuntu password you configured in section 2.2

3.2 Initialize the Isaac Sim Environment

Access the remote EC2 instance via the DCV client and run:

cd ~/Documents
git clone https://github.com/TheRobotStudio/SO-ARM100.git
# Verify that the URDF file exists
ls ~/Documents/SO-ARM100/Simulation/SO101/so101_new_calib.urdf

cd /opt/IsaacSim/
./isaac-sim.selector.sh

Follow the steps below inside the Isaac Sim interface:

  1. URDF Model Import: File → Import → Select ~/Documents/SO-ARM100/Simulation/SO101/so101_new_calib.urdf
  2. ROS2 Bridge Extension Activation: Window → Extensions → Search “ROS 2 Bridge” → Enable “isaacsim.ros2.bridge”
  3. Joint State Subscription Configuration: Tools → Robotics → ROS2 OmniGraphs → Joint States

Configuration parameters:

  • Articulation Root: /World/so101_new_calib/root_joint
  • Enable Subscriber option
  • Confirm and apply the configuration
  4. Topic Relay Service Startup: Open a new terminal and run ros2 run topic_tools relay /joint_states /joint_command to convert state messages into command data.
  5. Simulation Execution: Click the Play button on the left panel of the Isaac Sim interface to start the simulation. This enables real-time control of the robot within the remote simulation environment using the local teleoperation device.
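Step 4 uses `topic_tools relay`; if you need per-device topic remapping, an equivalent minimal rclpy relay is sketched below. It assumes a sourced ROS2 Jazzy environment, and the node and function names are illustrative:

```python
def relay_pair(src: str = "/joint_states", dst: str = "/joint_command"):
    # The topic pair relayed in step 4; a function so it can be remapped per device
    return src, dst

def main():
    # Equivalent of: ros2 run topic_tools relay /joint_states /joint_command
    # Requires rclpy, i.e. a terminal where /opt/ros/jazzy/setup.bash is sourced
    import rclpy
    from rclpy.node import Node
    from sensor_msgs.msg import JointState

    src, dst = relay_pair()
    rclpy.init()
    node = Node("joint_state_relay")
    pub = node.create_publisher(JointState, dst, 10)
    # Every incoming state message is republished verbatim as a command
    node.create_subscription(JointState, src, pub.publish, 10)
    try:
        rclpy.spin(node)
    finally:
        rclpy.shutdown()

# Call main() from a sourced ROS2 environment to start the relay.
```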

Phase 4: Toward an Enterprise-Grade Data Pipeline in an AI-Driven Development Ecosystem

Building on the ROS2–Isaac Sim cloud simulation infrastructure described above, this phase extends the setup into an end-to-end AI-driven robotics development workflow.

Key AWS Service Integrations

  1. Amazon S3 – Data Lake Architecture: Leverage S3’s 11 9s of durability to build tiered storage, with S3 Intelligent-Tiering optimizing costs for managing large-scale robotics datasets.
  2. Amazon FSx for Lustre – High-Performance Storage: Provides sub-millisecond latency with a parallel file system, delivering optimized I/O performance for Vision-Language-Action (VLA) model training.
  3. Amazon SageMaker – ML Operations Platform:
    • Use SageMaker Processing Jobs for large-scale multimodal data preprocessing
    • Scale distributed training with Spot Instances, reducing training costs by up to 90%
    • Leverage Model Registry for enterprise-grade version management and A/B testing
  4. AWS Batch – Large-Scale Parallel Simulation: Use Spot Fleets to provision cost-optimized GPU clusters that support large-scale parallel data generation in Isaac Sim.
  5. Amazon Kinesis – Real-Time Data Streams: Enable low-latency ingestion and processing of sensor data, supporting real-time robotic telemetry and analytics.
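For the Kinesis integration, a minimal sketch of shaping one joint-state sample into a stream record follows. The record fields and helper names are assumptions for illustration; only the `put_record` parameters are the standard boto3 API. Using the device ID as the partition key keeps one arm's samples ordered on a single shard:

```python
import json

def telemetry_record(device_id: str, positions: list, stamp: float) -> dict:
    # Shape one joint-state sample as a Kinesis record; the field layout
    # here is illustrative, not part of the article's setup
    return {
        "Data": json.dumps({
            "device_id": device_id,
            "positions": positions,
            "stamp": stamp,
        }).encode(),
        "PartitionKey": device_id,  # per-device ordering on one shard
    }

def put_sample(stream: str, record: dict) -> None:
    # Assumes boto3 is installed and AWS credentials are configured
    import boto3
    boto3.client("kinesis").put_record(StreamName=stream, **record)
```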

Summary

Architectural Value

The dual-entry AWS cloud robotics simulation architecture introduced in this article — combining the rosbridge WebSocket data channel and the Amazon DCV visualization channel — delivers an enterprise-ready development environment for robotics.

Key Advantages

1. Cloud-Native Elasticity

  • On-demand scaling with AWS EC2 to support varying simulation workloads
  • Dynamic GPU provisioning for optimal resource utilization
  • Multi-region deployment to reduce latency globally

2. End-to-End Robotics Development

  • Amazon S3 provides virtually unlimited and scalable data storage
  • Seamless integration with AWS AI/ML services accelerates algorithm iteration
  • Full lifecycle support for data generation, training, and simulation validation in the cloud

Conclusion

The AWS cloud robotics simulation platform represents a major step forward in how robotics development is conducted. Through the dual-entry architecture outlined in this article, developers can:

  • Lower Barriers to Entry: Remove the need for costly hardware investment and broaden access to robotics innovation
  • Accelerate Development: Enable faster cycles with parallel simulation and cloud-based collaboration
  • Ensure Quality: Leverage enterprise-grade infrastructure to maintain stability and reliability
  • Foster Innovation: Drive breakthroughs in intelligent robotics through deep integration with AWS AI/ML services

As cloud computing continues to advance—together with 5G, edge computing, and other emerging technologies—AWS-based robotic simulation platforms are poised to become a core enabler of digital transformation in the robotics industry, providing the foundation for building smarter and more efficient ecosystems.

Authors

Nanxi Li

AWS Solutions Architect with 10+ years of R&D experience from top-tier Chinese internet companies. Deep expertise in IoT, AI/ML, low-code, IDaaS, and zero-trust security. Currently focused on technical consulting and implementation for connected vehicles, autonomous driving, and embodied AI robotics.

Cathy Huang

Associate Solutions Architect at AWS, specializing in the design and implementation of cloud solutions for the automotive industry. Focused on leveraging cutting-edge AI technologies to drive digital transformation and empower clients with scalable, efficient, and intelligent systems.