AWS Physical AI Blog

Building Physical AI agents with MCP and MQTT on AWS IoT Core

Introduction

A customer walks up to an autonomous barista robot at an airport terminal and orders a flat white coffee. The robot has never been explicitly programmed to make one. It knows how to pull espresso shots, steam milk, and control pour volumes, but “flat white” isn’t in its onboard recipe library. Within 300 milliseconds, the robot queries a cloud AI agent that retrieves the recipe, checks ingredient availability, and pushes a step-by-step execution plan back to the machine. The robot makes a perfect flat white coffee, and the customer never knows an AI agent just reasoned its way through the drink.

The next customer is a loyalty member who always orders oat milk at 140°F. The cloud AI recognizes the customer, adjusts the recipe before the robot even starts, and queues the customized drink: no menus, no buttons, no waiting.

This is Physical AI: systems that perceive, reason about, and act in the real world. Building Physical AI requires solving a fundamental integration problem. The robot’s onboard AI knows how to move – grind, tamp, steam, pour. The cloud AI knows what to do – recipes, customer preferences, inventory state, allergen rules. Connecting these two intelligences in real time, reliably, over constrained networks is the hard part.

This post shows how to solve it using MQTT and the Model Context Protocol (MCP) with AWS IoT Core. It walks through the architecture using an autonomous barista robot that handles everything from personalized orders to novel recipes it has never seen before.

Note: This architecture represents a custom implementation pattern using MCP, an open protocol from Anthropic. Integration patterns are evolving, and performance characteristics vary based on implementation and network conditions.

Understanding the architecture

The barista robot is a constrained edge device running motor controllers, computer vision, and safety monitors simultaneously. It can’t host a full large language model or maintain persistent HTTP connections to external services. But the cloud AI agent managing it needs real-time access to the robot’s physical state and to systems like the ordering platform, inventory management, and customer profiles.

MQTT over AWS IoT Core solves both sides. The barista robot publishes a lightweight MQTT message to AWS IoT Core, which triggers an AWS Lambda function that translates between MQTT and MCP format, forwarding the request to a cloud AI agent powered by Amazon Bedrock. The agent reasons over the request using MCP tools, then publishes an execution plan back through the same bridge.

The robot receives the plan and executes the physical steps.

The Lambda function is stateless; it makes synchronous HTTP-based MCP tool calls and publishes the complete result back as a single MQTT message, requiring no session affinity. The robot speaks MQTT. The cloud speaks MCP. AWS IoT Core is the nervous system in between.
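The translation step inside the bridge can be sketched as follows. This is a minimal illustration, not the actual bridge code: the function names, the `TOOL_FOR_MESSAGE_TYPE` mapping, and the reply payload fields (`request_id`, `result`, `error`) are assumptions made for this example. The JSON-RPC 2.0 envelope and the `tools/call` method come from the MCP specification.

```python
import json

# Hypothetical mapping from MQTT message type (last topic level) to an MCP tool.
TOOL_FOR_MESSAGE_TYPE = {"recipe-request": "lookup_drink_recipe"}


def mqtt_to_mcp_request(topic: str, payload: bytes, request_id: int) -> dict:
    """Wrap an incoming MQTT message in an MCP `tools/call` JSON-RPC request."""
    message_type = topic.rsplit("/", 1)[-1]
    tool = TOOL_FOR_MESSAGE_TYPE[message_type]  # KeyError for unmapped types
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": json.loads(payload)},
    }


def mcp_response_to_mqtt(response: dict, robot_id: str) -> tuple:
    """Turn an MCP JSON-RPC response into one (topic, payload) pair to publish."""
    topic = f"ai/cloud/planner/{robot_id}/execution-plan"
    body = {"request_id": response["id"]}
    if "error" in response:
        body["error"] = response["error"]["message"]
    else:
        body["result"] = response["result"]
    return topic, json.dumps(body).encode("utf-8")
```

Because both functions are pure and keep no state between calls, any Lambda invocation can handle any message, which is what makes session affinity unnecessary.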

Figure 1: Physical AI Solution Architecture

What MCP enables: giving the AI physical context

Without MCP, the cloud AI is powerful but physically blind: it can reason about coffee in the abstract but has no way to check whether the robot has oat milk or whether the steam wand is operational. MCP gives the AI structured access to the tools it needs:

| What the AI needs to know | Where it lives | MCP tool |
|---|---|---|
| Drinks queued and in what order | POS / ordering system | get_pending_orders |
| Recipe for a drink not in local memory | Recipe knowledge base | lookup_drink_recipe |
| Is a specific ingredient running low? | Inventory weight sensors | check_ingredient_levels |
| Customer preferred milk, temperature, sweetness | Loyalty platform | get_customer_preferences |
| Which robot subsystems are operational? | Device telemetry via AWS IoT Core | get_device_status |
| Does this order contain allergens? | Order metadata + allergen rules | check_allergen_flags |

The AI agent calls these tools, synthesizes the results, and returns a concrete action plan. The plan is not a text suggestion but a sequence of physical commands: pull ristretto shot at 200°F, steam oat milk to 140°F, pour 4 oz with microfoam texture, dispense into 8 oz cup at station 2.
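One way to represent such an action plan is as a small, typed data structure that serializes to a single JSON document. The sketch below is illustrative only: the class names, the action names (`pull_shot`, `steam_milk`, and so on), the parameter keys, and the `plan_id` value are all assumptions, since the post does not define a plan schema. The step values are the ones listed above.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Step:
    action: str                                  # hypothetical command name
    params: dict = field(default_factory=dict)   # hypothetical parameters


@dataclass
class ExecutionPlan:
    plan_id: str
    robot_id: str
    steps: list = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the whole plan as one JSON document for a single publish."""
        return json.dumps(asdict(self))


plan = ExecutionPlan(
    plan_id="plan-0001",
    robot_id="robot-terminal-A3",
    steps=[
        Step("pull_shot", {"style": "ristretto", "temp_f": 200}),
        Step("steam_milk", {"milk": "oat", "temp_f": 140}),
        Step("pour", {"volume_oz": 4, "texture": "microfoam"}),
        Step("dispense", {"cup_oz": 8, "station": 2}),
    ],
)
```

Keeping each step as explicit key-value parameters, rather than free text, lets the robot validate a plan against its own safety limits before executing anything.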

Topic structure for agent communication

A standardized MQTT topic hierarchy organizes communication across the system:

ai/<scope>/<agent-type>/<agent-id>/<message-type>

Each component serves a specific purpose: scope defines the operational tier (local for on-device, cloud for AWS-hosted agents); agent-type categorizes function (barista, inventory, queue, planner); agent-id uniquely identifies the device; and message-type describes the message nature (telemetry, command, recipe-request). Multi-robot deployments could add an edge scope for a coordination gateway (e.g., AWS IoT Greengrass).

For example, ai/local/barista/robot-terminal-A3/recipe-request represents a robot requesting an unknown drink recipe, while ai/cloud/planner/robot-terminal-A3/execution-plan represents the cloud AI pushing an action plan back. This hierarchy enables targeted messaging, wildcard monitoring (ai/local/+/+/telemetry to watch all devices), and scope-based security through AWS IoT Core policies.
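The hierarchy above can be built, parsed, and matched against wildcard filters with a few lines of code. This is a minimal sketch with hypothetical helper names. The matcher implements the standard MQTT `+` (single level) and `#` (multi-level, last position only) wildcards and, as a simplification, ignores the special rules for topics that begin with `$`.

```python
def build_topic(scope: str, agent_type: str, agent_id: str, message_type: str) -> str:
    """Assemble ai/<scope>/<agent-type>/<agent-id>/<message-type>."""
    return "/".join(("ai", scope, agent_type, agent_id, message_type))


def parse_topic(topic: str) -> dict:
    """Split a topic back into its named levels, rejecting other shapes."""
    parts = topic.split("/")
    if len(parts) != 5 or parts[0] != "ai":
        raise ValueError(f"not an agent topic: {topic!r}")
    keys = ("scope", "agent_type", "agent_id", "message_type")
    return dict(zip(keys, parts[1:]))


def matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic filter matches a concrete topic name."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":          # matches this level and everything below it
            return True
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)
```

For instance, `matches("ai/local/+/+/telemetry", build_topic("local", "barista", "robot-terminal-A3", "telemetry"))` evaluates to `True`, while the same filter rejects any topic under the `cloud` scope.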

Why MQTT is the right transport for Physical AI

Physical AI devices operate under tight constraints: limited compute, intermittent connectivity, and sub-second response requirements, all while controlling motors and processing sensor data. MQTT, designed for machine-to-machine communication, fits naturally.

Minimal overhead. A temperature reading via HTTP typically requires 400–800 bytes, where headers dominate the payload. The same data via MQTT requires roughly 57 bytes based on the MQTT v3.1.1 protocol specification, a reduction of roughly 86–93% per message. When the barista robot is publishing telemetry from six subsystems multiple times per second, that overhead difference translates directly into compute headroom for making coffee.
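The size of an MQTT message can be estimated directly from the v3.1.1 PUBLISH packet layout: one fixed-header byte, a variable-length "remaining length" field of one to four bytes, a two-byte topic length, the topic itself, a two-byte packet identifier when QoS is above 0, and the payload. The sketch below uses hypothetical function names and applies the same arithmetic to the figures quoted above. The actual size of any message depends on its topic and payload lengths.

```python
def remaining_length_field_size(n: int) -> int:
    """Bytes needed by MQTT's variable-length encoding of the value n."""
    if not 0 <= n <= 268_435_455:
        raise ValueError("remaining length out of range for MQTT")
    size = 1
    while n > 127:      # each encoded byte carries 7 bits of the value
        n //= 128
        size += 1
    return size


def publish_packet_size(topic: str, payload: bytes, qos: int = 0) -> int:
    """Total on-the-wire size of an MQTT v3.1.1 PUBLISH packet, in bytes."""
    topic_bytes = topic.encode("utf-8")
    packet_id = 2 if qos > 0 else 0
    remaining = 2 + len(topic_bytes) + packet_id + len(payload)
    return 1 + remaining_length_field_size(remaining) + remaining


def reduction_pct(mqtt_bytes: int, http_bytes: int) -> float:
    """Percentage saved by sending mqtt_bytes instead of http_bytes."""
    return (1 - mqtt_bytes / http_bytes) * 100


# Using the figures from the text: 57 bytes against 400 and 800 bytes.
low = reduction_pct(57, 400)
high = reduction_pct(57, 800)
```

Note that long topic names count against every message, so the five-level hierarchy used here trades some bytes for easier routing and policy scoping.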

Guaranteed delivery for physical actions. In traditional software, a failed API call means you retry. In Physical AI, a missed command has physical consequences – the robot pours the wrong milk or skips a step. MQTT provides protocol-level delivery guarantees (QoS 1, at-least-once delivery) ensuring actuation commands survive network interruptions, even in a busy airport terminal with congested Wi-Fi. Commands are designed to be idempotent, so redelivery is safe.
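Because QoS 1 can deliver the same message more than once, the robot side needs a way to recognize redelivered commands. A minimal sketch of one approach follows, assuming each command carries a unique `command_id` field. That field, the class name, and the bounded memory size are assumptions for illustration and are not part of MQTT itself.

```python
from collections import OrderedDict


class CommandExecutor:
    """Executes each command at most once, even if it is delivered repeatedly."""

    def __init__(self, max_remembered: int = 1000):
        self._seen = OrderedDict()      # command_id -> True, oldest first
        self._max = max_remembered
        self.executed = []              # stand-in for real actuation

    def handle(self, command: dict) -> bool:
        """Return True if the command ran, False if it was a duplicate."""
        command_id = command["command_id"]
        if command_id in self._seen:
            return False                # redelivery: acknowledge, do not act
        self._seen[command_id] = True
        if len(self._seen) > self._max:
            self._seen.popitem(last=False)   # forget the oldest id
        self.executed.append(command["action"])
        return True
```

With this in place, a broker retry of the `steam_milk` command after a dropped acknowledgment results in one steaming cycle rather than two.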

Offline resilience. If the internet drops mid-drink, the robot can’t freeze. Its local recipe cache holds the most popular drinks, and safe operating defaults keep the hardware protected. Queued messages are delivered automatically when connectivity is restored. The robot finishes the current drink using onboard intelligence while the system recovers.
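The queue-and-flush behavior described above can be illustrated with a small buffer that holds outbound messages while the link is down. This is a sketch of the idea only: the class name and the `publish` callable (assumed to raise `ConnectionError` when offline) are hypothetical, and in practice an MQTT client library may provide its own offline queueing.

```python
from collections import deque


class OfflineBuffer:
    """Queues outbound messages and sends them in order once publishing works."""

    def __init__(self, publish, max_pending: int = 500):
        self._publish = publish                      # callable(topic, payload)
        self._pending = deque(maxlen=max_pending)    # oldest dropped when full

    def send(self, topic: str, payload: bytes) -> None:
        """Queue a message, then try to deliver everything that is waiting."""
        self._pending.append((topic, payload))
        self.flush()

    def flush(self) -> int:
        """Deliver queued messages in order; stop at the first failure."""
        sent = 0
        while self._pending:
            topic, payload = self._pending[0]
            try:
                self._publish(topic, payload)
            except ConnectionError:
                break                                # still offline, keep queue
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._pending)
```

The robot would call `flush()` from its reconnect handler, so telemetry recorded while offline reaches the cloud in its original order.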

Communication flow: the flat white coffee

A customer orders a flat white coffee – a drink the robot hasn’t been explicitly programmed to make. Here’s the full round trip.

The POS system publishes the order to AWS IoT Core with the payload: {"drink": "flat white", "size": "regular", "customer_id": "LYL-90482"}. The robot receives the order, checks its local recipe cache, and finds no match. It publishes a recipe request via MQTT.

AWS IoT Core triggers the Lambda-based bridge function, which translates the request into MCP format and forwards it to the cloud AI agent. The agent makes three tool calls:

  • lookup_drink_recipe("flat white") → ristretto base, 4 oz microfoamed whole milk, 1:4 ratio, serve at 150–160°F
  • get_customer_preferences("LYL-90482") → prefers oat milk, 140°F, no sweetener
  • check_ingredient_levels("robot-terminal-A3") → oat milk at 62%, espresso beans at 45%

The standard recipe calls for whole milk at 155°F, but this customer prefers oat milk at 140°F. The AI adapts accordingly and generates a step sequence: grind 18g fine, pull ristretto at 200°F, steam oat milk to 140°F with microfoam, pour 4 oz center, dispense 8 oz cup at station 2. The plan is published back through the bridge with guaranteed delivery.
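The adaptation step can be pictured as a merge of three tool results: recipe defaults, customer overrides, and an availability check. The sketch below is a deterministic stand-in for what the agent reasons out. The function name, dictionary keys, action names, and the 10% minimum level are assumptions for illustration. The recipe, preference, and inventory values are the ones from the walkthrough above.

```python
def build_plan(recipe: dict, prefs: dict, levels: dict,
               min_level_pct: float = 10.0) -> list:
    """Merge customer preferences over recipe defaults and emit ordered steps."""
    milk = prefs.get("milk", recipe["milk"])
    temp_f = prefs.get("temp_f", recipe["temp_f"])
    for ingredient in (milk, "espresso beans"):
        if levels.get(ingredient, 0.0) < min_level_pct:
            raise ValueError(f"{ingredient} is below {min_level_pct}%")
    return [
        {"action": "grind", "grams": 18, "grind": "fine"},
        {"action": "pull_shot", "style": recipe["base"], "temp_f": 200},
        {"action": "steam_milk", "milk": milk, "temp_f": temp_f,
         "texture": "microfoam"},
        {"action": "pour", "volume_oz": recipe["milk_oz"], "pattern": "center"},
        {"action": "dispense", "cup_oz": 8, "station": 2},
    ]


# Values taken from the three tool results in the walkthrough.
recipe = {"base": "ristretto", "milk": "whole milk", "milk_oz": 4, "temp_f": 155}
prefs = {"milk": "oat milk", "temp_f": 140}
levels = {"oat milk": 62.0, "espresso beans": 45.0}
plan = build_plan(recipe, prefs, levels)
```

Here the third step comes out as steaming oat milk to 140°F rather than the recipe's whole milk at 155°F, matching the plan described in the text.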

The robot executes each step, publishing telemetry throughout – grinder RPM, extraction pressure, milk temperature – allowing the cloud AI to monitor progress. On completion, the customer display updates: “Your flat white is ready at station 2.”

Total elapsed time from order to first physical action: under 2 seconds with a warm Lambda and co-located services. The customer got a personalized drink from a recipe the robot had never seen, assembled in real time by the coordination of physical execution and cloud intelligence.

Handling failure gracefully

Physical AI must handle failure without freezing. When the oat milk reservoir’s weight sensor crosses the low threshold, the cloud AI proactively adjusts upcoming orders. For the loyalty customer who always orders oat milk, the AI checks for a secondary preference – say, almond milk. If one exists, it substitutes automatically and updates the customer display: “We’ve used your backup milk preference (almond) today.” If no backup is on file, it prompts the customer to choose. Simultaneously, the inventory agent triggers a restocking alert.
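The substitution rule above reduces to a short decision function. This is a minimal sketch under stated assumptions: the function name, the `backup_milk` preference key, the 10% low threshold, and the extra check that the backup itself is in stock are all hypothetical details added for illustration.

```python
def choose_milk(prefs: dict, levels: dict, low_threshold_pct: float = 10.0):
    """Return (milk, display_message); milk is None if the customer must choose."""
    primary = prefs["milk"]
    if levels.get(primary, 0.0) > low_threshold_pct:
        return primary, None                      # preferred milk is available
    backup = prefs.get("backup_milk")
    if backup and levels.get(backup, 0.0) > low_threshold_pct:
        message = f"We've used your backup milk preference ({backup}) today."
        return backup, message
    return None, "Please choose an alternative milk."
```

A caller would treat a `None` milk as the signal to prompt the customer, and would raise the restocking alert whenever the primary choice falls at or below the threshold.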

If the internet drops entirely, the robot switches to offline mode, serving drinks from its local recipe cache covering the most popular orders. When connectivity is restored, queued messages sync automatically, keeping the cloud AI’s models accurate.

Extending the pattern

The MQTT-to-MCP bridge pattern applies wherever physical agents need cloud intelligence for real-time decisions. Sidewalk delivery robots can query MCP tools for route alternatives and traffic data when encountering obstacles. Hospital medication robots can verify patient identity and allergy data against EHR systems before dispensing. Manufacturing cobots can check ERP systems for alternative components when a part variation is detected – all through the same architecture of MQTT transport, AWS IoT Core brokering, and MCP-connected cloud reasoning.

Conclusion

Physical AI systems need more than onboard compute – they need access to external knowledge, customer context, and operational intelligence that lives in the cloud. AWS IoT Core bridges this gap. MQTT handles the transport with the low overhead and reliability that physical systems demand. MCP standardizes how cloud AI agents access the tools and data they need to make informed decisions. Together, they form the nervous system of Physical AI.

The robot knows how. The cloud AI knows what. AWS IoT Core connects the two. To get started, explore the links below and start building today!