Artificial Intelligence

AI agent-driven browser automation for enterprise workflow management

Enterprise organizations increasingly rely on web-based applications for critical business processes, yet many workflows remain manually intensive, creating operational inefficiencies and compliance risks. Despite significant technology investments, knowledge workers routinely navigate between eight to twelve different web applications during standard workflows, constantly switching contexts and manually transferring information between systems. Data entry and validation tasks consume approximately 25-30% of worker time, while manual processes create compliance bottlenecks and cross-system data consistency challenges that require continuous human verification. Traditional automation approaches have significant limitations. While robotic process automation (RPA) works for structured, rule-based processes, it becomes brittle when applications update and requires ongoing maintenance. API-based integration remains optimal, but many legacy systems lack modern capabilities. Business process management platforms provide orchestration but struggle with complex decision points and direct web interaction. As a result, most enterprises operate with mixed approaches where only 30% of workflow tasks are fully automated, 50% require human oversight, and 20% remain entirely manual.

These challenges manifest across common enterprise workflows. For example, purchase order validation requires intelligent navigation through multiple systems to perform three-way matching between purchase orders (POs), receipts, and invoices while maintaining audit trails. Employee on-boarding demands coordinated access provisioning across identity management, customer relationship management (CRM), enterprise resource planning (ERP), and collaboration platforms with role-based decision-making. Finally, e-commerce order processing must intelligently process orders across multiple retailer websites lacking native API access. Artificial intelligence (AI) agents represent a significant advancement beyond these traditional solutions, offering capabilities that can intelligently navigate complexity, adapt to dynamic environments, and dramatically reduce manual intervention across enterprise workflows.

In this post, we demonstrate how an e-commerce order management platform can automate order processing workflows across multiple retail websites via AI agents like Amazon Nova Act and Strands agent using Amazon Bedrock AgentCore Browser at scale.

E-commerce order automation workflow

This workflow demonstrates how AI agents can intelligently automate complex, multi-step order processing across diverse retailer websites that lack native API integration, combining adaptive browser navigation with human oversight for exception handling.

The following components work together to enable scalable, AI-powered order processing:

  1. ECS Fargate tasks run containerized Python FastAPI backend with React frontend, providing WebSocket connections for real-time order automation. Tasks automatically scale based on demand.
  2. Application integrates with Amazon Bedrock and Amazon Nova Act for AI-powered order automation. AgentCore Browser Tool provides secure, isolated browser environment for web automation. Main Agent orchestrates Nova Act Agent and Strands + Playwright Agent for intelligent browser control.

The e-commerce order automation workflow represents a common enterprise challenge where businesses need to process orders across multiple retailer websites without native API access. This workflow demonstrates the full capabilities of AI-powered browser automation, from initial navigation through complex decision-making to human-in-the-loop intervention. We have a sample agentic e-commerce automation built out which we have open sourced on aws-samples repository on GitHub.

Workflow process

Users of the e-commerce order management system submit customer orders through a web interface or batch CSV upload, including product details (URL, size, color), customer information, and shipping address. The system assigns priority levels and queues orders for processing. When an order starts, Amazon Bedrock AgentCore Browser creates an isolated browser session with Chrome DevTools Protocol (CDP) connectivity. Amazon Bedrock AgentCore Browser provides a secure, cloud-based browser that enables the AI agent (Amazon Nova Act and Strands agent in this case) to interact with websites. It includes security features such as session isolation, built-in observability through live viewing, AWS CloudTrail logging, and session replay capabilities. The system retrieves retailer credentials from AWS Secrets Manager and generates a live view URL using Amazon DCV streaming for real-time monitoring. The following diagram illustrates the order entire workflow process.

Browser automation with form-filling and order submission

Form-filling represents a critical capability where the agent intelligently detects and populates various field types across different retailer checkout layouts. The AI agent visits the product page, handles authentication if needed, and analyzes the page to identify size selectors, color options, and cart buttons. It selects specified options, adds items to cart, and proceeds to checkout, filling shipping information with intelligent field detection across different retailer layouts. If products are out of stock or unavailable, the agent escalates to human review with context about alternatives.

The sample application employs two distinct approaches depending on the automation method. Amazon Nova Act uses visual understanding and DOM structure of the webpage, allowing the Nova Act agent to receive natural language instructions like “fill shipping address” and automatically identify form fields from the screenshot, adapting to different layouts without predefined selectors. In contrast, the Strands + Playwright Model Context Protocol (MCP) combination uses Bedrock models to analyze the page’s Document Object Model (DOM) structure, determine appropriate form field selectors, and then Playwright MCP executes the low-level browser interactions to populate the fields with customer data. Both approaches automatically adapt to diverse retailer checkout interfaces, eliminating the brittleness of traditional selector-based automation.

Human-in-the-loop

When encountering CAPTCHAs or complex challenges, the agent pauses automation and notifies operators via WebSocket. Operators access the live view to see the exact browser state, resolve the issue manually, and trigger resumption. AgentCore Browser allows for human browser takeover and passing control back to the agent. The agent continues from the current state without restarting the entire process.

Observability and scale

Throughout execution, the system captures session recordings stored in S3, screenshots at critical steps, and detailed execution logs with timestamps. Operators monitor progress through a real-time dashboard showing order status, current step, and progress percentage. For high-volume scenarios, batch processing supports parallel execution of multiple orders with configurable workers (1-10), priority-based queuing, and automatic retry logic for transient failures.

Conclusion

AI agent-driven browser automation represents a fundamental shift in how enterprises approach workflow management. By combining intelligent decision-making, adaptive navigation, and human-in-the-loop capabilities, organizations can move beyond the 30-50-20 split of traditional automation toward significantly higher automation rates across complex, multi-system workflows. The e-commerce order automation example demonstrates that AI agents don’t replace traditional RPA—they enable automation of workflows previously considered too dynamic or complex for automation, handling diverse user interfaces, making contextual decisions, and maintaining full compliance and auditability.

As enterprises face mounting pressure to improve operational efficiency while managing legacy systems and complex integrations, AI agents offer a practical path forward. Rather than investing in expensive system overhauls or accepting the inefficiencies of manual processes, organizations can deploy intelligent browser automation that adapts to their existing technology landscape. The result is reduced operational costs, faster processing times, improved compliance, and most importantly, liberation of knowledge workers from repetitive data entry and system navigation tasks—allowing them to focus on higher-value activities that drive business impact.


About the authors

Kosti Vasilakakis is a Principal PM at AWS on the Agentic AI team, where he has led the design and development of several Bedrock AgentCore services from the ground up, including Runtime, Browser, Code Interpreter, and Identity. He previously worked on Amazon SageMaker since its early days, launching AI/ML capabilities now used by thousands of companies worldwide. Earlier in his career, Kosti was a data scientist. Outside of work, he builds personal productivity automations, plays tennis, and enjoys life with his wife and kids.

Veda Raman is a Sr Solutions Architect for Generative AI for Amazon Nova and Agentic AI at AWS. She helps customers design and build Agentic AI solutions using Amazon Nova models and Bedrock AgentCore. She previously worked with customers building ML solutions using Amazon SageMaker and also as a serverless solutions architect at AWS.

Sanghwa Na is a Generative AI Specialist Solutions Architect at Amazon Web Services. Based in San Francisco, he works with customers to design and build generative AI solutions using large language models and foundation models on AWS. He focuses on helping organizations adopt AI technologies that drive real business value.