AWS DevOps & Developer Productivity Blog
Category: DevOps
How AWS DevOps Agent uses multi-agent reasoning to find root causes
Confirmation bias is one of the most common reasons incident investigations take longer than they should. An on-call engineer gets alerted, forms a theory based on initial triage and experience, finds one piece of supporting evidence, and stops looking. The actual root cause — buried in a different service, a different signal, a different time […]
Automate root cause analysis across Datadog and Elasticsearch with AWS DevOps Agent
Modern distributed systems route business transactions through dozens of microservices, message queues, and event streams. When a message fails to process or processing exceeds SLA thresholds, troubleshooting requires correlating logs from tools like Elasticsearch, metrics from Datadog, and infrastructure change events in AWS CloudTrail. Correlating these signals manually across heterogeneous backends, each with different query […]
Building an end-to-end agentic SRE using AWS DevOps Agent
As modern applications evolve into complex ecosystems of serverless functions, microservices, and event-driven architectures, incident response becomes increasingly challenging. DevOps and SRE teams spend hours manually correlating data across observability tools and troubleshooting issues, racing against SLA deadlines. This reactive firefighting drains productivity, degrades reliability, and delays innovation. AWS DevOps Agent provides an opportunity […]
AWS Transform custom: Enterprise Code Modernization with the Learn-Scale-Improve Flywheel
Enterprise modernization has reached an inflection point. You can transform one repository easily. Existing tools, including AWS Transform custom, work well for individual repositories, and the process is understood. But what about 50 repositories? 100? 200? When you need to modernize at enterprise scale, transforming code is only part of the challenge. Coordinating people, capturing […]
Automating Incident Investigation with AWS DevOps Agent and Salesforce MCP Server
This post was co-written with Ross Belmont, Senior Director, and Rodrigo Duran, Strategist Director, at Salesforce. Every minute counts when managing a critical infrastructure incident. Organizations need to quickly identify issues, diagnose root causes, and implement solutions, all while keeping customers informed. AWS DevOps Agent changes this by automating investigation and response, reducing mean time to resolution […]
Securely connect AWS DevOps Agent to private services in your VPCs
AWS DevOps Agent is your always-available operations teammate that resolves and proactively prevents incidents, optimizes application reliability and performance, and handles on-demand SRE tasks across AWS, multicloud, and on-premises environments. It integrates with your existing observability tools to correlate telemetry, code, and deployment data to reduce Mean Time To Repair (MTTR) and drive operational excellence. […]
Leverage Agentic AI for Autonomous Incident Response with AWS DevOps Agent
Teams running distributed workloads face a persistent operational challenge: when something breaks, the information needed to resolve it is scattered across logs, deployment pipelines, configuration histories, and third-party monitoring tools. A Site Reliability Engineer (SRE) responding to a 2 AM page must manually correlate telemetry from multiple sources, trace dependencies across services, and form […]
Best Practices for Deploying AWS DevOps Agent in Production
Root cause analysis during incidents is one of the most time-consuming and stressful parts of operating cloud applications. Engineers must quickly correlate telemetry data across multiple services, review deployment history, and understand complex application dependencies—all while under pressure to restore service. AWS DevOps Agent changes this paradigm by bringing autonomous investigation capabilities to your operations […]
From AI agent prototype to product: Lessons from building AWS DevOps Agent
At re:Invent 2025, Matt Garman announced AWS DevOps Agent, a frontier agent that resolves and proactively prevents incidents, continuously improving reliability and performance. As members of the DevOps Agent team, we’ve focused heavily on making sure that the “incident response” capability of the DevOps Agent generates useful findings and observations. In particular, we’ve been […]
Introducing the AWS Infrastructure as Code MCP Server: AI-Powered CDK and CloudFormation Assistance
Streamline your AWS infrastructure development with AI-powered documentation search, validation, and troubleshooting. Today, we’re excited to introduce the AWS Infrastructure-as-Code (IaC) MCP Server, a new tool that bridges the gap between AI assistants and your AWS infrastructure development workflow. Built on the Model Context Protocol (MCP), this server enables AI assistants like Kiro CLI, […]









