AWS DevOps & Developer Productivity Blog
Category: DevOps
Building an end-to-end agentic SRE using AWS DevOps Agent
As modern applications evolve into complex ecosystems of serverless functions, microservices, and event-driven architectures, incident response becomes increasingly challenging. DevOps and SRE teams spend hours manually correlating data across observability tools and troubleshooting issues, racing against SLA deadlines. This reactive firefighting drains productivity, degrades reliability, and delays innovation. AWS DevOps Agent provides an opportunity […]
AWS Transform custom: Enterprise Code Modernization with the Learn-Scale-Improve Flywheel
Enterprise modernization has reached an inflection point. You can transform one repository easily. Existing tools, including AWS Transform custom, work well for individual repositories, and the process is understood. But what about 50 repositories? 100? 200? When you need to modernize at enterprise scale, transforming code is only part of the challenge. Coordinating people, capturing […]
Automating Incident Investigation with AWS DevOps Agent and Salesforce MCP Server
This post was co-written with Ross Belmont, Senior Director, and Rodrigo Duran, Strategist Director, at Salesforce. Every minute counts when managing a critical infrastructure incident. Organizations need to quickly identify issues, diagnose root causes, and implement solutions, all while keeping customers informed. AWS DevOps Agent changes this by automating investigation and response, reducing mean time to resolution […]
Securely connect AWS DevOps Agent to private services in your VPCs
AWS DevOps Agent is your always-available operations teammate that resolves and proactively prevents incidents, optimizes application reliability and performance, and handles on-demand SRE tasks across AWS, multicloud, and on-premises environments. It integrates with your existing observability tools to correlate telemetry, code, and deployment data to reduce Mean Time To Repair (MTTR) and drive operational excellence. […]
Leverage Agentic AI for Autonomous Incident Response with AWS DevOps Agent
Teams running distributed workloads face a persistent operational challenge: when something breaks, the information needed to resolve it is scattered across logs, deployment pipelines, configuration histories, and third-party monitoring tools. A Site Reliability Engineer (SRE) responding to a 2 AM page must manually correlate telemetry from multiple sources, trace dependencies across services, and form […]
Best Practices for Deploying AWS DevOps Agent in Production
Root cause analysis during incidents is one of the most time-consuming and stressful parts of operating cloud applications. Engineers must quickly correlate telemetry data across multiple services, review deployment history, and understand complex application dependencies—all while under pressure to restore service. AWS DevOps Agent changes this paradigm by bringing autonomous investigation capabilities to your operations […]
From AI agent prototype to product: Lessons from building AWS DevOps Agent
At re:Invent 2025, Matt Garman announced AWS DevOps Agent, a frontier agent that resolves and proactively prevents incidents, continuously improving reliability and performance. As members of the DevOps Agent team, we’ve focused heavily on making sure that the “incident response” capability of the DevOps Agent generates useful findings and observations. In particular, we’ve been […]
Introducing the AWS Infrastructure as Code MCP Server: AI-Powered CDK and CloudFormation Assistance
Streamline your AWS infrastructure development with AI-powered documentation search, validation, and troubleshooting. Today, we’re excited to introduce the AWS Infrastructure-as-Code (IaC) MCP Server, a new tool that bridges the gap between AI assistants and your AWS infrastructure development workflow. Built on the Model Context Protocol (MCP), this server enables AI assistants like Kiro CLI, […]
Safely Handle Configuration Drift with CloudFormation Drift-Aware Change Sets
Is configuration drift preventing you from accessing the speed, safety, and governance benefits of AWS CloudFormation for infrastructure management? Configuration drift occurs when cloud resources are modified outside of CloudFormation, leading to a mismatch between the actual state of resources and their template definition. Drift tends to accumulate from infrastructure changes that engineers make via […]
StackSets Deployment Strategies: Balancing Speed, Safety, and Scale to Optimize Deployments for Different Organizational Needs
AWS CloudFormation StackSets enables organizations to deploy infrastructure consistently across multiple AWS accounts and regions. However, success depends on choosing the right deployment strategy that balances three critical factors: deployment speed, operational safety, and organizational scale. This guide explores proven StackSets deployment strategies specifically designed for multi-account infrastructure management. Understanding StackSets Deployment Fundamentals What are […]