AWS DevOps Agent
Drive operational excellence with an autonomous AI agent that resolves and proactively prevents incidents
Why AWS DevOps Agent?
AWS DevOps Agent is your always-available operations teammate that resolves and proactively prevents incidents, optimizes application reliability and performance, and handles on-demand SRE tasks across AWS, multicloud, and on-premises environments. It investigates incidents and identifies operational improvements as an experienced DevOps engineer would: by learning your applications and their relationships, working with your observability tools, runbooks, code repositories, and CI/CD pipelines, and correlating telemetry, code, and deployment data across all environments. Ask questions, get instant contextual answers, and create custom charts and reports that you can save and share with your team.
Benefits
AWS DevOps Agent is your always-on, autonomous on-call engineer. It begins investigating the moment an alert comes in, whether at 2 AM or during peak hours, to quickly restore your application to optimal performance. AWS DevOps Agent autonomously triages incidents 24/7, providing root cause analysis and actions for resolution. It uses its understanding of your application resources and relationships to quickly understand dependencies and interactions. AWS DevOps Agent streamlines incident response by automatically routing observations, findings, and mitigation steps through your preferred communication channels such as Slack, ServiceNow, and PagerDuty.
AWS DevOps Agent analyzes patterns across historical incidents to provide actionable recommendations that strengthen four key areas: observability, infrastructure optimization, deployment pipeline enhancement, and application resilience. Recommendations include agent-ready specs to hand implementation off to your coding agent or a colleague to update application or infrastructure code. This drives continuous improvement without needing to manage a backlog.
AWS DevOps Agent enables you to access the untapped insights in your operational data by securely integrating with your workflows and observability tools, runbooks, code repositories, and CI/CD pipelines. AWS DevOps Agent offers built-in integrations with observability tools such as Amazon CloudWatch, Dynatrace, Datadog, Grafana, New Relic, and Splunk, and with code repositories and CI/CD pipelines such as Azure DevOps, GitHub, and GitLab. You can extend AWS DevOps Agent beyond its built-in integrations by securely connecting to your private or remote MCP servers, enabling integrations with additional tools such as your organization's custom tools, specialized platforms, or proprietary ticketing systems.
AWS DevOps Agent draws on its deep understanding of your environment so you can go beyond asking questions. Query resource health, investigate incident patterns, track deployments, and explore prevention recommendations, all through a natural language interface. Create, save, and share custom charts and reports that help you track operational metrics and communicate insights with your team.
Customers
United Airlines
"At United Airlines, we transport more than 500,000 passengers daily. We have about 38,000 Dynatrace OneAgents monitoring across a hybrid cloud environment, more than 500 AWS accounts, 20,000 AWS Lambda functions, Amazon ECS microservices, and numerous other services. At this scale, we previously used multiple tools performing the same functions across different domains, which created gaps and black boxes during troubleshooting. The AWS DevOps Agent with Dynatrace completely changes that. Dynatrace swiftly and accurately detects issues, identifies the responsible application layer, and then the agent investigates further and provides precise steps to resolve the problem — all directly fed into Dynatrace. Instead of initiating an incident call at 3 a.m. and switching between tools, we now have the answers ready—a single pane of glass."
Jason Eckhart, Principal Engineer, Reliability and Observability, United Airlines
T-Mobile
"When AWS introduced DevOps Agent, T-Mobile was at the table from day one. As a design partner, we saw how AWS DevOps Agent can significantly improve root cause analysis across production environments. Our real-world feedback directly influenced how the product evolved.
Our infrastructure spans multiple clouds and on-premises environments, with application logs centralized in our on-premises Splunk deployment. AWS DevOps Agent's ability to integrate seamlessly with Splunk and analyze logs across these diverse environments has been impactful as we continue to pilot the solution."
Aravind Manchireddy, SVP, Technology Operations, T-Mobile
Western Governors University
Western Governors University (WGU), a leading online university serving over 191,000 students, was among the first organizations to deploy AWS DevOps Agent into production, doing so even ahead of the preview launch at re:Invent. As a large-scale Dynatrace user, WGU leverages the DevOps Agent's native Dynatrace integration, enabling Dynatrace Intelligence to automatically route problem records to the Agent for investigation and return enriched findings directly back into Dynatrace.
During a recent production investigation, WGU’s SRE team used the DevOps Agent to analyze a service disruption scenario, reducing total resolution time from an estimated two hours to just 28 minutes—a 77% improvement in MTTR. The Agent quickly pinpointed the root cause within a Lambda function’s configuration, surfacing critical operational knowledge that had previously existed only in undiscovered internal documentation.
"It was able to provide the smoking gun, identified the Lambda was the cause. The investigation had almost flawless metrics that matched what we saw on the front-end." He added, "Yesterday was a huge victory, if we can continue to accelerate discovery, I can't describe how much of a victory that would be for our organization." With plans to leverage the DevOps Agent Skills feature, WGU is on track to compress investigation time even further.
Angel Marchena, Director of Technical Operations, Western Governors University
Zenchef
Zenchef is a restaurant technology platform that helps restaurants manage reservations, table operations, digital menus, payments, and guest marketing from a single commission-free system. With a focused DevOps team managing several production environments across multiple business units, they faced a real test when an API integration issue affecting a downstream partner surfaced during a company hackathon, with engineers engaged in the event and nothing significant showing up in monitoring to point them in the right direction.
Rather than pulling engineers off the hackathon, the team brought the issue to DevOps Agent. It worked through the problem systematically, ruling out authentication as a contributing factor, shifting investigation focus to ECS deployments, and ultimately tracing the root cause to a code regression in which a new version failed to handle an unrecognized enum value in the database. The full investigation wrapped in 20-30 minutes, roughly a 75% reduction compared to the 1-2 hours it would have taken manually, and the findings were shared directly with the responsible engineer.
“During the hackathon, we had nearly no available bandwidth to investigate - and we didn’t need it. We’re always trying to be a couple moves ahead, and this kind of proactive investigation just isn’t always possible otherwise. DevOps Agent is enabling new ways of understanding how our platforms behave.”
Theo Massard, Platform Engineering Manager, Zenchef
Use cases
Incident response and resolution
AWS DevOps Agent autonomously triages incidents and guides teams to rapid resolution. It integrates with observability tools, code repositories, and CI/CD pipelines to correlate and analyze telemetry, code, and deployment data, sharing its hypotheses, observations, and findings. Through systematic investigations, AWS DevOps Agent identifies the root cause of issues stemming from system changes, input anomalies, resource limits, component failures, and dependency issues across your entire environment.
Automated incident coordination
You can initiate and guide investigations using interactive chat. AWS DevOps Agent acts as a member of your operations team, working directly within your collaboration tools like ServiceNow and Slack to share findings and coordinate response. When needed, create an AWS Support case directly from an investigation, giving AWS Support experts immediate context for faster resolution.
Prevent future operational incidents
AWS DevOps Agent analyzes patterns across historical incidents to provide actionable recommendations that strengthen four key areas: observability, infrastructure optimization, deployment pipeline enhancement, and application resilience.
Accelerate on-demand SRE task handling
Get immediate, contextual answers to operational questions without navigating between consoles. Query resource health, investigate incident patterns, track deployments, and explore recommendations through natural conversation. Beyond Q&A, create, save, and share custom charts and reports such as daily ops health summaries or 4xx error trends. Conversation history is maintained so you can build on earlier queries without losing context.