AWS DevOps & Developer Productivity Blog

Accelerate autonomous incident resolutions using the Datadog MCP server and AWS DevOps Agent (in preview)

This post was co-written with Omri Sass (Director of Product Management), Cansu Berkem (Director of Product Management), and Mohammad Jama (Product Marketing Manager) from Datadog.

On-call engineers spend hours manually investigating incidents across multiple observability tools, logs, and monitoring systems. This process delays incident resolution and impacts business operations, especially when teams need to correlate data across different monitoring platforms. AWS DevOps Agent (in preview) is a frontier agent that resolves and proactively prevents incidents, continuously improving reliability and performance of applications in AWS, multicloud, and hybrid environments. Frontier agents represent a new class of AI agents that are autonomous, massively scalable, and work for hours or days without constant intervention. AWS DevOps Agent offers built-in integration with Datadog Model Context Protocol (MCP) Server, enabling you to access the untapped insights in your data by connecting directly to Datadog’s monitoring solutions. DevOps Agent maps your application resources and correlates telemetry, code, and deployment data to reduce MTTR (Mean Time To Resolution) and drive operational excellence.

You can use this integration to collect and analyze Datadog logs, metrics, and traces, and to correlate that data across AWS services. When incidents occur, AWS DevOps Agent identifies issues and provides mitigation plans that engineers can then implement. Engineers can monitor automated investigations through a central dashboard and engage with the agent through interactive chat at any time. Using this integration, engineers can reduce MTTR from hours to minutes while maintaining full visibility into automated actions.

How Datadog MCP and AWS DevOps Agent work together

The integration between Datadog MCP Server and AWS DevOps Agent connects your monitoring data with automated incident response. Datadog MCP Server acts as a central access point for your monitoring data. It securely connects to Datadog through a standardized protocol, allowing AWS DevOps Agent to query logs, metrics, and traces during investigations. The service uses OAuth 2.0 authentication and supports multiple regions to help meet data sovereignty requirements.
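To make the query path more concrete, here is a minimal sketch of the kind of request an MCP client sends. MCP messages are JSON-RPC 2.0, and `tools/call` is the protocol method for invoking a server tool. The tool name `search_logs`, its arguments, and the token placeholder are hypothetical and used only for illustration; this post does not describe the tools that Datadog MCP Server exposes or the exact requests that AWS DevOps Agent issues.

```python
import json
from itertools import count

# Monotonic request IDs, as JSON-RPC expects a unique id per request.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def build_headers(access_token: str) -> dict:
    """HTTP headers carrying an OAuth 2.0 bearer token."""
    return {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }


# Hypothetical tool name and arguments, for illustration only.
request = build_tool_call(
    "search_logs",
    {"query": "service:api-gateway status:error", "from": "now-15m"},
)
headers = build_headers("<oauth-access-token>")
print(json.dumps(request, indent=2))
```

The sketch only builds the message; sending it, handling the OAuth 2.0 flow, and parsing the response are left out because they depend on the client library and the server in use.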

AWS DevOps Agent learns your resources and relationships while correlating data from both AWS services and Datadog. It analyzes Amazon CloudWatch logs and metrics, deployment data, and code alongside Datadog telemetry to build a complete picture of the incident. This combined view helps identify root causes faster than examining each data source separately. Security considerations are built into every interaction. All interactions between AWS DevOps Agent and Datadog MCP Server use authentication, authorization, and encryption, and are logged for audit purposes. While the service currently runs only in us-east-1, it can monitor and analyze applications deployed across any AWS Region in customer accounts globally.

Setting up and using AWS DevOps Agent with Datadog

In this section, we will guide you through the steps required to enable Datadog MCP Server in your AWS DevOps Agent account and configure it for incident resolution.

Prerequisites

For this walkthrough, you should have access to and understanding of the following:

  • An AWS account with permissions to create AWS Identity and Access Management (IAM) roles:
    • Agent Space role – for basic service operations
    • Agent Space web app role – for using the Agent Space web app functionality
    • (Optional) Secondary source account roles if monitoring multiple AWS accounts. Refer to the DevOps Agent user guide for details on setting up these roles.
  • A Datadog account
  • Access to Datadog MCP Server (in preview)

Setting up Datadog in the AWS DevOps Agent console

Start the setup in the AWS DevOps Agent console by connecting your Datadog MCP Server. Navigate to Settings, select the Datadog integration panel, and choose “Register.” Enter your Datadog MCP Server details when prompted (you can learn more about requesting access to this server in the Datadog documentation). AWS DevOps Agent validates the connection and displays a confirmation message.

Figure 1: Setting up Datadog MCP Server in AWS DevOps Agent Console. The MCP Server Details form has three input fields: Server Name (with example 'my-datadog-server'), Endpoint URL (showing 'https://mcp.datadog.com/api/unstable/mcp-server/mcp'), and an optional Description field, with navigation steps at the top and Cancel/Next buttons at the bottom.

Create an AWS DevOps Agent Agent Space

Next, create an Agent Space in your primary AWS account. This requires an AWS IAM role that grants AWS DevOps Agent access to your AWS resources. After creating your Agent Space, add Datadog MCP Server as a telemetry source to enable comprehensive incident investigation.

To create your Agent Space, start by accessing the AWS DevOps Agent console in us-east-1. Choose the “Create Agent Space” button and provide a meaningful name and description for your space. After submitting the form, you’ll need to configure the required IAM roles, which can be done through either the automated creation process or manual setup.

Figure 2: Creating an AWS DevOps Agent Agent Space. The screen shows the option to create an Agent Space, with areas for agent details, resource access, and more.

Your Agent Space topology can be initialized using either AWS CloudFormation stacks or AWS Tags as starting points to identify your application components. Once the basic setup is complete, you can enhance your Agent Space configuration by adding Secondary source accounts for multi-account monitoring and configuring integrations with services like SIM ticketing system, Pipelines (where GitFarm packages and CloudFormation Stacks are located), Slack, and most importantly for our use case, Telemetry with the Datadog MCP Server.
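As a rough illustration of tag-based discovery, the sketch below groups resources into application components by the value of one tag key. It is not how AWS DevOps Agent builds its topology, which this post does not detail; the resource records, the `app` tag key, and the ARNs are hypothetical sample data.

```python
from collections import defaultdict


def group_by_tag(resources: list[dict], tag_key: str) -> dict[str, list[str]]:
    """Group resource ARNs by the value of one tag key.

    Resources that do not carry the tag are skipped.
    """
    groups = defaultdict(list)
    for resource in resources:
        value = resource.get("tags", {}).get(tag_key)
        if value is not None:
            groups[value].append(resource["arn"])
    return dict(groups)


# Hypothetical inventory; in practice this would come from your own tooling.
resources = [
    {"arn": "arn:aws:apigateway:us-east-1::/restapis/abc123", "tags": {"app": "orders"}},
    {"arn": "arn:aws:lambda:us-east-1:111122223333:function:orders-handler", "tags": {"app": "orders"}},
    {"arn": "arn:aws:dynamodb:us-east-1:111122223333:table/payments", "tags": {"app": "payments"}},
    {"arn": "arn:aws:s3:::scratch-bucket", "tags": {}},
]

topology = group_by_tag(resources, "app")
for app, arns in topology.items():
    print(app, len(arns))
```

Consistent tagging matters here: the untagged bucket in the sample data falls out of every group, which is the kind of gap worth closing before relying on tags as a starting point.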

Figure 3: Add additional telemetry sources for AWS DevOps Agent to investigate. The Agent Space page shows a pop-up for adding a source association, with Datadog selected as the telemetry source.

From here, we can launch the Agent Space web app to begin the investigation.

Real-world example: Resolving API Gateway errors

Let’s walk through how AWS DevOps Agent and Datadog work together to resolve a production incident. In this scenario, Datadog detects a spike in Amazon API Gateway 5XX errors affecting downstream services.

Figure 4: Sample API Gateway errors in Datadog. The view shows an Amazon API Gateway monitor, with a "Your 5XX Errors" monitor on the right reporting over 220 errors.

Investigating API Gateway 5XX errors with the Datadog MCP Server and AWS DevOps Agent

When the alert triggers, AWS DevOps Agent automatically analyzes both Datadog metrics and API Gateway logs. Through the investigation chat interface, an engineer guides AWS DevOps Agent to examine the API Gateway configuration. The agent correlates API Gateway and AWS Lambda execution logs, quickly identifying error patterns.

Figure 4: Investigating an incident with AWS DevOps Agent and Datadog MCP
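The correlation step described in this section can be approximated by hand as a join between gateway and function logs. The sketch below is one simple way to do that, assuming both log sources carry a shared `correlation_id` field; that field, the record shapes, and the sample entries are hypothetical and do not reflect the agent's internal method or the actual API Gateway and Lambda log formats.

```python
from collections import Counter


def correlate_errors(gateway_logs: list[dict], lambda_logs: list[dict]) -> Counter:
    """Count Lambda error messages behind gateway 5XX responses.

    Gateway entries with a 5XX status are matched to Lambda ERROR entries
    that share the same correlation ID.
    """
    lambda_errors = {
        entry["correlation_id"]: entry["message"]
        for entry in lambda_logs
        if entry["level"] == "ERROR"
    }
    patterns = Counter()
    for entry in gateway_logs:
        if 500 <= entry["status"] <= 599:
            message = lambda_errors.get(
                entry["correlation_id"], "no matching Lambda error"
            )
            patterns[message] += 1
    return patterns


# Hypothetical, simplified log records.
gateway_logs = [
    {"correlation_id": "r1", "status": 502},
    {"correlation_id": "r2", "status": 200},
    {"correlation_id": "r3", "status": 502},
    {"correlation_id": "r4", "status": 504},
]
lambda_logs = [
    {"correlation_id": "r1", "level": "ERROR", "message": "ResourceNotFoundException"},
    {"correlation_id": "r2", "level": "INFO", "message": "ok"},
    {"correlation_id": "r3", "level": "ERROR", "message": "ResourceNotFoundException"},
]

patterns = correlate_errors(gateway_logs, lambda_logs)
print(patterns.most_common())
```

Counting by message surfaces the dominant failure first, and the "no matching Lambda error" bucket separates failures that never reached the function, such as timeouts at the gateway.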

Resolution and prevention

AWS DevOps Agent helps identify potential misconfigurations in the Lambda and Amazon DynamoDB integration and implements immediate fixes. The agent documents all findings and actions in an incident record, backed by telemetry from both Datadog and AWS services. After resolution, AWS DevOps Agent generates a detailed analysis report with specific recommendations to prevent similar incidents. Teams can review and implement these suggestions through the Prevention feature in the AWS DevOps Agent web app.

Figure 5: Investigation summary produced by AWS DevOps Agent. The summary shows the root cause for this sample incident, with the headline "1. DynamoDB table name misconfiguration - typo in environment variable" followed by a longer explanation.
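A root cause like the one in this sample incident, a typo in an environment variable that holds a DynamoDB table name, is cheap to catch before deployment. The following is a minimal sketch of such a check using Python's standard `difflib` module. The table names and the configured value are hypothetical, and this is an illustrative preventive check, not one of the agent's recommendations.

```python
import difflib


def check_table_name(configured: str, existing_tables: list[str]) -> str:
    """Report whether a configured table name exists, suggesting a close match if not."""
    if configured in existing_tables:
        return f"OK: table '{configured}' exists"
    suggestions = difflib.get_close_matches(
        configured, existing_tables, n=1, cutoff=0.8
    )
    if suggestions:
        return f"Table '{configured}' not found; did you mean '{suggestions[0]}'?"
    return f"Table '{configured}' not found and no similar table name exists"


# Hypothetical values: the configured name would come from the function's
# environment variable, and the table list from your account inventory.
existing_tables = ["orders-prod", "payments-prod"]
result = check_table_name("ordres-prod", existing_tables)
print(result)
```

Running a check like this in a deployment pipeline turns a production 5XX incident into a failed build, which is usually the cheaper place to find a typo.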

Clean up

When you’re done using the integration, you can clean up your resources by following these steps:

  1. Delete your Agent Space from the AWS DevOps Agent console
  2. Remove the Datadog MCP Server connection from your settings
  3. Delete the IAM roles created for the Agent Space
  4. (Optional) If you created additional source account roles, remove those as well
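For the IAM role steps, note that IAM refuses to delete a role that still has policies attached. The sketch below shows one way to script that order of operations with the boto3 IAM client. The role name in the usage comment is a placeholder, pagination of the list calls is omitted for brevity, and the other cleanup steps (deleting the Agent Space and the Datadog connection) are assumed to be done in the console as described above.

```python
def delete_role_with_policies(iam, role_name: str) -> None:
    """Detach managed policies, delete inline policies, then delete the role.

    `iam` is expected to be a boto3 IAM client, e.g. boto3.client("iam").
    Pagination is not handled; roles with many policies need a paginator.
    """
    attached = iam.list_attached_role_policies(RoleName=role_name)
    for policy in attached["AttachedPolicies"]:
        iam.detach_role_policy(RoleName=role_name, PolicyArn=policy["PolicyArn"])

    inline = iam.list_role_policies(RoleName=role_name)
    for policy_name in inline["PolicyNames"]:
        iam.delete_role_policy(RoleName=role_name, PolicyName=policy_name)

    iam.delete_role(RoleName=role_name)


# Example usage (requires boto3 and AWS credentials; role name is a placeholder):
#   import boto3
#   delete_role_with_policies(boto3.client("iam"), "<agent-space-role-name>")
```

Passing the client in as an argument keeps the function easy to exercise against a stub before pointing it at a real account.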

Conclusion

The integration between Datadog MCP Server and AWS DevOps Agent reduces incident resolution time by automatically correlating data across your monitoring tools. Instead of manually switching between Datadog and AWS dashboards during incidents, teams can now get an AI-powered investigation that identifies root causes and suggests fixes. Early adopters report significant improvements in their incident response. Resolution times drop from hours to minutes, while on-call teams spend less time gathering data. Teams also see more consistent incident responses and improved root cause analysis through comprehensive data correlation. To learn more, check out the AWS DevOps Agent product page.

Datadog is an AWS Specialization Partner and AWS Marketplace Seller that has been building integrations with AWS services for over a decade, amassing a growing catalog of 100+ AWS and 1000+ built-in integrations. This new AWS DevOps Agent and Datadog MCP Server integration builds upon Datadog’s strong track record of AWS partnership success. If you’re not already using Datadog, you can get started with a 14-day free trial via the AWS Marketplace.

Sujatha Kuppuraju

Sujatha Kuppuraju is a Principal Solutions Architect at AWS, specializing in Cloud and Generative AI Security. She collaborates with software companies’ leadership teams to architect secure, scalable solutions on AWS and guide strategic product development. Leveraging her expertise in cloud architecture and emerging technologies, Sujatha helps organizations optimize offerings, maintain robust security, and bring innovative products to market in an evolving tech landscape.

DhilipVenkatesh Uvarajan

DhilipVenkatesh Uvarajan is an Enterprise Support Lead TAM within AWS Enterprise Support, specializing in Independent Software Vendors (ISVs) across the United States. In this role, Dhilip provides strategic technical guidance to help customers innovate, optimize their AWS architecture, and ensure the seamless operation of their business-critical applications on the AWS cloud. Beyond his professional endeavors, Dhilip is passionate about AI and Robotics, often exploring innovative projects in his spare time.

Nina Chen

Nina Chen is a Customer Solutions Manager at AWS who specializes in helping leading software companies use the AWS cloud to accelerate their product innovation and growth. With over 4 years of experience working in the strategic Independent Software Vendor (ISV) vertical, Nina enjoys guiding ISV partners through their cloud transformation journeys, helping them optimize their cloud infrastructure, drive product innovation, and deliver exceptional customer experiences.

Omri Sass

Omri Sass is a Director of Product Management at Datadog, where he’s overseen the development and launch of a multitude of products and capabilities including Bits AI SRE and updog.ai. He is a keen advocate for good user experience and doing what’s right by users.

Cansu Berkem

Cansu Berkem is a Director of Product Management at Datadog, overseeing the company’s end-to-end incident response experience, including Incident Management, On-Call, Automations, and Bits AI SRE. Her products help engineers resolve incidents faster through AI-driven workflows, powered by Bits AI SRE as an autonomous incident investigator and supported by integration-rich incident management and paging flows.

Mohammad Jama

Mohammad Jama is a Product Marketing Manager at Datadog. He leads go-to-market for Datadog’s AWS integrations, working closely with product, marketing, and sales to help companies observe and secure their hybrid and AWS environments.