AWS DevOps & Developer Productivity Blog

Automating Incident Investigation with AWS DevOps Agent and Salesforce MCP Server

This post was co-written with Ross Belmont, Senior Director of Product Management, and Rodrigo Duran, AI Deployment Strategist Director, at Salesforce.

Every minute counts when managing a critical infrastructure incident. Organizations need to quickly identify issues, diagnose root causes, and implement solutions, all while keeping customers informed. AWS DevOps Agent addresses this by automating incident investigation and response, which can reduce mean time to resolution (MTTR) from hours to minutes.

In this post, you’ll learn how to integrate AWS DevOps Agent with the Salesforce Hosted MCP Server to create an autonomous incident investigation workflow. This integration connects customer support cases directly to infrastructure diagnostics, reducing response times and helping ensure consistent incident resolution across your organization.

The Challenge: The Cost of Manual Incident Investigation

Customer complaints like “the website is slow” often trigger hours of investigation across distributed systems, fragmented telemetry, and multiple teams. Your customer support team lacks the deep infrastructure expertise to diagnose root causes, while your DevOps engineers are constantly interrupted and pulled away from systematic improvements.

This handoff between teams creates friction:

  • Increased mean time to detect (MTTD) – Issues sit in queues waiting for the right expert
  • Extended mean time to resolve (MTTR) – Manual investigation across Amazon CloudWatch, AWS CloudTrail, application logs, and deployment history is time-consuming
  • Context loss – Information gets lost in translation between support tickets and infrastructure analysis
  • Reactive problem solving – Teams spend time on symptoms rather than preventing recurring issues

Diagram: a customer reports an issue to the support team, which manually escalates it to DevOps engineers; the issue waits in queues, and context is lost between the support ticket and the infrastructure analysis.

Figure 1 – Manual Support Process without DevOps Agent

AWS DevOps Agent integrated with Salesforce changes this paradigm by connecting support workflows directly to autonomous infrastructure investigation, eliminating manual handoffs and reducing investigation time.

How It Works – A Seamless Flow from Customer Complaint to Infrastructure Diagnosis

Diagram: case creation in Agentforce Service, autonomous investigation by DevOps Agent across AWS observability services, case enrichment with findings posted back to Salesforce, and preventative recommendations.

Figure 2 – Automated Support Process with DevOps Agent

  1. Case Creation: Your customer reports an issue in Agentforce Service (e.g., “My Load Balancer is showing unavailable”). Salesforce Flow detects the new case and triggers the AWS DevOps Agent via an API or webhook call.
  2. Autonomous Investigation: DevOps Agent starts an investigation, querying AWS observability services, third-party platforms such as Splunk and Datadog, code repositories, and CI/CD pipelines to identify the root cause. It builds a dynamic topology graph to map relationships between application resources.
  3. Case Enrichment: Investigation findings are automatically posted back to the Salesforce case, giving your support team technical context and root cause analysis.
  4. Preventative Recommendations: The agent suggests architectural improvements to help prevent recurrence.

Real-World Example: The Single Instance Outage

The Incident

A customer opens a case in Agentforce Service reporting an application as unavailable.

Screenshot: an Agentforce Service case in which the customer reports the application as unavailable, with account information, incident description, and timing details.

Figure 3 – Agentforce Service case details

The Investigation

When the case is created, Salesforce Flow triggers DevOps Agent, which investigates as follows:

  1. Case Retrieval: The agent uses the soql_query tool, exposed by the Salesforce Hosted MCP Server, to retrieve case details, including the customer’s account, incident description, and timing.

    Screenshot: the agent calling the soql_query tool to retrieve the case's account, incident description, and timing from Salesforce.

    Figure 4 – Salesforce SOQL Query

  2. Topology Discovery: The agent maps the infrastructure and identifies all components of the application.
  3. CloudWatch Metrics Analysis: The agent examines metrics during the incident window and discovers that the request count dropped to zero during the unavailability period.

    Screenshot: CloudWatch request count chart for the affected application, showing incoming requests dropping to zero during the reported unavailability period.

    Figure 5 – Request Count Chart

  4. CloudTrail Event Analysis: The agent discovers a sequence of administrative actions that caused the downtime.

    Screenshot: CloudTrail events from the incident window, showing the administrative API calls made around the time of the outage.

    Figure 6 – CloudTrail Analysis

  5. Root Cause Determination: The agent correlates the administrative actions with the metrics drop and identifies the termination of an Amazon EC2 instance as the cause of the outage.

    Screenshot: root cause findings showing that an EC2 instance was terminated and that the application, which ran on a single instance without redundancy, became unavailable as a result.

    Figure 7 – Root Cause Details

  6. Case Update: The agent uses the create_sobject_record tool, also exposed by the Salesforce Hosted MCP Server, to post its findings to the case Activity feed.
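The correlation in steps 3 through 5 can be illustrated with a simplified sketch. This is not DevOps Agent's internal algorithm; it is a hypothetical heuristic, assuming the request count datapoints are available as (timestamp, value) pairs and the CloudTrail events as (time, event name, resource ID) records. It finds the first point where traffic falls from a nonzero value to zero and returns the CloudTrail events in a lookback window before that point. The 15-minute window and the sample data are illustrative assumptions. Load balancer metrics may be reported only while requests are flowing, so the sketch also assumes that missing datapoints were filled with zero beforehand.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Optional, Tuple


class TrailEvent(NamedTuple):
    time: datetime
    name: str       # CloudTrail eventName, e.g. "TerminateInstances"
    resource: str   # ID of the affected resource


def drop_to_zero(points: List[Tuple[datetime, float]]) -> Optional[datetime]:
    """Return the first timestamp where traffic falls from nonzero to zero."""
    prev = None
    for ts, value in sorted(points):
        if value == 0 and prev is not None and prev > 0:
            return ts
        prev = value
    return None


def suspect_events(points, events, window=timedelta(minutes=15)):
    """Return CloudTrail events in the lookback window before the drop, oldest first."""
    drop = drop_to_zero(points)
    if drop is None:
        return []
    return sorted(
        (e for e in events if drop - window <= e.time <= drop),
        key=lambda e: e.time,
    )


if __name__ == "__main__":
    # Hypothetical sample data: traffic stops between 12:05 and 12:10.
    t0 = datetime(2025, 1, 1, 12, 0)
    points = [(t0 + timedelta(minutes=m), v)
              for m, v in [(0, 120), (5, 118), (10, 0), (15, 0)]]
    events = [
        TrailEvent(t0 + timedelta(minutes=8), "TerminateInstances", "i-0123456789abcdef0"),
        TrailEvent(t0 - timedelta(hours=2), "CreateTags", "i-0123456789abcdef0"),
    ]
    for e in suspect_events(points, events):
        print(e.time.isoformat(), e.name, e.resource)
    # prints: 2025-01-01T12:08:00 TerminateInstances i-0123456789abcdef0
```

A real investigation weighs many more signals (deployments, topology, logs); the point here is only that a time-ordered join of metrics and audit events yields the kind of timeline that appears in the case.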

The Result

Your Salesforce case now contains a comprehensive root cause analysis with timeline, affected resources, and contributing factors.

Screenshot: the Agentforce Service case after DevOps Agent posted its findings, including a timeline of events, affected resources, and contributing factors.

Figure 8 – Agentforce Service case updated with root cause

The Mitigation Plan

The agent generates an actionable mitigation plan showing how to prevent recurrence.

Screenshot: the mitigation plan, with recommendations such as implementing redundancy and auto-scaling to prevent recurrence.

Figure 9 – Mitigation Plan

The agent also provides step-by-step remediation instructions that you can apply immediately. Because of their length, only a portion is shown.

Screenshot: excerpt of the step-by-step remediation instructions generated by DevOps Agent.

Figure 10 – Step-by-step mitigation instructions

Technical Implementation

Prerequisites: Before implementing this integration, verify that you have:

  1. Agentforce Service with Salesforce Hosted MCP Server enabled
  2. AWS DevOps Agent Space configured in your AWS account
  3. Amazon CloudWatch and AWS CloudTrail enabled for observability
  4. Infrastructure resources tagged for topology mapping (optional)
  5. Familiarity with Salesforce Flow Builder for workflow automation

This integration requires configuration in both Salesforce and AWS. The following steps provide an overview of the setup process.

  1. Create Agent Space: Set up a DevOps Agent Space in your AWS account with appropriate IAM roles and permissions.
  2. Integrate Observability Tools: Connect your operational tools like Splunk, Datadog, or New Relic to provide the agent with telemetry data.
  3. Connect Code Repositories: Link GitHub, GitLab, or AWS CodeCommit to enable the agent to correlate incidents with recent deployments.
  4. Build Topology Mapping: Tag your infrastructure resources so that the agent focuses on components relevant to your application.
  5. Add Skills: Configure the agent with instructions to direct the investigation – for example, to update Agentforce Service cases when investigations are complete.
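The idea behind tag-based scoping can be sketched as a filter over tagged resources. This illustrates only how tags select an application's components; it does not reflect how DevOps Agent builds its topology. The tag key `app` and the sample resources are hypothetical. In practice you could list tagged resources with `aws resourcegroupstaggingapi get-resources --tag-filters Key=app,Values=storefront`, whose ResourceTagMappingList has the shape assumed here.

```python
from typing import Dict, List


def resources_for_app(mappings: List[Dict], key: str, value: str) -> List[str]:
    """Return the ARNs of resources tagged key=value.

    `mappings` mirrors the ResourceTagMappingList returned by the Resource
    Groups Tagging API GetResources operation.
    """
    arns = []
    for mapping in mappings:
        # Flatten the [{"Key": ..., "Value": ...}] list into a dict for lookup.
        tags = {t["Key"]: t["Value"] for t in mapping.get("Tags", [])}
        if tags.get(key) == value:
            arns.append(mapping["ResourceARN"])
    return arns


# Hypothetical sample: only the first instance belongs to the "storefront" app.
sample = [
    {"ResourceARN": "arn:aws:ec2:us-east-1:111122223333:instance/i-0123456789abcdef0",
     "Tags": [{"Key": "app", "Value": "storefront"}]},
    {"ResourceARN": "arn:aws:ec2:us-east-1:111122223333:instance/i-0fedcba9876543210",
     "Tags": [{"Key": "app", "Value": "batch-jobs"}]},
]
print(resources_for_app(sample, "app", "storefront"))
# prints: ['arn:aws:ec2:us-east-1:111122223333:instance/i-0123456789abcdef0']
```

Consistent tag keys across load balancers, instances, and databases are what make this kind of filtering reliable; untagged resources simply fall outside the scope.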

The following sections highlight the key setup steps.

Create Agent Space

An Agent Space defines the AWS accounts, integrations, and access controls for your DevOps Agent investigations. When you create your Agent Space, configure a skill that instructs the agent to post investigation findings back to Salesforce cases.

Screenshot: AWS DevOps Agent Space configuration with a skill that directs the agent to update the originating Agentforce Service case when an investigation completes.

Figure 11 – Agent Skill for Salesforce

The skill provides specific instructions for the agent’s workflow – in this case, directing it to update the originating Agentforce Service case when the investigation completes.

Salesforce Hosted MCP Server Setup

The Salesforce Hosted MCP Server enables AWS DevOps Agent to query case data and post investigation findings back to Salesforce. Configure the MCP Server in your Salesforce org by following the Salesforce documentation and the Salesforce Hosted MCP GitHub repository.

Add the Salesforce Hosted MCP Server to Your Agent Space

In the AWS Console, register the Salesforce MCP Server with your Agent Space. This connection allows DevOps Agent to query Salesforce case data and post investigation findings.

After registration, test the connection by manually triggering an investigation from the AWS Console. Instruct the agent to retrieve case details from Salesforce and post the root cause analysis back to the case.

When configuring MCP tools, follow security best practices, such as granting the agent only the permissions it needs.

Screenshot: the AWS DevOps Agent Operator Console, where an investigation is started manually with a prompt to retrieve case details from Salesforce and post the root cause analysis back to the case.

Figure 12 – Starting an Investigation
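The two Salesforce tool calls exercised in this test can be sketched in Python. The sketch assumes the case ID is already known (for example, passed in the prompt or by the Flow) and builds the SOQL query for soql_query and the record for create_sobject_record. The argument names (`query`, `sobject_type`, `fields`) are hypothetical; check the input schemas that your Salesforce Hosted MCP Server advertises for each tool. Posting the findings as a Chatter FeedItem is also an assumption: your org might record them as a Task or another object instead. The Case and FeedItem fields used are standard Salesforce fields.

```python
import re

# Salesforce record IDs are 15 or 18 alphanumeric characters.
_SF_ID = re.compile(r"[A-Za-z0-9]{15}(?:[A-Za-z0-9]{3})?")


def _check_id(record_id: str) -> str:
    # Validate the ID rather than splicing arbitrary text into SOQL (injection risk).
    if not _SF_ID.fullmatch(record_id):
        raise ValueError(f"not a Salesforce record ID: {record_id!r}")
    return record_id


def soql_query_args(case_id: str) -> dict:
    """Hypothetical arguments for the soql_query tool."""
    query = (
        "SELECT Id, CaseNumber, Subject, Description, CreatedDate, Account.Name "
        f"FROM Case WHERE Id = '{_check_id(case_id)}' LIMIT 1"
    )
    return {"query": query}


def feed_post_args(case_id: str, findings: str) -> dict:
    """Hypothetical arguments for the create_sobject_record tool.

    A FeedItem whose ParentId is the case appears in that case's feed.
    """
    return {
        "sobject_type": "FeedItem",
        "fields": {"ParentId": _check_id(case_id), "Type": "TextPost", "Body": findings},
    }


print(soql_query_args("500XX00000ABCDEFGH")["query"])
```

Validating the ID up front keeps the query a fixed template, which matters when the ID originates from a webhook payload rather than from a trusted user.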

In the next step, you’ll automate this workflow using Salesforce Flow, so investigations trigger automatically when cases are created.

Using Salesforce Flows

Salesforce Flows automate the connection between case creation and DevOps Agent investigations. Flow is a no-code automation tool that uses a visual drag-and-drop interface (Flow Builder) to automate business processes.

Configure a Flow trigger on your Case object to invoke DevOps Agent automatically when cases are created.

Screenshot: Flow Builder showing a Flow trigger on the Case object that fires when a new case is created.

Figure 13 – Salesforce Trigger

The Flow calls the DevOps Agent webhook with case details, including the customer account, incident description, and timing. This triggers an autonomous investigation without requiring a manual handoff to engineering teams. Because of its length, only a portion of the Flow is shown.

Screenshot: excerpt of the Salesforce Flow that calls the DevOps Agent webhook with the case details.

Figure 14 – Salesforce Flow
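For illustration, the payload the Flow sends can be sketched in Python, even though in Salesforce you would assemble it in Flow Builder or an Apex class. The JSON keys and the sample case are hypothetical, not the webhook's documented schema; see the code repository referenced in this post for the actual request format. The design point the sketch shows is that the payload carries the case ID, which the agent needs to post its findings back to the originating case, along with the account, description, and a UTC-normalized creation time that anchors the incident window.

```python
import json
from datetime import datetime, timezone


def build_investigation_payload(case: dict) -> str:
    """Serialize case details for the DevOps Agent webhook (hypothetical schema)."""
    created = case["CreatedDate"]
    if created.tzinfo is None:
        raise ValueError("CreatedDate must be timezone-aware")
    payload = {
        # Carrying the case ID lets the agent post findings back to this case.
        "caseId": case["Id"],
        "caseNumber": case["CaseNumber"],
        "accountName": case["AccountName"],
        "description": case["Description"],
        # Normalizing to UTC keeps the incident window unambiguous.
        "reportedAt": created.astimezone(timezone.utc).isoformat(),
    }
    return json.dumps(payload, sort_keys=True)


# Hypothetical sample case.
case = {
    "Id": "500XX00000ABCDEFGH",
    "CaseNumber": "00001234",
    "AccountName": "Example Corp",
    "Description": "My Load Balancer is showing unavailable",
    "CreatedDate": datetime(2025, 1, 1, 12, 10, tzinfo=timezone.utc),
}
print(build_investigation_payload(case))
```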

For implementation details and example code, see this code repository.

Connecting Salesforce Flow to AWS DevOps Agent

Configure how Salesforce Flow invokes the DevOps Agent webhook. Choose one of the following integration approaches based on your requirements:

  1. Option 1: External Service (recommended for simplicity) – Use an External Service to integrate with AWS services using SigV4 (AWS Signature Version 4) authentication through Named Credentials. This no-code approach is the fastest way to establish the connection.
  2. Option 2: Apex Class (recommended for custom logic) – Create an Apex class that your Flow calls to invoke the webhook. This approach provides flexibility to add custom business logic or error handling before triggering investigations.
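With Option 1, the Named Credential signs each callout, so you do not write signing code yourself. To show what that SigV4 signing involves, the following Python sketch implements the published Signature Version 4 steps (canonical request, string to sign, derived signing key) using only the standard library. The host, path, region, service name, and credentials in the usage line are placeholders, not the DevOps Agent webhook's actual values, and the sketch omits query strings and the session-token header used with temporary credentials. In production, prefer a maintained implementation such as a Named Credential or an AWS SDK.

```python
import hashlib
import hmac
from datetime import datetime, timezone
from typing import Dict, Optional


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_headers(method: str, host: str, path: str, body: str,
                  region: str, service: str, access_key: str, secret_key: str,
                  now: Optional[datetime] = None) -> Dict[str, str]:
    """Return x-amz-date and Authorization headers for a request without a query string.

    `path` must already be URI-encoded. Temporary credentials would also need
    an X-Amz-Security-Token header, which this sketch omits.
    """
    now = now or datetime.now(timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    payload_hash = hashlib.sha256(body.encode("utf-8")).hexdigest()

    # 1. Canonical request: lowercase header names, sorted, each line newline-terminated.
    headers = {"host": host, "x-amz-date": amz_date}
    signed_headers = ";".join(sorted(headers))
    canonical_headers = "".join(f"{k}:{headers[k]}\n" for k in sorted(headers))
    canonical_request = "\n".join(
        [method, path, "", canonical_headers, signed_headers, payload_hash])

    # 2. String to sign, bound to a date/region/service credential scope.
    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest()])

    # 3. Derive the signing key through chained HMACs, then sign.
    key = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    for part in (region, service, "aws4_request"):
        key = _hmac(key, part)
    signature = hmac.new(key, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()

    return {
        "x-amz-date": amz_date,
        "Authorization": (f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
                          f"SignedHeaders={signed_headers}, Signature={signature}"),
    }


# Placeholder endpoint, service name, and credentials.
example = sigv4_headers(
    "POST", "webhook.example.com", "/", '{"caseId": "500XX00000ABCDEFGH"}',
    "us-east-1", "execute-api", "AKIDEXAMPLE", "example-secret",
    now=datetime(2025, 1, 1, tzinfo=timezone.utc))
```

Because the signature covers the body hash and the timestamp, any change to the payload or a replay outside the allowed clock skew invalidates the request, which is why the signing step belongs as close to the HTTP call as possible.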

Results and Impact

This integration transforms incident response by connecting customer support directly to autonomous infrastructure investigation:

Faster Incident Resolution: Autonomous investigation reduces MTTR by eliminating manual log analysis. The agent starts diagnosing issues as soon as cases are created, providing 24/7 coverage across time zones.

Reduced Manual Effort: SRE teams focus on systematic improvements instead of responding to individual incidents. Support teams receive technical insights without escalating to engineering, and every investigation follows the same thorough process.

Improved Customer Experience: Customers receive detailed root cause analysis within minutes of reporting an issue. This transparency builds trust, and the agent’s architectural recommendations help prevent recurring problems.

Organizational Learning: Every investigation is documented and searchable, creating a knowledge base of incident patterns. The agent identifies recurring issues across cases and suggests infrastructure improvements to address root causes.

Conclusion

Connecting AWS DevOps Agent with a Salesforce Hosted MCP Server creates an autonomous investigation workflow that eliminates manual handoffs between support and engineering teams. This integration reduces mean time to resolution through automated analysis that starts as soon as a case is created, improves customer experience with rapid root cause updates, and enables proactive prevention through pattern recognition.

About the Authors

This blog post was authored by:

Conor Manton

Conor Manton is a Principal Technical Account Manager at AWS, based in San Francisco. He works with strategic enterprise customers to accelerate their cloud journey, with a focus on operationalizing AI-powered workflows to drive business outcomes.

Ross Belmont

Ross Belmont is a Senior Director of Product Management focused on integrations, with more than 15 years of experience in the Salesforce ecosystem.

Rohit Sharma

Rohit Sharma is a Senior Technical Account Manager at AWS Enterprise Support, based in New York. He partners with strategic enterprise customers to optimize their cloud operations, leveraging AI and automation to reduce operational overhead and improve incident response at scale.

Rodrigo Duran

Rodrigo Duran is an AI Deployment Strategist Director at Salesforce, based in São José dos Campos, São Paulo, Brazil. He partners with strategic customers globally to bridge the gap between technical strategy and business impact, scaling AI-powered deployments on the Salesforce platform.