Building agentic AI for Amazon RDS for SQL Server with Strands and AgentCore

If you manage Amazon Relational Database Service (Amazon RDS) for SQL Server instances, you’ve likely accumulated a collection of diagnostic scripts over the years. These scripts query for blocking sessions, identify slow-running procedures, monitor disk space, and analyze index usage. They represent expertise you’ve developed through years of troubleshooting database issues, often during critical late-night incidents. Your scripts can now form the foundation for AI agents that work continuously. Amazon Bedrock AgentCore Runtime with Strands Agents enables you to transform your existing T-SQL knowledge into an autonomous database management system. When you combine AI models with purpose-built AI agents, you create agents that understand context, make informed decisions, and execute complex operations autonomously.

In this post, we walk through building an agent that investigates blocking and deadlocks on Amazon RDS for SQL Server — two issues that directly impact application performance, cause transaction failures, and lead to user-facing timeouts. Using the Strands Agents framework, we convert the T-SQL queries DBAs already use for these investigations into agent tools, combine them into a single agent, and deploy it to AgentCore Runtime.

We start by explaining what agents and tools are, then walk through common DBA scenarios and how tools address them. From there, we build the tools, define the agent with a system prompt, and deploy it to AgentCore Runtime.

What are agents?

Agents are software systems that use AI to reason, plan, and complete tasks on behalf of humans or systems. In the context of database operations, an agent receives a question or an alarm payload, decides which tools to call, executes them, interprets the results, and responds with a diagnosis. For more on the Strands Agents framework, see Introducing Strands Agents: An Open Source AI Agents SDK.

At the core of every Strands agent is the agent loop: invoke the AI service, check if it wants to use a tool, execute the tool, then invoke the service again with the result. This cycle repeats until the model produces a final response.

Input → [Reasoning (LLM) → Tool Selection → Tool Execution] → Response
              ↑_________________________________↓

Each iteration adds to the conversation history. The agent tracks not just the original request, but every tool it has called and every result it has received. This accumulated context enables multi-step reasoning. For example, the agent can check for deadlocks, then analyze blocking chains, then correlate both findings into a single diagnosis.

What are tools?

Now that we understand how agents reason, let’s look at the building blocks they use. In agentic AI systems, tools are functions that an agent can invoke to interact with external systems, retrieve data, or perform actions. Tools enable the agent’s capabilities beyond text generation: querying databases, calling APIs, monitoring systems, and sending notifications. They are the bridge between an agent’s reasoning capabilities and the real world.

In Strands, a tool is a Python function decorated with @tool. The function’s docstring tells the agent when and why to call it. The agent reads the docstring, matches it against the user’s question, and decides whether to invoke the function. This means that the existing diagnostic script that a DBA already uses can become a tool by wrapping it in a function with a @tool decorator and a descriptive docstring.

from strands import Agent, tool

@tool
def get_server_version() -> str:
    """Return the SQL Server version. Use this to verify connectivity
    and identify the engine version."""
    conn = get_db_connection()
    cursor = conn.cursor()
    cursor.execute("SELECT @@VERSION")
    version = cursor.fetchone()[0]
    conn.close()
    return version

The @tool decorator registers the function with the agent. The type hints define the input and output schema. The docstring is what the agent uses to decide when to call it. Write it as you would a runbook entry.

Common DBA scenarios

With agents and tools defined, let’s look at the specific database problems we want to solve.

Deadlocks. When a deadlock occurs, you need to identify the sessions involved, the SQL statements each was executing, the lock types and objects in contention, and the victim session. On Amazon RDS for SQL Server, deadlock information is captured in two ways:

Trace flags 1204 and 1222 — When enabled through a custom DB parameter group, these write deadlock details to the SQL Server error log, which can be published to Amazon CloudWatch Logs. See Monitor deadlocks in Amazon RDS for SQL Server for setup steps.
The system_health extended event session — This built-in session captures deadlock graphs via the xml_deadlock_report event by default. Both sides of the deadlock, including SQL statements, lock types, and objects, are recorded in the XE file targets at D:\rdsdbdata\log.

Blocking. When applications report timeouts, you need to find the head blocker — the session at the root of the blocking chain — along with its SQL statement, the number of sessions waiting behind it, and how long they’ve been waiting. Capturing this information on RDS for SQL Server depends on what level of detail is needed:

Current blocking — The DMVs sys.dm_exec_requests and sys.dm_exec_sql_text provide the current blocking state, including blocker and blocked session IDs, wait types, and SQL text. This data remains available without additional configuration.
Historical blocking (blocked session only) — The system_health session captures lock waits exceeding 30 seconds via the wait_info event. This records the blocked session’s ID and SQL text, but not the blocker’s identity. CloudWatch Logs does not contain system_health data — it only receives the SQL Server error log.
Historical blocking (both sides) — To capture both the blocker and blocked session after the fact, set blocked process threshold in your DB parameter group and create a custom extended event session for blocked_process_report. The system_health session does not capture this event. See Using extended events with Amazon RDS for Microsoft SQL Server for configuration details.

How tools address these scenarios

Now let’s map each scenario to a tool that the agent can call.

Tool	What it does	Data source
get_deadlock_graphs	Reads deadlock graphs with full details for both sides	system_health Extended Event (XE) files (xml_deadlock_report)
get_blocking_chains	Walks the current blocking hierarchy, identifies head blockers and their SQL	DMVs (sys.dm_exec_requests, sys.dm_exec_sql_text)
get_session_details	Retrieves login, host, program, and SQL for a specific session	DMV (sys.dm_exec_sessions)
send_diagnostic_report	Sends the agent’s findings and recommendations to the DBA team	Amazon SNS
get_blocked_process_reports	Reads historical blocking details including both blocker and blocked sessions	Custom XE session (blocked_process_report)

Tool code

Let’s see how these tools translate into Python code. Each tool wraps an existing SQL Server diagnostic query using the Strands @tool decorator.

The tools share a connection helper that retrieves credentials from AWS Secrets Manager:

from strands import Agent, tool
import boto3, pymssql, json, os
from datetime import datetime, timedelta

def get_db_connection():
    client = boto3.client('secretsmanager', region_name=os.getenv('AWS_REGION', 'us-west-2'))
    secret = client.get_secret_value(SecretId=os.getenv('DB_SECRET_ID'))
    creds = json.loads(secret['SecretString'])
    return pymssql.connect(
        server=creds['host'], user=creds['username'],
        password=creds['password'], port=creds['port']
    )

Deadlock detection. Reads deadlock graphs from the system_health XE files:

@tool
def get_deadlock_graphs(hours: int = 24) -> dict:
    """Read deadlock graphs from the system_health extended event session.
    Returns XML deadlock graphs with process lists, SQL statements, and lock
    details for both sides of each deadlock. Use this when applications report
    error 1205, or for periodic deadlock analysis."""
    conn = get_db_connection()
    cursor = conn.cursor(as_dict=True)
    cursor.execute("""
        SELECT
            CAST(event_data AS XML).value('(event/@timestamp)[1]', 'DATETIME2') AS event_time,
            CAST(event_data AS XML).value('(event/data[@name="xml_report"]/value)[1]',
                'NVARCHAR(MAX)') AS deadlock_graph
        FROM sys.fn_xe_file_target_read_file(
            'd:\\rdsdbdata\\log\\system_health*.xel', NULL, NULL, NULL)
        WHERE CAST(event_data AS XML).value('(event/@name)[1]', 'VARCHAR(100)')
              = 'xml_deadlock_report'
          AND CAST(event_data AS XML).value('(event/@timestamp)[1]', 'DATETIME2')
              > DATEADD(HOUR, -%s, GETUTCDATE())
        ORDER BY event_time DESC
    """, (hours,))
    results = cursor.fetchall()
    conn.close()
    return {"deadlock_count": len(results), "deadlock_graphs": results}

Blocking chain analysis. Walks the blocking hierarchy using a recursive CTE:

@tool
def get_blocking_chains() -> dict:
    """Walk the current blocking chain hierarchy using sys.dm_exec_requests
    and sys.dm_exec_sql_text. Returns head blockers, their SQL, wait types,
    durations, and all downstream blocked sessions. Use this when applications
    report timeouts or hangs."""
    conn = get_db_connection()
    cursor = conn.cursor(as_dict=True)
    cursor.execute("""
        WITH BlockingChain AS (
            SELECT r.session_id, r.blocking_session_id, r.wait_type,
                   r.wait_time / 1000.0 AS wait_seconds, r.status, r.command,
                   t.text AS sql_text, DB_NAME(r.database_id) AS database_name,
                   0 AS chain_level
            FROM sys.dm_exec_requests r
            CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
            WHERE r.blocking_session_id = 0
              AND r.session_id IN (
                  SELECT DISTINCT blocking_session_id FROM sys.dm_exec_requests
                  WHERE blocking_session_id != 0)
            UNION ALL
            SELECT r.session_id, r.blocking_session_id, r.wait_type,
                   r.wait_time / 1000.0, r.status, r.command, t.text,
                   DB_NAME(r.database_id), bc.chain_level + 1
            FROM sys.dm_exec_requests r
            CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
            JOIN BlockingChain bc ON r.blocking_session_id = bc.session_id
        )
        SELECT session_id, blocking_session_id, wait_type, wait_seconds,
               status, command, sql_text, database_name, chain_level
        FROM BlockingChain ORDER BY chain_level, wait_seconds DESC
    """)
    results = cursor.fetchall()
    conn.close()
    head_blockers = [r for r in results if r['chain_level'] == 0]
    return {"total_blocked_sessions": len([r for r in results if r['chain_level'] > 0]),
            "head_blockers": len(head_blockers), "blocking_chains": results}

Session details. Retrieves context for a specific session:

@tool
def get_session_details(session_id: int) -> dict:
    """Get login name, host, program, and current SQL for a specific session.
    Use this after identifying a head blocker."""
    conn = get_db_connection()
    cursor = conn.cursor(as_dict=True)
    cursor.execute("""
        SELECT s.session_id, s.login_name, s.host_name, s.program_name,
               s.status, s.transaction_isolation_level,
               s.last_request_start_time, s.last_request_end_time,
               t.text AS current_sql
        FROM sys.dm_exec_sessions s
        OUTER APPLY sys.dm_exec_sql_text(s.most_recent_sql_handle) t
        WHERE s.session_id = %s
    """, (session_id,))
    result = cursor.fetchone()
    conn.close()
    return {"session_id": session_id, "details": result}

Diagnostic reporting. Sends findings via SNS:

@tool
def send_diagnostic_report(subject: str, report: str) -> dict:
    """Send a diagnostic report via SNS email to the DBA team. Use this after
    completing an investigation to deliver findings and recommendations."""
    sns = boto3.client('sns', region_name=os.getenv('AWS_REGION', 'us-west-2'))
    topics = sns.list_topics()['Topics']
    topic_arn = next((t['TopicArn'] for t in topics
                      if os.getenv('SNS_TOPIC_NAME') in t['TopicArn']), None)
    if topic_arn:
        sns.publish(TopicArn=topic_arn, Subject=subject[:100], Message=report)
        return {"status": "sent", "topic": topic_arn}
    return {"status": "no_topic_found"}

Historical blocked process reports. Reads blocker and blocked session details from a custom XE session. Requires blocked process threshold set in the DB parameter group and a custom extended event session capturing blocked_process_report with a file target. See Using extended events with Amazon RDS for Microsoft SQL Server for session creation.

@tool
def get_blocked_process_reports(hours: int = 24) -> dict:
    """Read blocked process reports from extended event file targets.
    Returns both blocker and blocked session details including SQL text,
    wait time, and lock resources. Requires a custom XE session capturing
    blocked_process_report. Use this for historical blocking analysis
    when you need to identify who was holding the lock."""
    conn = get_db_connection()
    cursor = conn.cursor(as_dict=True)
    cursor.execute("""
        SELECT
            CAST(event_data AS XML).value('(event/@timestamp)[1]', 'DATETIME2') AS event_time,
            CAST(event_data AS XML).value('(event/data[@name="duration"]/value)[1]', 'BIGINT') AS duration_ms,
            CAST(event_data AS XML).value('(event/data[@name="blocked_process"]/value)[1]',
                'NVARCHAR(MAX)') AS blocked_process_report
        FROM sys.fn_xe_file_target_read_file(
            'd:\\rdsdbdata\\log\\blocked*.xel', NULL, NULL, NULL)
        WHERE CAST(event_data AS XML).value('(event/@timestamp)[1]', 'DATETIME2')
              > DATEADD(HOUR, -%s, GETUTCDATE())
        ORDER BY event_time DESC
    """, (hours,))
    results = cursor.fetchall()
    conn.close()
    return {"blocked_process_count": len(results), "reports": results}

Note: All code examples use parameterized queries (%s placeholders) to help prevent SQL injection.

Defining the agent: System prompt and tools

With the tools built, we now define the agent itself. The system prompt defines how the agent reasons about problems, and the tools list tells it what capabilities are available.

# ---------------------------------------------------------------------------
# AgentCore app and model configuration
# ---------------------------------------------------------------------------
from strands import Agent, tool
from strands.models import BedrockModel
from bedrock_agentcore.runtime import BedrockAgentCoreApp
import boto3, pymssql, json, os
from datetime import datetime, timedelta

app = BedrockAgentCoreApp()

model = BedrockModel(
    model_id=os.getenv('BEDROCK_MODEL_ID', '<your-model-id>'),
    region_name=os.getenv('AWS_REGION', '<your-region>'),
    temperature=0.3
)

# ---------------------------------------------------------------------------
# Database connection helper — retrieves credentials from Secrets Manager
# ---------------------------------------------------------------------------

def get_db_connection():
    client = boto3.client('secretsmanager', region_name=os.getenv('AWS_REGION', '<your-region>'))
    secret = client.get_secret_value(SecretId=os.getenv('DB_SECRET_ID'))
    creds = json.loads(secret['SecretString'])
    return pymssql.connect(
        server=creds['host'], user=creds['username'],
        password=creds['password'], port=creds['port']
    )

# --- Paste tool code here ---
# @tool get_deadlock_graphs       (from Deadlock detection section)
# @tool get_blocking_chains       (from Blocking chain analysis section)
# @tool get_session_details       (from Session details section)
# @tool get_blocked_process_reports (from Historical blocked process section)
# @tool send_diagnostic_report    (from Diagnostic reporting section)

# ---------------------------------------------------------------------------
# Agent definition — system prompt guides reasoning, tools define capabilities
# ---------------------------------------------------------------------------
agent = Agent(
    system_prompt="""You are a SQL Server DBOps Agent for Amazon RDS.
When investigating issues:
1. Start with the symptoms — don't run every tool on every question
2. Use the deadlock tool when applications report error 1205 or transaction failures
3. Use blocking chain tools when applications report timeouts or lock waits
4. Use blocked process reports for historical blocking analysis when current blocking has resolved
5. Correlate findings across data sources before giving your assessment
6. Provide severity (Critical/Warning/Info) and specific, actionable recommendations
7. When sending diagnostic reports via SNS, include: affected session IDs, SQL statements, wait types, duration, host/IP, login name, root cause, and specific remediation steps""",
    model=model,
    tools=[get_deadlock_graphs, get_blocking_chains, get_session_details,
           get_blocked_process_reports, send_diagnostic_report]
)
# ---------------------------------------------------------------------------
# AgentCore entrypoint — receives payload from agentcore invoke
# ---------------------------------------------------------------------------

@app.entrypoint
def handler(payload):
    response = agent(payload.get("prompt", ""))
    return response.message['content'][0]['text']

if __name__ == "__main__":
    app.run()

Solution workflow

At the heart of this transformation lies the integration of three key technologies:

Strands agents & tools — Wrap existing T-SQL expertise in intelligent agents that reason about database conditions and execute responses automatically.
AgentCore Runtime — Deploy agents securely into your VPC with built-in scaling and observability.
AgentCore Memory — Retain findings across sessions so agents learn from past investigations.

The following diagram illustrates the end-to-end workflow from operator prompt to diagnostic report.

Flow:

DBA sends prompt — “Check for deadlocks in the last 24 hours and send a diagnostic report via SNS”.
Agent sends to LLM for reasoning — The model reads the prompt and the system prompt to understand the task.
LLM selects tools — Based on the docstrings, it decides to call get_deadlock_graphs first.
Agent executes tool — Queries system_health XE files on Amazon RDS for SQL Server for xml_deadlock_report events
Tool returns results — Deadlock graphs with session IDs, SQL statements, lock types, and objects.
Agent sends results to LLM for analysis — The model interprets the raw deadlock XML and identifies root cause.
LLM generates response — Produces a diagnosis with severity, root cause, affected queries, and remediation steps. Decides to call send_diagnostic_report to deliver findings via SNS.
Agent returns response to DBA — The DBA receives both the inline response and an email with the full diagnostic report.

Deploying to AgentCore Runtime

The agent is defined and the tools are ready. The complete source code, including both agent implementations, IAM policies, and deployment instructions, is available on GitHub. Refer to An AI Agent for Deadlock Analysis on Amazon Amazon RDS for SQL Server.

Let’s walk through getting it running on AgentCore Runtime.

Prerequisites

The following resources are required for this walkthrough. You can use existing resources or create new ones. Each resource maps to an environment variable used during deployment.

Amazon RDS for SQL Server

DB_INSTANCE_ID — An RDS for SQL Server instance (Standard or Enterprise Edition)
Trace flags 1204 and 1222 enabled via a custom DB parameter group. See Monitor deadlocks in Amazon RDS for SQL Server.
A custom extended event session for blocked_process_report for historical blocking capture. See Using extended events with Amazon RDS for SQL Server.
DB_SECRET_ID — Database credentials stored in AWS Secrets Manager with a dedicated, least-privilege database login.
SNS_TOPIC_NAME — An Amazon Simple Notification Service (Amazon SNS) topic with an active subscription

Amazon Bedrock AgentCore

AWS_REGION — Access to Amazon Bedrock foundation models enabled in your region
AGENTCORE_ROLE_ARN — An IAM execution role with permissions for Bedrock, Secrets Manager, SNS, and CloudWatch Logs. See AgentCore Runtime permissions.
SUBNET1, SUBNET2, SECURITY_GROUP_ID — A VPC with private subnets and a security group allowing outbound traffic to RDS on port 1433. See AgentCore VPC configuration.

Development environment

Python 3.10 or newer
AWS Command Line Interface (AWS CLI) configured with appropriate permissions

Step 1: Clone and install

git clone https://github.com/aws-samples/sample-agentcore-sqlserver-dbops-agent.git

cd sample-agentcore-sqlserver-dbops-agent

uv venv && source .venv/bin/activate
uv pip install -r requirements.txt
uv pip install bedrock-agentcore-starter-toolkit

Step 2: Set environment variables

Set the following environment variables for your deployment. These reference the resources created in the prerequisites:

export AWS_REGION=<your-region>
export DB_INSTANCE_ID=<your-rds-instance-id>       # RDS SQL Server instance
export DB_SECRET_ID=<your-secrets-id>            # Secrets Manager credential
export SNS_TOPIC_NAME=<your-sns-topic-name>         # SNS topic for reports
export AGENTCORE_ROLE_ARN=<your-execution-role-arn>    # IAM execution role
export SECURITY_GROUP_ID=<your-security-group-id>      # VPC security group
export SUBNET1=<your-first-subnet-id>                  # VPC private subnet
export SUBNET2=<your-second-subnet-id>                 # VPC private subnet

Step 3: Configure and deploy

Use the AgentCore CLI to configure and deploy the agent into your VPC:

agentcore configure \
--name my_first_agentcore_runtime \
--entrypoint agent.py \
--execution-role $AGENTCORE_ROLE_ARN \
--deployment-type direct_code_deploy \
--vpc \
--subnets $SUBNET1,$SUBNET2 \
--security-groups $SECURITY_GROUP_ID

agentcore deploy \
--env AWS_REGION=$AWS_REGION \
--env DB_SECRET_ID=$DB_SECRET_ID \
--env SNS_TOPIC_NAME=$SNS_TOPIC_NAME \
--env AGENT_OBSERVABILITY_ENABLED=true

agentcore status

Step 4: Test your deployed agent

First, simulate a deadlock by creating two tables (##Employees and ##Suppliers) and running two concurrent transactions that update them in opposite order. Session 1 updates ##Employees then ##Suppliers, while Session 2 updates ##Suppliers then ##Employees. This creates a lock ordering conflict that SQL Server resolves by terminating one transaction. Follow the steps in Testing with a simulated deadlock

Then invoke the agent:

agentcore invoke '{"prompt": "Check for deadlocks in the last 24 hours, provide RCA and recommendations, and send a diagnostic report via SNS"}'

The agent reads the deadlock graph from system_health, identifies both sessions and their SQL statements, correlates with active blocking chains, and sends a diagnostic report via SNS with root cause analysis and remediation steps.

demo-server $ 

demo-server $ agentcore invoke '{"prompt": "Check for deadlocks in the last 24 hours, provide RCA and recommendations, and send a diagnostic report via SNS"}'

╭───────────────────────────────────────────────────────────────────── my_first_agentcore_runtime ─────────────────────────────────────────────────────────────────────╮

│ Session: bb18d342-0987-4854-bc8f-bb5c0edff095                                                                                                                                                                                                                                                                                                      │

│ Request ID: 554ff565-e99e-430c-b79c-a6e480d14a06                                                                                                                                                                                                                                                                                                │

│ ARN: arn:aws:bedrock-agentcore:us-west-2:111122223333:runtime/my_first_agentcore_runtime-MnC7NP39B6                                                                                                                                                                      │

│ Logs: aws logs tail /aws/bedrock-agentcore/runtimes/my_first_agentcore_runtime-MnC7NP39B6-DEFAULT --log-stream-name-prefix "2026/03/25/[runtime-logs" --follow                                              │

│       aws logs tail /aws/bedrock-agentcore/runtimes/my_first_agentcore_runtime-MnC7NP39B6-DEFAULT --log-stream-name-prefix "2026/03/25/[runtime-logs" --since 1h                                               │

│ GenAI Dashboard: https://console.aws.amazon.com/cloudwatch/home?region=us-west-2#gen-ai-observability/agent-core                                                                                                                                                                                                                                                                                                                                                                                                         │

╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Response:

## Comprehensive Analysis Complete

I've conducted a thorough deadlock investigation and delivered a detailed diagnostic report. Here's my assessment:

### **Key Findings:**

- **1 deadlock** detected at 07:16:01 on March 25, 2026

- **Severity: WARNING** - Low frequency but requires proactive intervention

- **Root Cause:** Lock ordering inconsistency between Suppliers and Employees table updates

### **Root Cause Analysis:**

The deadlock occurred due to concurrent transactions updating related records (both with ID = 1) in different order, creating a circular wait condition. This suggests 

either:

- Foreign key relationships between tables

- Application logic that doesn't follow consistent lock ordering

- Potential business process coupling between supplier and employee data

### **Critical Recommendations Delivered:**

1. **Immediate:** Implement deadlock retry logic and consistent lock ordering

2. **Short-term:** Enable READ_COMMITTED_SNAPSHOT and enhance monitoring  

3. **Long-term:** Architectural review and performance optimization

### **Actions Completed:**

✅ Comprehensive deadlock analysis  

✅ Detailed root cause analysis  

✅ Prioritized recommendations with timelines  

✅ Prevention strategy outlined  

✅ Monitoring plan established  

✅ Diagnostic report sent via SNS to DBA team  

The report includes specific technical details, business impact assessment, and a structured action plan with priorities and timelines to prevent future occurrences.

demo-server $

Best practices

Cross-session memory. Without memory, every invocation starts fresh. With AgentCore Memory, findings persist across sessions through two complementary systems: short-term memory (STM) stores raw conversation events for session continuity, while long-term memory (LTM) extracts durable insights that carry across invocations.

LTM supports multiple extraction strategies. The semantic strategy captures facts that remain relevant over time: “Recurring deadlocks between Employees and Suppliers updates because of opposite lock ordering.” The summary strategy condenses investigation sessions: “Investigation found 6 deadlocks involving ##Employees and ##Suppliers tables and a 14-session blocking chain caused by an index rebuild.” Transient metrics like current CPU utilization are correctly skipped; only durable patterns are retained.

Create a memory resource with both strategies:

agentcore memory create dbops_shared_memory \
  --strategies '[{"semanticMemoryStrategy": {"name": "dbops_facts"}}, {"summaryMemoryStrategy": {"name": "dbops_summaries"}}]' \
  --event-expiry-days 30 \
  --region $AWS_REGION \
  --wait
AGENTCORE_MEMORY_ID=$(agentcore memory list --region $AWS_REGION 2>/dev/null | grep dbops_shared_memory | awk '{print $4}')

Integrating memory requires a code change to agent.py. Add the AgentCoreMemorySessionManager as the session manager. See agent_with_memory.py for the complete example.

On Monday, the agent finds 6 deadlocks involving updates to ##Employees and ##Suppliers tables in opposite lock order, and sends a diagnostic report via SNS.
On Tuesday, the DBA deploys a lock ordering fix based on the agent’s recommendation.
On Wednesday, the alarm fires again. The agent recalls Monday’s findings from memory, runs a fresh investigation, and compares: “The Employees/Suppliers deadlock pattern has stopped. New deadlock detected between different tables.”

Observability

To enable observability:

Add aws-opentelemetry-distro to your requirements.txt file.
Set the environment variable AGENT_OBSERVABILITY_ENABLED=true during deployment.
AgentCore automatically instruments traces, token usage, and request duration.
No code changes required.
View metrics in the CloudWatch GenAI Observability dashboard.

Scaling with additional tools.

The same @tool pattern applies to other diagnostic scripts:

Slow query analysis via Query Store
Index recommendations from sys.dm_db_missing_index_details
TempDB troubleshooting

Strands also supports Model Context Protocol (MCP), allowing agents to connect to external tool servers and extend their capabilities without modifying agent code.

Key takeaways

The system_health extended event session captures deadlock graphs by default. The agent reads them and delivers root cause analysis.
Blocking chain analysis uses DMVs to identify head blockers and their SQL. Historical blocker identity requires blocked process threshold and a custom extended event session.
The @tool decorator connects your existing T-SQL queries to the LLM. The docstring determines when the agent calls each tool.
Correlation across data sources — connecting deadlocks to transaction errors and blocking to timeouts — is what distinguishes an agent from individual scripts.
Strands Agents provides the tool framework, AgentCore Runtime provides managed deployment, and together they turn your existing T-SQL expertise into production-ready agents.
Design tools that are focused (do one thing well), efficient (return only necessary data), reliable (handle errors gracefully), fast (execute quickly), and clear (well-documented). Good tool design is what makes an agent effective.

Clean up

Complete the following steps to clean up your resources:

On the Amazon RDS console, choose Databases in the navigation pane.
Select the DB instance to delete.
On the Actions menu, choose Delete.
Enter delete me to confirm deletion, then choose Delete.
When prompted to create a final snapshot and retain automated backup, choose the option appropriate for your needs.

To delete the AgentCore Runtime resource:

agentcore destroy --agent my_first_agentcore_runtime

To delete the AgentCore Memory resource:

agentcore memory delete $AGENTCORE_MEMORY_ID --region $AWS_REGION

Conclusion

In this post, we showed how to turn existing T-SQL diagnostic scripts into an AI-powered database operations agent using Strands Agents and Amazon Bedrock AgentCore. The agent investigates deadlocks and blocking on Amazon RDS for SQL Server, correlates findings across data sources, and delivers actionable recommendations, all without manual intervention.

To get started, clone the sample repository and deploy the agent against your own RDS for SQL Server instance. Try extending it with additional tools for your most common diagnostic workflows, such as Query Store analysis, index recommendations, or TempDB troubleshooting. We’d love to hear what you build. Share your feedback and questions in the comments section.

AWS Database Blog