Accelerate container troubleshooting with the fully managed Amazon ECS MCP server (preview)
Amazon Elastic Container Service (Amazon ECS) today launched a fully managed, remote Model Context Protocol (MCP) server in preview. The Amazon ECS MCP server provides AI agents with deep contextual knowledge of ECS workflows, APIs, and best practices, enabling more accurate and actionable guidance throughout your application lifecycle.
Containerized applications package software and its dependencies into isolated units, delivering benefits that range from portability to scalability. Amazon ECS, a fully managed container orchestration service, balances simplicity and power, enabling teams to build, manage, and run even the most demanding containerized workloads without the complexity of managing infrastructure. Now, with the ECS MCP server, you can use AI agents such as Kiro CLI, Cline, and Cursor to inspect and troubleshoot containerized workloads from your local machine using natural language.
AI agents are transforming how developers and operators work, offering intelligent coding assistance that understands context on the local machine. With the Amazon ECS MCP server, we're extending these capabilities to container infrastructure, enabling AI agents to become true partners in your operational workflows. Imagine an AI assistant that not only understands container orchestration concepts but can also see your ECS environment in real time: it can analyze your running services, investigate deployment configurations, examine task health, and provide recommendations based on both AWS best practices and your specific setup. Our previously released open-source MCP server provided a foundation for local AI assistance; the cloud-hosted remote MCP server builds on it with automatic updates for the latest ECS features and best practices, centralized security through AWS Identity and Access Management (IAM) integration, auditability through AWS CloudTrail, and the scalability and reliability of AWS infrastructure. You no longer need to manage MCP server updates, track compatibility with new ECS features, or maintain local infrastructure. Further, compared to locally hosted MCP servers, the Amazon ECS remote MCP server is especially well suited to autonomous, agentic workloads such as incident-response agents and multi-agent orchestrators.
The Amazon ECS MCP server not only enables AI agent interaction in the CLI and IDE, it also powers an intelligent troubleshooting experience directly in the Amazon ECS console through Amazon Q. This integration brings contextual, AI-assisted inspection and diagnostics into your workflow in the AWS Management Console.
In this blog post, we will walk through how to streamline your container troubleshooting using the Amazon ECS MCP server.
Solution overview
The Model Context Protocol (MCP) is an open protocol that lets AI models connect securely to AWS services and tools. When customers submit requests through MCP clients such as Kiro CLI, the AI model uses MCP servers to interpret the requests and call the necessary AWS APIs. Today, MCP servers for most AWS services are offered as open-source, downloadable packages in the AWS Labs repository, giving customers the flexibility to run them locally with their own AWS credentials.
As adoption grew, especially among enterprise customers, we also heard interest in a managed, remotely hosted option that reduces undifferentiated heavy lifting and aligns with centralized operational models.
Local MCP servers continue to offer a simple and customizable way for builders to get started. At the same time, many enterprise environments prefer solutions that streamline updates, simplify credential management, and integrate natively with AWS security and governance controls. The new hosted MCP server is designed to meet those customers’ needs by providing a fully managed, scalable, and consistently updated experience.
Remote MCP servers address these enterprise needs by delivering several critical advantages:
- Operational excellence: Hosted solutions provide automatic updates and patching, eliminating the burden of manual maintenance across distributed development teams. Centralized management reduces operational complexity and ensures consistent behavior across all users.
- Enhanced security: Remote servers enable centralized security through AWS IAM integration, providing fine-grained access control and eliminating the risks associated with storing credentials on individual developer machines. This approach supports enterprise security policies and compliance requirements.
- Comprehensive auditability: Integration with AWS CloudTrail provides complete audit trails for all MCP operations, enabling organizations to track usage, monitor access patterns, and meet regulatory compliance requirements that are difficult to achieve with distributed local installations.
- Seamless integration: A hosted solution integrates with the broader AWS ecosystem, from Amazon Q chat in the AWS Management Console to automated investigations in Amazon CloudWatch, while providing the scalability and reliability that enterprise workloads demand.
This is why Amazon ECS is launching a remote MCP server: to provide enterprise customers with the scalable, secure, and manageable AI integration platform they need to work with AWS services through their AI applications. To address the authentication gap between the MCP protocol and AWS services, you use the AWS MCP Proxy (mcp-proxy-for-aws), which acts as a universal Signature Version 4 (SigV4) MCP proxy. A single proxy lets any MCP client connect to AWS MCP services without multiple proxy installations, simplifying setup while maintaining security best practices.
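To make the proxy's role more concrete: SigV4 signing starts by deriving a request-scoped key from your secret access key through a chain of HMAC-SHA256 operations. The following Python sketch shows only that key-derivation step, using the standard library. It illustrates the SigV4 algorithm and is not code from mcp-proxy-for-aws; a complete signature also requires building the canonical request and string to sign, which are omitted here. The secret key is AWS's published example value, and the `iam` service name is only an example.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    """Raw HMAC-SHA256 digest of a UTF-8 message."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_sigv4_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Derive a SigV4 signing key scoped to one day, Region, and service.

    date_stamp is the request date as YYYYMMDD.
    """
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(signing_key: bytes, string_to_sign: str) -> str:
    """Hex-encoded signature that goes into the Authorization header."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


# AWS's published example secret key; never hard-code real credentials.
signing_key = derive_sigv4_signing_key(
    "wJalrXUtnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY", "20120215", "us-east-1", "iam"
)
print(sign(signing_key, "<string to sign>"))
```

Because the key is scoped to a date, Region, and service, a leaked signature cannot be replayed against a different service or Region, which is part of why a signing proxy is preferable to handing raw credentials to each tool.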
The following image shows, at a high level, how requests are routed to the remote MCP service through the proxy.
Figure 1: MCP client communication with the MCP server via mcp-proxy-for-aws
- Your MCP client (Kiro CLI) runs mcp-proxy-for-aws as a local MCP server and communicates with it over stdio.
- mcp-proxy-for-aws uses your AWS profile's credentials to sign requests with SigV4, connects to the specified remote MCP service endpoint URL, and proxies MCP requests to the remote service.
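The stdio leg of this flow follows the MCP stdio transport: the client writes JSON-RPC 2.0 messages to the proxy's standard input, one message per line, and reads replies from its standard output. The following Python sketch shows that framing with a pluggable `forward` function. In the real proxy, forwarding is a SigV4-signed HTTPS call to the remote endpoint, which this sketch leaves out; `run_stdio_bridge` and `echo_forward` are hypothetical names used only for illustration.

```python
import io
import json
from typing import Callable, Optional, TextIO

# Hypothetical forwarder type: takes a JSON-RPC message, returns the reply (or None).
Forwarder = Callable[[dict], Optional[dict]]


def run_stdio_bridge(stdin: TextIO, stdout: TextIO, forward: Forwarder) -> None:
    """Relay newline-delimited JSON-RPC messages to forward() and write replies."""
    for raw in stdin:
        line = raw.strip()
        if not line:
            continue  # skip blank lines between messages
        message = json.loads(line)
        reply = forward(message)
        # JSON-RPC notifications carry no "id" and must not receive a response.
        if "id" in message and reply is not None:
            # Compact encoding keeps each message on a single line.
            stdout.write(json.dumps(reply, separators=(",", ":")) + "\n")
            stdout.flush()


def echo_forward(message: dict) -> Optional[dict]:
    """Stand-in for the remote call: echo the method name back as the result."""
    return {"jsonrpc": "2.0", "id": message.get("id"), "result": {"method": message["method"]}}


requests = io.StringIO('{"jsonrpc":"2.0","id":1,"method":"tools/list"}\n')
replies = io.StringIO()
run_stdio_bridge(requests, replies, echo_forward)
print(replies.getvalue(), end="")  # {"jsonrpc":"2.0","id":1,"result":{"method":"tools/list"}}
```

Keeping the local side on stdio means the MCP client needs no knowledge of AWS authentication; only the proxy process ever touches your credentials.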
The following image shows a sample request/response sequence.
Figure 2: A sample sequence diagram of the remote MCP request/response
The ECS MCP server currently provides the following seven read-only tools for Amazon ECS cluster health checks and troubleshooting.
Amazon ECS Cluster Operations Tools
- get_deployment_status: The tool checks your Amazon ECS deployment status for a particular ECS cluster and service. You can use the tool for routine health checks, regular monitoring, and post-deployment use cases.
- fetch_network_configuration: This tool retrieves Amazon ECS service network configuration details. You can use it to understand the Amazon Virtual Private Cloud (Amazon VPC), subnet, and security group configuration of your Amazon ECS setup.
Amazon ECS Cluster Troubleshooting Tools
- fetch_service_events: This tool retrieves Amazon ECS service events for diagnostics, with customizable time windows. You can use it to investigate deployment problems or service instability.
- fetch_task_failures: This tool helps you retrieve and analyze Amazon ECS task failures with summaries. Using this tool, you can identify any patterns in task failures.
- fetch_task_logs: This tool retrieves Amazon CloudWatch logs for Amazon ECS tasks with flexible time range options. Use it to troubleshoot runtime issues and analyze application behavior.
- detect_image_pull_failures: This tool helps you to detect and categorize container image pull failures.
Amazon ECS Cluster Resource Management Tools
- get_task_definition_deletion_blockers: This tool identifies dependencies that can prevent task definition deletion. Use it when you run cleanup operations and need to understand what is preventing Amazon ECS resource deletion.
For detailed parameter configuration and expected responses, refer to the Amazon ECS Developer Guide.
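MCP clients invoke these tools with JSON-RPC 2.0 `tools/call` requests, as defined by the MCP specification; Kiro CLI and the proxy build and send these for you. For illustration only, the following Python sketch groups the seven tools as listed above and builds such a request. The argument names (`cluster_name`, `service_name`) and the service name `sample-service` are hypothetical; the real input schema for each tool is returned in the server's `tools/list` response.

```python
import itertools
import json
from typing import Any

# The seven read-only tools described above, grouped as in this post.
ECS_MCP_TOOLS = {
    "operations": ["get_deployment_status", "fetch_network_configuration"],
    "troubleshooting": [
        "fetch_service_events",
        "fetch_task_failures",
        "fetch_task_logs",
        "detect_image_pull_failures",
    ],
    "resource_management": ["get_task_definition_deletion_blockers"],
}

_request_ids = itertools.count(1)  # JSON-RPC request ids must be unique per session


def build_tool_call(name: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP tools/call request for one of the tools above."""
    known = {tool for tools in ECS_MCP_TOOLS.values() for tool in tools}
    if name not in known:
        raise ValueError(f"unknown ECS MCP tool: {name}")
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


# Hypothetical argument names; check the tool's inputSchema for the real ones.
print(build_tool_call(
    "get_deployment_status",
    {"cluster_name": "sample-nodejs-app-cluster", "service_name": "sample-service"},
))
```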
Prerequisites
The following prerequisites are required:
AWS Configuration: The AWS Command Line Interface (AWS CLI) must be installed and configured on your machine. Follow the AWS CLI documentation to configure a default AWS profile with appropriate permissions. The AWS profile must have access to a commercial AWS Region where the ECS MCP server is available. This post uses us-west-2 as an example.
AWS IAM Permission: The local AWS profile needs Amazon ECS-related IAM permissions to read ECS clusters, services, and tasks, access Amazon CloudWatch logs, and view ECS service events and deployments. Refer to the Amazon ECS Developer Guide for a sample IAM policy, and follow the principle of least privilege when granting access to AWS profiles.
AI Assistant & MCP client: You can use Kiro CLI as your MCP client. Follow the Kiro CLI documentation to set up the MCP client configuration.
Python Environment: You need Python 3.10+ and the uv package manager installed. The package manager automatically downloads and runs the mcp-proxy-for-aws package, so you don’t need to install that separately. The MCP proxy allows clients to connect to remote, AWS-hosted MCP servers using AWS SigV4 authentication.
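If you want to sanity-check these prerequisites before configuring your client, the following Python sketch looks for a 3.10+ interpreter, the `uvx` and `aws` executables on your PATH, and the Region set for a profile in AWS CLI config text. The helper names are hypothetical, and treating `uvx` (uv's tool runner) as the launcher for mcp-proxy-for-aws is an assumption; the AWS CLI itself (for example, `aws configure list`) remains the authoritative view of your configuration.

```python
import configparser
import shutil
import sys
from typing import Optional


def check_prerequisites(version: tuple[int, int] = sys.version_info[:2]) -> list[str]:
    """Return problems with the local toolchain; an empty list means all found."""
    problems = []
    if version < (3, 10):
        problems.append(f"Python 3.10+ required, found {version[0]}.{version[1]}")
    # uvx is uv's tool runner, assumed here to be what launches mcp-proxy-for-aws.
    if shutil.which("uvx") is None:
        problems.append("uvx not found on PATH; install uv")
    if shutil.which("aws") is None:
        problems.append("AWS CLI (aws) not found on PATH")
    return problems


def profile_region(config_text: str, profile: str = "default") -> Optional[str]:
    """Return the region set for a profile in AWS CLI config text (~/.aws/config).

    The config file names the default profile [default] and others [profile NAME].
    Environment overrides such as AWS_REGION are not considered here.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    section = "default" if profile == "default" else f"profile {profile}"
    if not parser.has_section(section):
        return None
    return parser.get(section, "region", fallback=None)


for problem in check_prerequisites():
    print("prerequisite issue:", problem)
print(profile_region("[default]\nregion = us-west-2\noutput = json\n"))  # us-west-2
```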
Solution walkthrough
To use the MCP server with Kiro, add an entry for the remote Amazon ECS MCP server to ~/.kiro/settings/mcp.json.
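Check the Amazon ECS Developer Guide for the authoritative configuration entry. As a hedged sketch of its general shape, the following Python builds a Kiro mcp.json entry that launches mcp-proxy-for-aws through uvx and merges it into existing settings without overwriting other servers. Several details are assumptions to verify against the official documentation: the `mcpServers` key (the common MCP client convention), the server name `ecs-mcp`, the `--profile` and `--region` flags, and the endpoint URL, which is a placeholder and not a real endpoint.

```python
import json

# Placeholder only: replace with the ECS MCP server endpoint URL for your Region,
# as published in the Amazon ECS Developer Guide.
ECS_MCP_ENDPOINT = "https://<ecs-mcp-endpoint-for-your-region>/mcp"


def ecs_mcp_entry(endpoint: str, profile: str = "default", region: str = "us-west-2") -> dict:
    """Build a stdio server entry that launches mcp-proxy-for-aws through uvx."""
    return {
        "command": "uvx",
        "args": [
            "mcp-proxy-for-aws@latest",  # uvx fetches and runs the proxy package
            endpoint,
            # Assumed flag names; confirm them in the mcp-proxy-for-aws README.
            "--profile", profile,
            "--region", region,
        ],
    }


def merge_into_settings(existing: dict, name: str, entry: dict) -> dict:
    """Return a copy of the settings with `entry` added under "mcpServers"."""
    merged = json.loads(json.dumps(existing))  # deep copy of JSON-compatible data
    merged.setdefault("mcpServers", {})[name] = entry
    return merged


# Start from the parsed contents of ~/.kiro/settings/mcp.json ({} if the file is new).
settings = merge_into_settings({}, "ecs-mcp", ecs_mcp_entry(ECS_MCP_ENDPOINT))
print(json.dumps(settings, indent=2))
```

Merging rather than overwriting matters if you already use other MCP servers with Kiro; review the printed JSON before writing it to ~/.kiro/settings/mcp.json.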
Launch the Kiro command line interface (CLI) and enter the /tools command to view all available tools. Kiro processes natural language queries, so you can monitor cluster health and troubleshoot issues directly from the CLI. This reduces the need for custom scripts or for switching to the AWS Management Console. The following figure shows the available tools in the remote MCP server.
Figure 3: List of available tools in the remote Amazon ECS MCP server
Operations tools
The MCP server enables you to monitor cluster deployments and configurations using natural language queries through Kiro. You can track deployment progress by asking simple questions such as “Is the deployment complete?” or “What is the current deployment status?” Kiro processes these requests using the get_deployment_status() tool to provide real-time information about your cluster’s state. Additionally, the MCP server lets you query detailed network configuration information and retrieve service events for any cluster in your environment.
Get deployment status
The following image shows a sample request and response for the get_deployment_status() tool.
Figure 4: get_deployment_status() tool usage
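This post doesn't describe how get_deployment_status computes its answer. To illustrate the kind of signals a deployment check looks at, the following Python sketch summarizes a service from a dictionary shaped like the Amazon ECS DescribeServices response, using its `deployments`, `status`, `rolloutState`, `runningCount`, `desiredCount`, and `failedTasks` fields. `summarize_deployments` is a hypothetical helper, not the tool's implementation, and the sample data is made up.

```python
from typing import Any


def summarize_deployments(service: dict[str, Any]) -> str:
    """Summarize rollout progress for one service in DescribeServices shape."""
    deployments = service["deployments"]
    # The PRIMARY deployment is the one the service is rolling toward.
    primary = next(d for d in deployments if d["status"] == "PRIMARY")
    summary = (
        f"{service['serviceName']}: rollout {primary.get('rolloutState', 'UNKNOWN')}, "
        f"{primary['runningCount']}/{primary['desiredCount']} tasks running"
    )
    # Previous deployments that are still draining keep the status ACTIVE.
    draining = sum(1 for d in deployments if d["status"] == "ACTIVE")
    if draining:
        summary += f", {draining} older deployment(s) still active"
    if primary.get("failedTasks"):
        summary += f", {primary['failedTasks']} failed task(s)"
    return summary


# Made-up sample data for a rolling update in progress.
sample_service = {
    "serviceName": "sample-service",
    "deployments": [
        {"status": "PRIMARY", "rolloutState": "IN_PROGRESS", "runningCount": 1, "desiredCount": 2, "failedTasks": 0},
        {"status": "ACTIVE", "runningCount": 2, "desiredCount": 0},
    ],
}
print(summarize_deployments(sample_service))
# sample-service: rollout IN_PROGRESS, 1/2 tasks running, 1 older deployment(s) still active
```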
Troubleshooting tools
The MCP server offers enhanced troubleshooting capabilities through natural language queries. You can investigate cluster issues by asking questions like “List all task failures” or “Were there any failures while pulling images?” For more targeted analysis, the server supports time-based log queries, such as “display the logs for cluster sample-nodejs-app-cluster for the last 15 min.” Additionally, you can check whether specific task definitions are eligible for deletion. These troubleshooting tools simplify cluster maintenance by providing quick access to critical diagnostic information.
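A relative request such as “the last 15 min” ultimately has to become an absolute time range. CloudWatch Logs APIs such as FilterLogEvents take `startTime` and `endTime` as milliseconds since the Unix epoch, so the following Python sketch shows that conversion. How fetch_task_logs parses the phrase internally is not documented in this post; `log_time_window` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def _to_epoch_ms(moment: datetime) -> int:
    """Milliseconds since the Unix epoch for a timezone-aware datetime."""
    # Integer division of timedeltas avoids floating-point rounding.
    return (moment - _EPOCH) // timedelta(milliseconds=1)


def log_time_window(minutes: int, now: Optional[datetime] = None) -> tuple[int, int]:
    """Return (start_ms, end_ms) covering the last `minutes` minutes."""
    end = now if now is not None else datetime.now(timezone.utc)
    start = end - timedelta(minutes=minutes)
    return _to_epoch_ms(start), _to_epoch_ms(end)


# "the last 15 min", evaluated against a fixed clock for reproducibility
start_ms, end_ms = log_time_window(15, now=datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc))
print(start_ms, end_ms)  # 1735731900000 1735732800000
```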
List all task failures
The following image shows a sample task failures invocation.
Figure 5: fetch_task_failures() tool invocation
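fetch_task_failures summarizes failures so that patterns stand out. One simple way to picture that kind of pattern detection is to group stopped tasks by their `stopCode` and `stoppedReason` fields, both part of the Amazon ECS DescribeTasks response, and count them. The Python sketch below is a hypothetical illustration rather than the tool's logic, and the sample tasks are made up.

```python
from collections import Counter
from typing import Any


def failure_patterns(tasks: list[dict[str, Any]], top: int = 3) -> list[tuple[str, int]]:
    """Return the most common stopCode/stoppedReason pairs among stopped tasks."""
    counts = Counter(
        f"{t.get('stopCode', 'Unknown')}: {t.get('stoppedReason', 'no reason given')}"
        for t in tasks
        if t.get("lastStatus") == "STOPPED"
    )
    return counts.most_common(top)


# Made-up tasks in the shape returned by DescribeTasks.
sample_tasks = [
    {"lastStatus": "STOPPED", "stopCode": "EssentialContainerExited", "stoppedReason": "Essential container in task exited"},
    {"lastStatus": "STOPPED", "stopCode": "EssentialContainerExited", "stoppedReason": "Essential container in task exited"},
    {"lastStatus": "STOPPED", "stopCode": "TaskFailedToStart", "stoppedReason": "ResourceInitializationError: unable to pull secrets or registry auth"},
    {"lastStatus": "RUNNING"},
]
for pattern, count in failure_patterns(sample_tasks):
    print(count, pattern)
```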
Check for image pulling errors
The following image shows a sample tool invocation to detect image pull failures in your Amazon ECS cluster.
Figure 6: detect_image_pull_failures() tool invocation
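Image pull failures surface in a task's stopped reason as `CannotPullContainerError`. The following Python sketch shows one hypothetical way to bucket those messages by keyword. The categories and keyword rules are assumptions for illustration, not the categories detect_image_pull_failures uses. The network bucket corresponds to the situation in the console example later in this post, where a task could not reach Amazon ECR.

```python
from typing import Optional

# Hypothetical keyword rules, checked in order; the first match wins.
PULL_FAILURE_RULES = [
    ("image_not_found", ("manifest unknown", "not found")),
    ("auth", ("accessdenied", "access denied", "no basic auth credentials", "403 forbidden")),
    ("network", ("i/o timeout", "dial tcp", "context deadline exceeded")),
]


def categorize_pull_failure(stopped_reason: str) -> Optional[str]:
    """Return a category for an image pull failure, or None if it isn't one."""
    if "CannotPullContainerError" not in stopped_reason:
        return None
    lowered = stopped_reason.lower()
    for category, keywords in PULL_FAILURE_RULES:
        if any(keyword in lowered for keyword in keywords):
            return category
    return "other"  # a pull failure that none of the rules recognize


print(categorize_pull_failure(
    "CannotPullContainerError: failed to do request: dial tcp 10.0.1.5:443: i/o timeout"
))  # network
```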
Check for task definition deletion blockers
The following image shows a sample tool invocation for fetching task definition deletion blocker information.
Figure 7: get_task_definition_deletion_blockers() tool invocation
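As background for what a deletion blocker can be: in Amazon ECS, a task definition revision must be deregistered (status INACTIVE) before it can be deleted, and a deleted revision stays in DELETE_IN_PROGRESS until the tasks and services that reference it have stopped. The following Python sketch checks for those conditions over dictionaries shaped like the DescribeTaskDefinition, DescribeServices, and DescribeTasks responses. `deletion_blockers` is a hypothetical simplification, not how get_task_definition_deletion_blockers works, and the sample ARN and names are made up.

```python
from typing import Any


def deletion_blockers(
    task_def: dict[str, Any],
    services: list[dict[str, Any]],
    tasks: list[dict[str, Any]],
) -> list[str]:
    """List conditions that keep a task definition revision from being deleted."""
    arn = task_def["taskDefinitionArn"]
    blockers = []
    if task_def.get("status") == "ACTIVE":
        blockers.append("revision is ACTIVE; deregister it before deleting")
    for svc in services:
        # A service can reference the revision directly or through an in-flight deployment.
        refs = {svc.get("taskDefinition")} | {d.get("taskDefinition") for d in svc.get("deployments", [])}
        if arn in refs:
            blockers.append(f"service {svc['serviceName']} references this revision")
    live = [t for t in tasks if t.get("taskDefinitionArn") == arn and t.get("lastStatus") != "STOPPED"]
    if live:
        blockers.append(f"{len(live)} task(s) still running this revision")
    return blockers


ARN = "arn:aws:ecs:us-west-2:111122223333:task-definition/sample-app:3"  # made-up ARN
print(deletion_blockers(
    {"taskDefinitionArn": ARN, "status": "ACTIVE"},
    [{"serviceName": "sample-service", "taskDefinition": ARN}],
    [{"taskDefinitionArn": ARN, "lastStatus": "RUNNING"}],
))
```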
ECS Console Experience
The Amazon ECS MCP server is integrated with Amazon Q in the Amazon ECS console. When you encounter issues with your Amazon ECS resources in the console, such as failed tasks, deployment rollbacks, task definitions that won't delete, or container health check failures, an “Inspect with Amazon Q” button appears alongside the error message when you hover over or click the resource status. Clicking “Inspect with Amazon Q” triggers agentic orchestration behind the scenes: it extracts the relevant context, constructs the appropriate prompt, passes the context to Amazon Q, invokes Amazon ECS MCP tools, identifies the root cause, and suggests mitigation steps.
Consider a scenario where your Amazon ECS task failed and you want to investigate why. Clicking the task status reason opens a popover with an “Inspect with Amazon Q” button. As the following video shows, clicking the button opens the Amazon Q chat panel, where you can watch Q analyze the issue step by step, using various MCP tools to perform comprehensive root cause checks. In this scenario, Q identified why the task failed: it could not connect to Amazon Elastic Container Registry (Amazon ECR), and we needed to update our security group rule configuration to ensure network reachability.
Figure 8: Amazon Q troubleshooting of Amazon ECS Cluster
You can learn more about AI-powered experiences in the ECS console in the ECS Developer Guide.
Conclusion
The Amazon ECS remote MCP server offers a transformative approach to enterprise container management, connecting your AI-powered workflows with robust AWS infrastructure. By letting you interact with your containerized applications through natural language queries, whether from your AI assistant or through Amazon Q in the console, it helps reduce the complexity and learning curve traditionally associated with container troubleshooting and monitoring. The Amazon ECS MCP server provides an enterprise-grade AI experience through centralized management by AWS, security through AWS IAM integration, and auditability through AWS CloudTrail. You can now use AI assistants for container operations and incident response while maintaining the security policies, compliance requirements, and operational standards your enterprise environment needs.
As you continue to embrace AI-driven workflows, the Amazon ECS MCP server provides a scalable, secure foundation for integrating intelligent automation into your container operations, making AWS services more accessible while supporting the security and governance controls your mission-critical applications require. To get started, check out the Amazon ECS Developer Guide.
About the authors
Rajdeep Banerjee is a Senior Partner Solutions Architect at AWS helping strategic partners and clients in the AWS cloud migration and digital transformation journey. Rajdeep focuses on working with partners to provide technical guidance on AWS, collaborating with them to understand their technical requirements, and designing solutions to meet their specific needs. He is a member of the Serverless technical field community. Rajdeep is based in Richmond, Virginia.
Lavanya Tangutur serves as a Senior Technical Account Manager at Amazon Web Services (AWS) focused on helping customers build, deploy, and run secure, resilient, and cost-effective workloads on AWS. She combines her passion for coding with customer engagements to implement AWS best practices and solutions.
Stacey Hou is a Senior Product Manager – Technical at AWS, where she focuses on GenAI initiatives and observability for Amazon Elastic Container Service (ECS). She works closely with customers and engineering teams to drive innovations that simplify the experience of building, operating, and troubleshooting containerized applications.