AWS Cloud Operations Blog
Category: Management & Governance
Your Essential Guide to Cloud Governance at AWS re:Invent 2025
With organizations increasingly recognizing governance as a strategic enabler rather than a compliance burden, this year’s Cloud Governance sessions in the AWS Cloud Ops track bridge the gap between operational excellence and business innovation. The governance landscape is evolving rapidly, and this year’s sessions are organized around four critical themes that reflect the […]
Optimizing metrics ingestion with Amazon Managed Service for Prometheus
Managing metrics collection at scale in complex cloud environments presents significant challenges for organizations, particularly when it comes to controlling costs and maintaining operational efficiency. As the volume of metrics grows exponentially with the expansion of container deployments and other cloud-native workloads, customers often struggle to balance comprehensive monitoring with resource optimization. This can lead […]
AWS Organizations launches account state information for granular account lifecycle management
AWS Organizations enables customers to centrally manage their AWS accounts. Many customers automate account creation by calling the CreateAccount API, which lets them build an account vending pipeline. This pipeline standardizes the deployment of policies, roles, and resources across new accounts while managing the complete lifecycle through eventual account closure. Through this […]
Advanced analytics using Amazon CloudWatch Logs Insights
Effective log management and analysis are critical for maintaining robust, secure, and high-performing systems. Amazon CloudWatch Logs Insights has long been a powerful tool for searching, filtering, and analyzing log data across multiple log groups. The addition of query support for OpenSearch Piped Processing Language (PPL) and OpenSearch SQL offers greater flexibility and familiarity in […]
Enhance your AIOps: Introducing Amazon CloudWatch and Application Signals MCP servers
Modern architectures generate vast amounts of observability data across metrics, logs, and traces. When issues arise, teams spend hours—sometimes days—manually correlating information across multiple dashboards to identify root causes, directly impacting MTTR and productivity. Amazon CloudWatch Application Signals addresses this challenge by providing deep application visibility through automatic instrumentation, capturing key metrics like latency, error […]
Best practices for analyzing AWS Config recording frequencies
AWS Config tracks configuration changes across your AWS resources and AWS Organizations. AWS Config uses the configuration recorder to detect changes and records them as configuration items (CIs). As your infrastructure grows and becomes more complex, choosing the appropriate recording frequency becomes critical for maintaining operational visibility, meeting compliance requirements, and supporting your security posture. Since the launch of the periodic recording […]
Alarming on SLOs in Amazon Search with CloudWatch Application Signals – Part 2
In practice: SLO monitoring with CloudWatch Application Signals
In the previous post, we shared the basic concepts and benefits of burn rate monitoring. In this post, we, the Amazon Product Search team, share anecdotes from our migration from an in-house solution to CloudWatch Application Signals and show how we implement monitoring and dashboards. […]
Alarming on SLOs in Amazon Search with CloudWatch Application Signals – Part 1
In theory: SLO concepts applied to Amazon Product Search
In this series of posts, we will show you how we, the Amazon Product Search team, monitor key systems using Service Level Objectives (SLOs) and share our migration journey from an in-house solution to Amazon CloudWatch Application Signals. Amazon Product Search is a large distributed system […]
Using Amazon Bedrock and Amazon Nova for AI-Powered Incident Response
In today’s cloud-native world, incident response teams face overwhelming challenges. When critical applications fail, engineers must sift through mountains of observability data across multiple services, all while under intense pressure to restore service quickly. This manual correlation process is time-consuming, error-prone, and often delays resolution, resulting in extended outages and frustrated customers. Traditional monitoring tools […]
Launching Amazon CloudWatch generative AI observability (Preview)
As organizations rapidly deploy large language models (LLMs) and generative AI agents to power increasingly intelligent workloads, they struggle to monitor and troubleshoot the complex interactions within their AI applications. Traditional monitoring tools fall short of providing visibility across components, forcing developers and AI/ML engineers to manually correlate interaction logs or build custom […]