AWS Cloud Operations Blog
Tag: Monitoring & Observability
Simplifying Log Management using Amazon CloudWatch Logs Centralization
Managing logs across multiple AWS accounts and regions has always been a complex challenge for organizations. As AWS infrastructure grows to include separate accounts for production, development, and staging environments, along with regions, the complexity of log management increases exponentially. During critical incidents, especially during off-hours, teams spend valuable time, searching through multiple accounts, correlating […]
Optimizing metrics ingestion with Amazon Managed Service for Prometheus
Managing metrics collection at scale in complex cloud environments presents significant challenges for organizations, particularly when it comes to controlling costs and maintaining operational efficiency. As the volume of metrics grows exponentially with the expansion of container deployments and other cloud-native workloads, customers often struggle to balance comprehensive monitoring with resource optimization. This can lead […]
Alarming on SLOs in Amazon Search with CloudWatch Application Signals – Part 2
In practice: SLO monitoring with CloudWatch Application Signals In the previous post, we’ve shared the basic concepts and benefits of burn rate monitoring. In this post, we, the Amazon Product Search team, will share anecdotes from our migration from an in-house solution to CloudWatch Application Signals, and introduce how we actually implement monitoring and dashboards. […]
Alarming on SLOs in Amazon Search with CloudWatch Application Signals – Part 1
In theory: SLO concepts applied to Amazon Product Search In this series of posts, we will show you how we, the Amazon Product Search team, monitor key systems using Service Level Objectives (SLOs) and share our migration journey from an in-house solution to Amazon CloudWatch Application Signals. Amazon Product Search is a large distributed system […]
Launching Amazon CloudWatch generative AI observability (Preview)
As organizations rapidly deploy large language models (LLMs) and generative AI agents to power increasingly intelligent workloads, they struggle to monitor and troubleshoot the complex interactions within their AI applications. Traditional monitoring tools fall short in providing the visibility across components, leading to developers and AI/ML engineers to manually correlate interaction logs or building custom […]
SAP on AWS – Streamlined Operations and Monitoring
SAP ERP (Enterprise Resource Planning) systems are at the core of many enterprises, supporting a wide range of mission-critical processes, including Procure to Pay, Order to Cash, Production Planning, Financial Accounting, Supply Chain Management (SCM), and Human Capital Management. Given the critical role of SAP ERP, maintaining the stability, security, and efficiency of these ERP […]
Observing Agentic AI workloads using Amazon CloudWatch agent
Introduction As the adoption of agentic AI applications continues to grow, ensuring the reliability, performance, and overall observability of these systems becomes increasingly critical. Agentic AI applications, powered by large language models (LLM) and integrated with various data sources and APIs, can quickly become complex, making it challenging to gain visibility into their inner workings […]
Key Governance, Risk, and Compliance Sessions at re:Inforce 2025
We are incredibly excited to see you at AWS re:Inforce, in Philadelphia, Pennsylvania, on June 16-18, 2025. This year’s Governance, Risk, and Compliance track features sessions on automating compliance, enhancing risk visibility, using generative AI for business growth, and maintaining security at scale, including 5 breakout sessions, 8 builder sessions, 7 chalk talks, 2 code […]
Monitor AWS Transit Gateway Flow Logs centrally using Amazon Managed Grafana
As organizations continue to expand their cloud infrastructure by connecting multiple Amazon Virtual Private Clouds (Amazon VPC) across accounts and regions, the complexity of managing their network environment increases. AWS Transit Gateway has emerged as a powerful solution to simplify this complexity by providing a centralized hub for secure communication between Amazon VPCs, on-premises systems, and […]
Know Before You Go – AWS re:Invent 2024 Monitoring and Observability
Planning to join us in Las Vegas from Dec 2 to Dec 6 at AWS re:Invent 2024 and looking to learn more about monitoring and observability? If you are, this blog highlights Cloud Operations sessions that focus on monitoring and observability at re:Invent 2024! Monitoring and Observability allows you to understand the health of your applications and […]