AWS Cloud Operations Blog
Category: Management Tools
Gain visibility of AWS backup activities using Amazon Managed Grafana
AWS Backup is a comprehensive, fully managed service that centralizes and automates data protection across AWS services, both in the cloud and on premises. Organizations have different requirements and want to track their backup, copy, and restore activities across AWS Cloud resources. Currently, to view the status of resource […]
Best practices for analyzing AWS Config recording frequencies
AWS Config tracks configuration changes across your AWS resources and AWS Organizations. AWS Config uses the configuration recorder to detect changes and records them as configuration items (CIs). As your infrastructure grows and becomes more complex, choosing the appropriate recording frequency becomes critical for maintaining operational visibility, meeting compliance requirements, and supporting your security posture. Since the launch of the periodic recording […]
Centralized Multi-Account Application Resilience Assessment Using AWS Resilience Hub
As organizations scale their cloud environments across multiple AWS accounts and Regions, managing and assessing resilience becomes increasingly complex. Traditional approaches that evaluate resilience separately for each workload, account, or Region can lead to inefficiencies, inconsistencies, and coverage gaps. This challenge is particularly pronounced in distributed architectures that use various Infrastructure as Code (IaC) tools […]
Optimize querying AWS CloudTrail logs with partitioning in Amazon Athena
Organizations that use AWS CloudTrail to audit API access encounter a common challenge: CloudTrail data volume grows in proportion to their AWS infrastructure. A multi-account AWS organization generating millions of API calls daily can quickly accumulate terabytes of CloudTrail logs. When security teams conduct incident investigations or account activity audits, querying these logs in Amazon Athena becomes […]
Alarming on SLOs in Amazon Search with CloudWatch Application Signals – Part 2
In practice: SLO monitoring with CloudWatch Application Signals
In the previous post, we shared the basic concepts and benefits of burn rate monitoring. In this post, we, the Amazon Product Search team, share anecdotes from our migration from an in-house solution to CloudWatch Application Signals and show how we implement monitoring and dashboards. […]
Alarming on SLOs in Amazon Search with CloudWatch Application Signals – Part 1
In theory: SLO concepts applied to Amazon Product Search
In this series of posts, we will show you how we, the Amazon Product Search team, monitor key systems using Service Level Objectives (SLOs) and share our migration journey from an in-house solution to Amazon CloudWatch Application Signals. Amazon Product Search is a large distributed system […]
Using Amazon Bedrock and Amazon Nova for AI-Powered Incident Response
In today’s cloud-native world, incident response teams face overwhelming challenges. When critical applications fail, engineers must sift through mountains of observability data across multiple services, all while under intense pressure to restore service quickly. This manual correlation process is time-consuming and error-prone, and it often delays resolution, resulting in extended outages and frustrated customers. Traditional monitoring tools […]
Launching Amazon CloudWatch generative AI observability (Preview)
As organizations rapidly deploy large language models (LLMs) and generative AI agents to power increasingly intelligent workloads, they struggle to monitor and troubleshoot the complex interactions within their AI applications. Traditional monitoring tools fall short of providing visibility across components, leaving developers and AI/ML engineers to manually correlate interaction logs or build custom […]
SAP on AWS – Streamlined Operations and Monitoring
SAP ERP (Enterprise Resource Planning) systems are at the core of many enterprises, supporting a wide range of mission-critical processes, including Procure to Pay, Order to Cash, Production Planning, Financial Accounting, Supply Chain Management (SCM), and Human Capital Management. Given the critical role of SAP ERP, maintaining the stability, security, and efficiency of these ERP […]
Automate installing AWS Systems Manager agent on unmanaged Amazon EC2 nodes
Managing a fleet of AWS resources at scale can be challenging. Organizations rely on multiple solutions to automate tasks, collect inventory, patch instances, and maintain security compliance. They also need to access instances without opening inbound ports or managing SSH keys. AWS Systems Manager (SSM) simplifies this by serving as a centralized management solution that supports […]