AWS Cloud Operations Blog

Getting insights from Amazon Managed Service for Prometheus using natural language powered by Amazon Bedrock

As applications scale, customers need more automated practices to maintain application availability and reduce the time and effort spent detecting, debugging, and resolving operational issues. Organizations allocate money and developer time to deploy and manage various monitoring tools, while also dedicating considerable effort to training teams on their usage. When issues arise, operators navigate through […]

How Amazon CloudWatch Logs Data Protection can help detect and protect sensitive log data

Customer applications running on Amazon Web Services (AWS) often require handling sensitive data such as personally identifiable information (PII) or protected health information (PHI). As a result, sensitive log data can be intentionally or unintentionally logged as part of an application’s observability data. While comprehensive logging is important for application troubleshooting, monitoring and forensics, any […]

Visualize AWS Systems Manager Patch Manager information using Amazon QuickSight

In this blog post, learn how to build an Amazon QuickSight dashboard that visualizes critical patch and inventory information to help reduce MTTR. You can also use filters to search for a specific AWS account, AWS Region, or Amazon Elastic Compute Cloud (Amazon EC2) name, or to check installed and missing packages. You want to visualize system patching […]

Leveraging AWS CloudTrail Insights for Proactive API Monitoring and Cost Optimization

AWS CloudTrail Insights is a powerful feature within AWS CloudTrail that helps organizations identify and respond to unusual operational activity in their AWS accounts. This includes identifying spikes in resource provisioning, bursts of IAM actions, or gaps in periodic maintenance activity. CloudTrail Insights continuously analyzes CloudTrail management events from trails and event data stores, establishing […]

Assess Resilience at Scale by using Amazon QuickSight and Amazon Resilience Hub

AWS Resilience Hub helps you to manage and improve the resilience posture of your applications on AWS. It enables you to define your resilience goals, assess your resilience posture against those goals, and implement recommendations for improvement based on the AWS Well-Architected Framework. This benefits individual teams that want to assess their applications. However, for […]

Using Generative AI to Gain Insights into CloudWatch Logs

Have you ever been investigating a problem and opened up a log file and thought “I have no idea what I am looking at. If only I could get a summary of the data.” Observability and log data play an important role in maintaining operational excellence and ensuring the reliability of your applications and services. […]

AWS named as a Challenger in the 2024 Gartner Magic Quadrant for Observability Platforms

AWS has been named as a Challenger in the 2024 Gartner Magic Quadrant for Observability Platforms, previously known as Gartner Application Performance Monitoring (APM) and Observability Magic Quadrant. This report assesses vendors based on their Ability to Execute and Completeness of Vision. Compared to the previous year, AWS has moved up higher on the Ability […]

How Merck Automated AWS Elastic Disaster Recovery Initialization and Monitoring

This post is guest-authored by Nasia Ullas of MSD. Enhancing the resilience and productivity of manufacturing processes is essential for pharmaceutical companies to meet business continuity objectives and innovate continuously. Merck & Co., Inc., also known as MSD outside of the United States and Canada, a global bio-pharmaceutical company, mitigated resilience challenges by adopting AWS […]

Deploy AWS Systems Manager Quick Setup programmatically across your AWS Organization

AWS Systems Manager Quick Setup simplifies setting up AWS services, including Systems Manager, by automating common or recommended tasks in your AWS Organization across AWS accounts and Regions. These tasks include creating required AWS Identity and Access Management (IAM) instance profile roles and setting up operational best practices, such as periodic patch scans and inventory […]

Automate Standard Operating Procedures (SOPs) execution with AWS Resilience Hub

AWS Resilience Hub is a central location in the AWS Management Console for you to manage and improve the resilience posture of your applications on AWS. AWS Resilience Hub enables you to define your resilience goals, assess your resilience posture against those goals, and implement recommendations for improvement based on the AWS Well-Architected Framework. AWS […]