AWS Cloud Operations Blog
Category: Monitoring and observability
Know Before You Go – AWS re:Invent 2024 Monitoring and Observability
Planning to join us in Las Vegas from Dec 2 to Dec 6 at AWS re:Invent 2024 and looking to learn more about monitoring and observability? If you are, this blog highlights Cloud Operations sessions that focus on monitoring and observability at re:Invent 2024! Monitoring and observability allow you to understand the health of your applications and […]
How Cigna Implemented a Multi-Region Centralized Alerting System on AWS
This post is co-written with Nicolas Trettel, Cloud Engineering Senior Advisor at Cigna. Monitoring applications and alerting on issues is crucial for building resilient systems. Amazon CloudWatch is a service that monitors applications, responds to performance changes, optimizes resource use, and provides insights into operational health. By collecting data across AWS resources, CloudWatch gives visibility […]
How Stripe architected massive scale observability solution on AWS
This post is co-written with Cody Rioux, Staff Engineer at Stripe, and Michael Cowgill, Staff Engineer at Stripe. Stripe powers online and in-person payment processing and provides financial solutions for businesses of all sizes. Stripe operates a sophisticated microservice environment built on top of AWS. In this blog post, we will cover the journey and […]
Sign-in to AWS Console Mobile Application with an AWS Access Portal or third-party IdP URL
AWS customers rely on the AWS Console Mobile Application to monitor and manage their AWS resources and to receive notifications about them while away from their desktop devices. Customers who use single sign-on (SSO) can face a unique set of challenges while signing in to the AWS Console Mobile Application. While SSO can offer enhanced security and […]
Managing access to AWS accounts from Microsoft Teams and Slack at scale using AWS Organizations and AWS Chatbot
Customers use chat collaboration applications like Microsoft Teams and Slack to collaborate and manage their AWS applications. AWS Chatbot is a ChatOps service that enables customers to monitor, troubleshoot issues, and manage AWS applications from chat channels. AWS Chatbot provides autonomy and customizability to DevOps teams operating their AWS environments on the go from chat […]
Automating metrics collection on Amazon EKS with Amazon Managed Service for Prometheus managed scrapers
Managing and operating monitoring systems for containerized applications, including metrics collection, can be a significant operational burden for customers. As container environments scale, customers have to split metric collection across multiple collectors, right-size the collectors to handle peak loads, and continuously manage, patch, secure, and operationalize these collectors. This overhead can detract from an […]
Enable cloud operations workflows with generative AI using Agents for Amazon Bedrock and Amazon CloudWatch Logs
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible […]
Ten features for efficiently managing your AWS applications from Microsoft Teams and Slack using AWS Chatbot
Ten features in AWS Chatbot to help you understand your application health and resolve issues faster from chat channels.
How Amazon CloudWatch Logs Data Protection can help detect and protect sensitive log data
Customer applications running on Amazon Web Services (AWS) often require handling sensitive data such as personally identifiable information (PII) or protected health information (PHI). As a result, sensitive log data can be intentionally or unintentionally logged as part of an application’s observability data. While comprehensive logging is important for application troubleshooting, monitoring and forensics, any […]
Leveraging AWS CloudTrail Insights for Proactive API Monitoring and Cost Optimization
AWS CloudTrail Insights is a powerful feature within AWS CloudTrail that helps organizations identify and respond to unusual operational activity in their AWS accounts. This includes identifying spikes in resource provisioning, bursts of IAM actions, or gaps in periodic maintenance activity. CloudTrail Insights continuously analyzes CloudTrail management events from trails and event data stores, establishing […]