AWS Cloud Operations Blog

Category: Technical How-to

Operations re:Imagined – Know Before You Go – AWS re:Invent 2024

We are so excited to see you at our annual cloud computing conference, AWS re:Invent 2024 in Las Vegas from Dec 2 to Dec 6. At this conference, you’ll have the opportunity to attend thought-provoking keynotes, dive deep into our services, and meet with fellow cloud enthusiasts! No matter your level of expertise, we’ll have sessions […]

Know Before You Go – AWS re:Invent 2024 Monitoring and Observability

Planning to join us in Las Vegas from Dec 2 to Dec 6 at AWS re:Invent 2024 and looking to learn more about monitoring and observability? If you are, this blog highlights Cloud Operations sessions that focus on monitoring and observability at re:Invent 2024! Monitoring and Observability allows you to understand the health of your applications and […]

myApplications

Automate creating and onboarding applications with AWS CloudFormation tags and myApplications

Customers operate hundreds of applications and often those applications consist of hundreds to thousands of resources. This can get complex and overwhelming having to monitor and manage individual resources and identifying what resources are tied to an application while making sure their applications are available, secure, cost-optimized, and performing optimally. The underlying concept of applications […]

Leveraging existing tagging strategies for Application Operations

Leveraging existing tagging strategies for Application Operations

Customers often spend time finding and managing individual resources within their applications. They need to find various applications, manage and perform application tasks, and monitor resources during different stages of the application lifecycle. Customers usually have hundreds to thousands of resources within even a single AWS account. This requires navigating across multiple AWS services pages […]

Introducing AWS Fault Injection Service Actions to Inject Chaos in Lambda functions

Usage of serverless technology in regulated industries like financial services is growing. This growth demands robust resilience validation. Chaos engineering for Serverless has become crucial for ensuring reliable and available serverless applications. By purposefully injecting failures and stresses into serverless components, teams can uncover hidden weaknesses and validate the fault tolerance of their systems. Previously, […]

Streamlining the Correction of Errors process using Amazon Bedrock

Generative AI can streamline the Correction of Errors process, saving time and resources. By using generative AI to leverage large language models, combined with the Correction of Errors process, businesses can expedite the identification and documentation of the cause of errors, while saving time and resources. Purpose and set-up The purpose of this blog is […]

Exploring AWS Config data using Amazon Athena and Amazon Managed Grafana

This post is co-written with Jacob Rickerd, Principal Security Engineer at Attentive. The post walks through an example dashboard that Attentive, an AI-powered mobile marketing platform, uses for resource inventory, serving as a starting point for you to build comprehensive dashboards tailored to your environment and tag policies. Attentive is the AI-powered SMS and email […]

Enhanced dashboard, latency suggestions in Amazon CloudWatch Internet Monitor

Amazon CloudWatch Internet Monitor provides near-continuous internet measurements for your internet traffic, including availability and performance metrics, tailored to your specific workload footprint on AWS. With Internet Monitor, you can get insights into average internet performance metrics over time, as well as get alerts for issues (health events). You’re notified about events that impact your end […]

Centrally detect and investigate security findings with AWS Organizations integrations

Detecting security risks and investigating the corresponding findings is essential for protecting your AWS environment from potential threats, ensuring the confidentiality, integrity, and availability of your data and resources for your business needs. As shown in Image 1, effective incident response follows a systematic approach of identifying, detecting, investigating, prioritizing, and resolving security findings. By analyzing […]

Automating metrics collection on Amazon EKS with Amazon Managed Service for Prometheus managed scrapers

Managing and operating monitoring systems for containerized applications can be a significant operational burden for customers such as metrics collection. As container environments scale, customers have to split metric collection across multiple collectors, right-size the collectors to handle peak loads, and continuously manage, patch, secure, and operationalize these collectors. This overhead can detract from an […]