AWS Cloud Operations Blog

Category: Technical How-to

Analyzing Amazon Lex conversation log data with Amazon Managed Grafana

To support business and internal processes, organizations are increasing their use of conversational interfaces. They offer opportunities for more availability, improved service levels, and reduced costs. As these conversational services become more important, so does the need to monitor the performance and effectiveness of these interfaces with analytics and dashboards. This analysis is used to drive […]

Know Before You Go – AWS re:Invent 2023 Cloud Governance and Compliance

We are so excited to see you at our annual cloud computing conference, AWS re:Invent 2023, in Las Vegas from Nov 27 to Dec 1. Whether you’re a seasoned re:Invent veteran or a first-timer, the excitement and opportunities of AWS re:Invent never cease to amaze. With a total of 96 sessions covering the solution areas that […]

Creating a correction of errors document

This blog post will walk you through an example of creating a Correction of Errors (COE) document. At Amazon, operational excellence is in our DNA. One best practice that we have learned at Amazon is to have a standard mechanism for post-incident analysis. The COE process facilitates learning from an event to avoid reoccurrences in […]

Monitoring GPU workloads on Amazon EKS using AWS managed open-source services

As machine learning (ML) workloads continue to grow in popularity, many customers are looking to run them on Kubernetes with graphics processing unit (GPU) support. Amazon Elastic Compute Cloud (Amazon EC2) instances powered by NVIDIA GPUs deliver the scalable performance needed for fast ML training and cost-effective ML inference. Monitoring GPU utilization gives valuable information for researchers working […]

Know Before You Go – AWS re:Invent 2023 Monitoring and Observability, and Centralized Operations Management

We are so excited to see you at our annual cloud computing conference, AWS re:Invent 2023, in Las Vegas from Nov 27 to Dec 1. Whether you’re a seasoned re:Invent veteran or a first-timer, the excitement and opportunities of AWS re:Invent never cease to amaze. With a total of 96 sessions covering the solution areas that […]

Scaling GitHub usage with AWS

Customers that migrate on-premises enterprise applications to AWS often look for guidance on how to migrate GitHub to AWS. Customers find it challenging to scale as they are constrained by on-premises GitHub infrastructure. Organisations that run GitHub on AWS can get up and running quickly. GitHub on AWS enables teams to collaborate efficiently […]

Self-service Account Provisioning Using AWS Service Management Connector for ServiceNow

Many customers are looking to adopt a multi-account strategy within their AWS environment. This allows customers to isolate their workloads into different environments including test, dev, and production in addition to separating workloads based on regulatory requirements. As customers scale their multi-account environments, one strategy to increase agility is to offer business units their own […]

How to download your AWS Resilience Hub assessment results

AWS Resilience Hub provides a central place to define, validate, and track the resilience of your application on AWS. It can help assess the impact of every application change on resiliency by automatically running the assessment on a daily basis or as part of a CI/CD pipeline. With AWS Resilience Hub, you can easily create resiliency […]

Observe dynamic sites with Amazon CloudWatch Synthetics and AWS Systems Manager Parameter Store

Maintaining and improving the end-user experience is key, and as your business grows, the number of endpoints you need to observe can grow quickly. It can become more challenging and time-consuming to build multiple canaries to observe them. This solution is designed to show how you can use a consistent and automated approach […]

Centralize image administration for virtual machines and containers using EC2 Image Builder

Customers may have different processes for image building across virtual machines, containers, or both. This variation in processes introduces operational overhead in managing images, including the initial configuration and the ongoing updates. From the AWS Well-Architected Operational Excellence Pillar, section “Document and share lessons learned”, these images should be standardized, configured with the latest patches, […]