AWS Cloud Operations Blog

Category: Developer Tools

Developing an AWS Service Catalog self-managed engine for governance

AWS Service Catalog lets you centrally manage your cloud resources and govern your Infrastructure as Code (IaC) templates at scale. It supports AWS CloudFormation natively and lets customers use other IaC tools, such as Terraform Community and Terraform Cloud, via the Service Catalog reference engine. We often hear customers asking how to […]

Enabling Self Service for Cloud Custodian policies on AWS using AWS Service Catalog

Customers are increasingly seeking tools and solutions that can help them achieve their desired outcomes more efficiently and effectively. In the context of cloud management, the need for self-service capabilities has become more pronounced as organizations strive to optimize their cloud resources, improve security, and enhance their overall cloud operations. AWS Service Catalog offers the […]

Respond to CloudWatch Alarms with Amazon Bedrock Insights

Overview: When operating complex, distributed systems in the cloud, quickly identifying the root cause of issues and resolving incidents can be a daunting task. Troubleshooting often involves sifting through metrics, logs, and traces from multiple AWS services, making it challenging to gain a comprehensive understanding of the problem. So how can you streamline this process […]

Testing and debugging Amazon CloudWatch Synthetics canary locally

Introduction: Amazon CloudWatch Synthetics canaries are scripts that monitor your endpoints and APIs by simulating the actions of a user. These canaries run on a schedule, check the availability and latency of your applications, and alert you when there are issues. Canary scripts are written in Node.js or Python, and they run inside an AWS […]

Using the unified CloudWatch Agent to send traces to AWS X-Ray

Today, applications are more distributed than ever before and they no longer run in isolation. This is especially the case when utilizing Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS). A distributed workload or system is one that encompasses multiple small independent components, all working together to complete a task or job. […]

Unlock Faster Releases with AWS AppConfig: The Secret Weapon for Your CI/CD Strategy

Striking a Balance Between Reliability and Agility in Cloud Operations: The IT operations team of an enterprise serves as the first line of defense against potential business disruptions. It operates 24/7, acts as a hub, and continuously monitors and manages the IT environment. The operations team handles and prioritizes critical IT incidents to minimize downtime and […]

Analyze AWS Microservices architecture to identify and address performance issues

Amazon Payment Services (APS) is a payment service provider in the Middle East and North Africa. With its secure and seamless payment experience, it empowers businesses to build their online presence. Amazon Payment Services is based on a broad and complex microservice-based architecture that depends on multiple AWS services, including Amazon Elastic Compute […]

Streamline Platform Engineering using AWS CodeStar Connections with AWS Service Catalog

Introduction: AWS Service Catalog and AWS CloudFormation now support Git-sync capabilities that allow Platform Engineers to streamline their DevOps processes by keeping their Infrastructure as Code (IaC) templates in source control providers like GitHub and Bitbucket. These enhancements help Platform Engineers more effectively create, version, and manage their Well-Architected patterns with application teams […]

Monitoring GPU workloads on Amazon EKS using AWS managed open-source services

As machine learning (ML) workloads continue to grow in popularity, many customers are looking to run them on Kubernetes with graphics processing unit (GPU) support. Amazon Elastic Compute Cloud (Amazon EC2) instances powered by NVIDIA GPUs deliver the scalable performance needed for fast ML training and cost-effective ML inference. Monitoring GPU utilization gives valuable information for researchers working […]

Lowering MTTR with Amazon CloudWatch and AWS X-Ray

Customers running microservice-based workloads in a serverless environment frequently struggle to troubleshoot incidents because the data they need can be distributed across hundreds or thousands of components. In this blog post, I will demonstrate how you can reduce the mean time to resolution (MTTR, or the average time it takes to repair or mitigate […]