AWS Cloud Operations Blog

Introducing CloudWatch Lambda Insights

CloudWatch Lambda Insights is a monitoring and troubleshooting solution for serverless applications running on AWS Lambda. The solution collects, aggregates, and summarizes system-level metrics including CPU time, memory, disk, and network. It also collects, aggregates, and summarizes diagnostic information such as cold starts and Lambda worker shutdowns to help you isolate issues with your Lambda […]

Handling Region parity with infrastructure as code

AWS CloudFormation allows you to create and manage resources with templates. AWS provides a number of Regions where its services and features are available. Although it can be beneficial to deploy the same AWS CloudFormation template in multiple Regions, customers who operate in multiple Regions face challenges due to parity differences among services and their […]

Improve governance and business agility using AWS Management and Governance videos – part 2

This blog post highlights newly published videos on the AWS Management and Governance YouTube channel that help you enable, provision, and operate your AWS environments effectively. The first part of this blog series was published last spring. The objective of these video-based, hands-on solutions is to enable you to innovate faster while maintaining control over […]

Customizing account configuration with AWS Control Tower lifecycle events

In this blog post, we show how to customize the networking configuration in an AWS account, for example by deleting the default VPCs in all AWS Regions, using AWS Resource Access Manager to share the appropriate VPC subnets, and using AWS Firewall Manager to apply security groups to VPCs in the account.

Using AWS Systems Manager OpsCenter and AWS Config for compliance monitoring

In this post, I show how AWS Systems Manager OpsCenter can be used to centrally record and mitigate alerts from AWS Config. When AWS Config detects a resource that is out of compliance, an OpsItem is created. This OpsItem is used to track details of the noncompliant resource, record investigative actions, and provide access to […]

Manage Control Tower life cycle actions intelligently using AWS Service Catalog, AWS Config, Amazon DynamoDB and AWS CloudFormation

As customers create and manage multi-account AWS environments, cloud administrators need a process where each account can apply configuration autonomously from a centralized configuration repository. Some of the customers I work with use AWS Control Tower to manage a multi-account environment. Administrators use AWS Control Tower to create organizational units for account grouping and […]

Distributed Tracing using AWS Distro for OpenTelemetry

More and more applications are being developed using serverless architectures with multiple microservices. Customers use managed AWS services including AWS Lambda, Amazon ECS and Amazon EKS running on Amazon Elastic Compute Cloud (EC2) and AWS Fargate for running their code along with services like Amazon API Gateway, Amazon SNS, Amazon SQS, Amazon DynamoDB, Amazon S3, and others. Developers use multiple […]

Communicate monitoring information by sharing Amazon CloudWatch dashboards

Amazon CloudWatch provides you with data and actionable insights to monitor the health and performance of your infrastructure and applications hosted on AWS and on-premises servers. CloudWatch dashboards and alarms enable you to rapidly detect performance issues that affect end user experience. CloudWatch has added the ability to share dashboards with users outside of […]

Use AWS Systems Manager Explorer to optimize your compute resources across your AWS Organizations

As a solutions architect with AWS, I work with customers to right-size their Amazon Elastic Compute Cloud (EC2) instances to achieve a balance between performance and cost. Optimization is an iterative task that involves several cycles of making changes, analyzing results, and repeating until you reach a satisfactory state. You need to understand the details […]