AWS Cloud Operations & Migrations Blog

Category: Management & Governance

Announcing AWS CloudTrail Lake one-year extendable retention pricing option

In 2022 Amazon Web Services (AWS) released AWS CloudTrail Lake, a managed audit and security lake that allows you to aggregate, immutably store, visualize, and query your activity logs for auditing, security investigation, and operational troubleshooting.  Working backwards from our customers we have added capabilities to CloudTrail Lake such as the ability to copy CloudTrail events into […]

Build a Cloud Automation Practice for Operational Excellence: Best Practices from AWS Managed Services

Introduction In today’s fast-paced business environment, organizations are actively pursuing operational excellence to maintain a competitive edge. Automation is a critical foundation for achieving better efficiency, reliability, and scalability in operations. However, integrating automation into cloud practice entails more than simply implementing software or tools. Building a cloud automation practice requires a transformative journey that […]

Monitoring GPU workloads on Amazon EKS using AWS managed open-source services

As machine learning (ML) workloads continue to grow in popularity, many customers are looking to run them on Kubernetes with graphics processing unit (GPU) support. Amazon Elastic Compute Cloud (Amazon EC2) instances powered by NVIDIA GPUs deliver the scalable performance needed for fast ML training and cost-effective ML inference. Monitoring GPU utilization gives valuable information for researchers working […]

Amazon Connect real-time monitoring using Amazon Managed Grafana and Amazon Timestream

Amazon Connect is an easy-to-use cloud contact center solution that helps companies of any size deliver superior customer service at a lower cost. Connect has many real-time monitoring capabilities. For requirements that go beyond those supported out of the box, Amazon Connect also provides you with data and APIs you can use to implement your […]

Lowering MTTR with Amazon CloudWatch and AWS X-Ray

Lowering MTTR with Amazon CloudWatch and AWS X-Ray

Customers running microservice-based workloads in a serverless environment frequently have issues with troubleshooting incidents as the data they need can be distributed across hundreds or thousands of components. In this blog post, I will demonstrate how you can reduce the mean time to resolution (MTTR, or the average time it takes to repair or mitigate […]

Self-service Account Provisioning Using AWS Service Management Connector for ServiceNow

Many customers are looking to adopt a multi-account strategy within their AWS environment. This allows customers to isolate their workloads into different environments including test, dev, and production in addition to separating workloads based on regulatory requirements. As customers scale their multi-account environments, one strategy to increase agility is to offer business units their own […]

How to download your AWS Resilience Hub assessment results

AWS Resilience Hub provides a central place to define, validate, and track the resilience of your application on AWS. It can help in assessing impact of every application change on resiliency by automatically running the assessment on a daily basis or as part of CI/CD pipeline. With AWS Resilience Hub, you can easily create resiliency […]

Using Tag-Based Filtering to Manage AWS Health Monitoring and Alerting at Scale

AWS provides customers regular updates of service notifications and planned activities via e-mail to the root account owners or the operational, security and billing contacts. AWS also provides granular notifications to customers via AWS Health allowing them to fine-tune their alerts on issues relating directly to them. Alongside Health Dashboard’s monitoring capabilities, customers can also […]

Centralize image administration for virtual machines and containers using EC2 Image Builder

Customers may have different processes for image building across virtual machines, containers, or both. This variation in processes introduces operational overhead in managing images, including the initial configuration and the ongoing updates. From the AWS Well-Architected Operational Excellence Pillar, section “Document and share lessons learned”, these images should be standardized, configured with the latest patches, […]

Accelerate End-to-End Application Modernization with AWS App2Container and AWS Migration Hub Refactor Spaces

Accelerate End-to-End Application Modernization with AWS App2Container and AWS Migration Hub Refactor Spaces

This blog post was written with contributions from Gaurav Parashar who is prior AWS Customers often have challenges accelerating the modernization of their applications. The complexity of refactoring a monolith application often provides hurdles in depth of expertise, time and effort. In this blog, we will explore two mechanisms that can help you accelerate your […]