AWS Cloud Operations & Migrations Blog

Category: Best Practices

Know Before You Go – AWS re:Invent 2023 | AWS Management Console

New this year, the AWS Customer Experience team has tips to help you enhance your re:Invent experience and learn about various improvements that make AWS even easier to use. Meet us at our kiosks in the AWS Village and be sure to check out the sessions below. Our sessions will cover best practices for managing […]

Know Before You Go – AWS re:Invent 2023 Cloud Governance and Compliance

We are so excited to see you at our annual cloud computing conference, AWS re:Invent 2023, in Las Vegas from Nov 27 to Dec 1. Whether you’re a seasoned re:Invent veteran or a first-timer, the excitement and opportunities of AWS re:Invent never cease to amaze. With a total of 96 sessions covering the solution areas that […]

Monitoring GPU workloads on Amazon EKS using AWS managed open-source services

As machine learning (ML) workloads continue to grow in popularity, many customers are looking to run them on Kubernetes with graphics processing unit (GPU) support. Amazon Elastic Compute Cloud (Amazon EC2) instances powered by NVIDIA GPUs deliver the scalable performance needed for fast ML training and cost-effective ML inference. Monitoring GPU utilization gives valuable information for researchers working […]

Know Before You Go – AWS re:Invent 2023 Monitoring and Observability, and Centralized Operations Management

We are so excited to see you at our annual cloud computing conference, AWS re:Invent 2023, in Las Vegas from Nov 27 to Dec 1. Whether you’re a seasoned re:Invent veteran or a first-timer, the excitement and opportunities of AWS re:Invent never cease to amaze. With a total of 96 sessions covering the solution areas that […]

Unlocking the power: The keys to delivering successful Cloud Migrations

Despite the many benefits of moving to the Cloud, large enterprises frequently struggle to deliver migrations (and the related business transformation) in the planned timeframe. Why? What are the key factors that ensure a successful migration that becomes an oft-quoted industry benchmark for a Cloud-driven transformation, rather than a moribund initiative where a number […]

Using Tag-Based Filtering to Manage AWS Health Monitoring and Alerting at Scale

AWS provides customers regular updates of service notifications and planned activities via e-mail to the root account owners or the operational, security and billing contacts. AWS also provides granular notifications to customers via AWS Health, allowing them to fine-tune their alerts on issues relating directly to them. Alongside Health Dashboard’s monitoring capabilities, customers can also […]

Designing a successful cloud migration: top five pitfalls and how to avoid a stall

Stalled cloud migrations can undermine cloud adoption’s business value. It is therefore important to watch out for early warning signs and take timely corrective action. This blog post looks at five big pitfalls every cloud migration leader should be aware of. The good news is you can spot these issues early and mitigate them to […]

Centralize image administration for virtual machines and containers using EC2 Image Builder

Customers may have different processes for image building across virtual machines, containers, or both. This variation in processes introduces operational overhead in managing images, including the initial configuration and the ongoing updates. From the AWS Well-Architected Operational Excellence Pillar, section “Document and share lessons learned”, these images should be standardized, configured with the latest patches, […]

Auto-remediate best practice deviations detected by AWS Trusted Advisor

AWS Trusted Advisor inspects your AWS infrastructure and provides best practice recommendations when opportunities exist to reduce cost, optimize your AWS infrastructure, improve system availability and performance, help close security gaps and monitor service quotas. Trusted Advisor recommendations are based on best practices identified by AWS services experts and learnings from serving thousands of customers […]

How to reduce Istio sidecar metric cardinality with Amazon Managed Service for Prometheus

The complexity of distributed systems has grown significantly, making monitoring and observability essential for application and infrastructure reliability. As organizations adopt microservice-based architectures and large-scale distributed systems, they face the challenge of managing an increasing volume of telemetry data, particularly high metric cardinality in systems like Prometheus. To address this, many are turning to service […]