Containers
Proactive Amazon EKS monitoring with Amazon CloudWatch Operator and AWS Control Plane metrics
This post explores using Amazon CloudWatch monitoring, including new Amazon EKS metrics and the CloudWatch Observability Operator, to gain deeper visibility into cluster operations, detect issues, understand bottlenecks, and maintain healthy EKS clusters.
Deep dive: Streamlining GitOps with Amazon EKS capability for Argo CD
In this deep dive, we explore advanced scenarios with Argo CD, including hub-and-spoke multi-cluster deployments, native AWS service integrations, multi-tenancy implementation, scaling with advanced Argo CD configurations, and integration with CI/CD pipelines.
Expanding container security and choice with Amazon ECR Public
Today, we’re excited to announce that Amazon ECR Public now offers Chainguard Wolfi Images—security-hardened, minimalist base container images that dramatically reduce vulnerabilities in your containerized applications.
Amazon EKS introduces enhanced network policy capabilities
Today, we are excited to announce the expansion of native network policy support in Amazon EKS to include both Admin Policies and Application Network Policies. With these additional policies, Cluster Administrators (e.g. platform or security teams) can set cluster-wide security rules for their clusters to enhance the overall network security for their Kubernetes workloads. In […]
Automate Java performance troubleshooting with AI-powered thread dump analysis on Amazon ECS and EKS
Picture this: your containerized Java application that was running smoothly yesterday is now consuming 90% CPU and barely responding to user requests. Now your customers are experiencing timeouts, and your ops team is under pressure to resolve the issue quickly. When debugging unresponsive applications or excessive CPU consumption, one of the most valuable diagnostic tools […]
Amazon EKS introduces Provisioned Control Plane
Amazon EKS introduces Provisioned Control Plane, a new capability that allows you to pre-allocate control plane capacity for predictable, high-performance Kubernetes operations at scale. In this post, we explore how this enhanced option complements the Standard Control Plane by offering multiple scaling tiers (XL, 2XL, 4XL) with well-defined performance characteristics for API request concurrency, pod scheduling rates, and cluster database size—enabling you to handle demanding workloads like ultra-scale AI training, high-performance computing, and mission-critical applications with confidence.
Amazon EKS Blueprints for CDK: Now supporting Amazon EKS Auto Mode
Amazon EKS Blueprints for CDK now supports EKS Auto Mode, enabling developers to deploy fully managed Kubernetes clusters with minimal configuration while AWS automatically handles infrastructure provisioning, compute scaling, and core add-on management. In this post, we explore how this integration combines EKS Blueprints’ declarative infrastructure-as-code approach with EKS Auto Mode’s hands-off cluster operations, providing three practical deployment patterns, from basic clusters to specialized ARM-based and AI/ML workloads, that let teams focus on application development rather than infrastructure management.
Enhancing and monitoring network performance when running ML Inference on Amazon EKS
In this post, we explore how to enhance and monitor network performance for ML inference workloads running on Amazon EKS using the newly launched Container Network Observability feature. We demonstrate practical use cases through a sample Stable Diffusion image generation workload, showing how platform teams can visualize service communication, analyze traffic patterns, investigate latency issues, and identify network bottlenecks—ultimately improving metrics like inference latency and time to first token.
Data-driven Amazon EKS cost optimization: A practical guide to workload analysis
In this post, we introduce key considerations for optimizing Amazon EKS costs in production environments through detailed workload analysis and comprehensive monitoring. We demonstrate proven best practices to maximize cost savings while maintaining performance and resilience, supported by real-world examples showing how to eliminate resource waste from overprovisioned pods, excessive replica counts, and fragmented node pools.
Troubleshooting containerized workloads with Amazon ECS Events in the AWS console
In this post, we show how you can use the new event capture capability in the Amazon ECS console to automatically collect and analyze operational events without manually configuring EventBridge rules or CloudWatch log groups. We demonstrate how to enable Amazon ECS event capture with a single click and use the integrated query interface to investigate operational scenarios such as task failures, deployments, and resource constraint issues.








