Containers

Best practices for resilience and availability on Amazon ECS

In this post, we explore advanced implementation patterns for building highly available services on Amazon ECS, including idempotency, resilience to transient failures, static stability across Availability Zones, deployment safety, and chaos engineering techniques. The post gives detailed guidance on implementing these patterns when deploying applications on Amazon ECS.
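
To make two of the patterns named above concrete, here is a minimal sketch of retrying transient failures with capped exponential backoff and jitter, paired with an idempotency token so that a retried request is applied only once. This is not code from the post: `TransientError`, `call_with_retries`, and `IdempotentHandler` are hypothetical names chosen for illustration, and the in-memory dictionary stands in for whatever durable store a real service would use.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, throttling)."""


def call_with_retries(operation, max_attempts=5, base_delay=0.1,
                      max_delay=2.0, sleep=time.sleep):
    """Call operation(); on TransientError, retry with capped exponential
    backoff and full jitter. The last failure is re-raised."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise
            # Backoff ceiling doubles each attempt, capped at max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: sleep a random amount between 0 and the ceiling.
            sleep(random.uniform(0, cap))


class IdempotentHandler:
    """Remembers results by request token so a repeated request has no
    additional side effect. In-memory only; an assumption for this sketch."""

    def __init__(self):
        self._results = {}

    def handle(self, token, action):
        # Run the action only the first time this token is seen.
        if token not in self._results:
            self._results[token] = action()
        return self._results[token]
```

The two pieces are meant to be used together: a client that retries through `call_with_retries` can safely resend the same token, because the handler returns the stored result instead of repeating the side effect.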

Canary delivery with Argo Rollout and Amazon VPC Lattice for Amazon EKS

This post explores how to implement progressive delivery using Amazon VPC Lattice, Amazon CloudWatch Synthetics, and Argo Rollouts for canary deployments in Amazon EKS environments. The solution enables gradual traffic shifting between service versions, real-time health monitoring through synthetic tests, and automated rollbacks if issues are detected, providing a comprehensive approach to safe and reliable application updates.

Simplify network connectivity using Tailscale with Amazon EKS Hybrid Nodes

This post guides readers through integrating Tailscale with Amazon EKS Hybrid Nodes to simplify and secure network connectivity between on-premises infrastructure and AWS. The integration enables encrypted point-to-point connections using the WireGuard protocol, creating a peer-to-peer mesh network that streamlines the network architecture needed for EKS Hybrid Nodes.

Testing network resilience of AWS Fargate workloads on Amazon ECS using AWS Fault Injection Service

In this post, we demonstrate how to test network resilience of AWS Fargate workloads on Amazon ECS using AWS Fault Injection Service’s new network fault injection capabilities, including network latency, blackhole, and packet loss experiments. Through a sample three-tier application architecture, we show how to conduct controlled chaos engineering experiments to validate application behavior during network disruptions and improve system resilience.

Streamline service-to-service communication during deployments with Amazon ECS Service Connect

When deploying containerized microservices, maintaining reliable service discovery and efficient routing during updates presents significant challenges. Traditional blue/green deployment approaches rely heavily on a load balancer for traffic management, which can become complex when dealing with container-based service-to-service communication. This complexity increases the possibility of service disruption and makes it difficult to test new versions in […]

Scaling beyond IPv4: integrating IPv6 Amazon EKS clusters into existing Istio Service Mesh

Organizations are increasingly adopting IPv6 for their Amazon Elastic Kubernetes Service (Amazon EKS) deployments, driven by three key factors: depletion of private IPv4 addresses, the need to streamline or eliminate overlay networks, and improved network security requirements on Amazon Web Services (AWS). In IPv6-enabled EKS clusters, each pod receives a unique IPv6 address from the […]

Centralized Amazon ECS task logging with Amazon OpenSearch

As enterprises continue to adopt containerized workloads, the need for robust and scalable logging solutions has become increasingly important. Logging is a crucial element in monitoring and troubleshooting distributed applications, especially in modern containerized environments such as those deployed on Amazon Elastic Container Service (Amazon ECS). As microservices architectures grow in complexity, managing logs across multiple […]

Deep dive into cluster networking for Amazon EKS Hybrid Nodes

In this post, we dive deep into cluster networking configurations for Amazon EKS Hybrid Nodes, exploring different Container Network Interface (CNI) options and load balancing solutions to meet various networking requirements. The post demonstrates how to implement BGP routing with Cilium CNI, static routing with Calico CNI, and set up both on-premises load balancing using MetalLB and external load balancing using AWS Load Balancer Controller.

Under the hood: Amazon EKS ultra scale clusters

This post was co-authored by Shyam Jeedigunta, Principal Engineer, Amazon EKS; Apoorva Kulkarni, Sr. Specialist Solutions Architect, Containers; and Raghav Tripathi, Sr. Software Dev Manager, Amazon EKS. Today, Amazon Elastic Kubernetes Service (Amazon EKS) announced support for clusters with up to 100,000 nodes. With Amazon EC2’s new generation accelerated computing instance types, this translates to […]

Amazon EKS enables ultra scale AI/ML workloads with support for 100K nodes per cluster

We’re excited to announce that Amazon Elastic Kubernetes Service (Amazon EKS) now supports up to 100,000 worker nodes in a single cluster, enabling customers to scale up to 1.6 million AWS Trainium accelerators or 800K NVIDIA GPUs to train and run the largest AI/ML models. This capability empowers customers to pursue their most ambitious AI […]
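
As a quick sanity check on the figures quoted above, the short sketch below derives the per-node accelerator density they imply. The totals come from the announcement text; the per-node values are only what the division yields, not a statement about any specific instance type, and the function name is hypothetical.

```python
def accelerators_per_node(total_accelerators: int, nodes: int) -> float:
    """Average accelerators per node implied by a cluster-wide total."""
    return total_accelerators / nodes


NODES = 100_000               # maximum worker nodes per cluster (from the text)
TRAINIUM_TOTAL = 1_600_000    # "1.6 million AWS Trainium accelerators"
GPU_TOTAL = 800_000           # "800K NVIDIA GPUs"

trainium_density = accelerators_per_node(TRAINIUM_TOTAL, NODES)  # 16.0
gpu_density = accelerators_per_node(GPU_TOTAL, NODES)            # 8.0
```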