Containers

Use Raspberry Pi 5 as Amazon EKS Hybrid Nodes for edge workloads

In this post, we demonstrate how to use a Raspberry Pi 5 as an Amazon EKS hybrid node to process edge workloads while maintaining cloud connectivity. We show how to set up an EKS cluster that connects cloud and edge infrastructure, secure connectivity with a WireGuard VPN, enable container networking with Cilium, and implement a real-world IoT application with an ultrasonic sensor to demonstrate edge-cloud integration.

Migrating from AWS CodeDeploy to Amazon ECS for blue/green deployments

In this post, we explore the migration path from AWS CodeDeploy to Amazon ECS for blue/green deployments, discussing key architectural differences and implementation considerations. We examine three different migration approaches – in-place update, new service with existing load balancer, and new service with new load balancer – along with their respective trade-offs in terms of complexity, risk, downtime, and cost.

Kubernetes right-sizing with metrics-driven GitOps automation

In this post, we introduce an automated, GitOps-driven approach to resource optimization in Amazon EKS using AWS services such as Amazon Managed Service for Prometheus and Amazon Bedrock. The solution helps optimize Kubernetes resource allocation through metrics-driven analysis, pattern-aware optimization strategies, and automated pull request generation while maintaining GitOps principles of collaboration, version control, and auditability.
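The metrics-driven analysis step can be sketched in miniature: given per-container CPU usage samples of the kind a Prometheus range query returns, pick a high percentile and add headroom. This is a minimal sketch under assumptions; the function name, percentile, and headroom factor are illustrative, not the post's actual algorithm.

```python
# Hypothetical sketch of a metrics-driven sizing step: given CPU usage
# samples (in millicores), recommend a request at the 95th percentile
# plus headroom. Thresholds here are illustrative assumptions.
import math


def recommend_cpu_request(samples_mcpu, percentile=0.95, headroom=1.2):
    """Return a recommended CPU request (millicores) from usage samples."""
    if not samples_mcpu:
        raise ValueError("no usage samples")
    ordered = sorted(samples_mcpu)
    # Nearest-rank percentile: index of the p-th ranked sample.
    rank = max(0, math.ceil(percentile * len(ordered)) - 1)
    return math.ceil(ordered[rank] * headroom)


# Example: a pod that mostly idles around 100m with occasional 250m spikes.
samples = [100] * 95 + [250] * 5
print(recommend_cpu_request(samples))  # → 120
```

A recommendation like this would then land in a pull request against the GitOps repository rather than being applied directly, keeping the change reviewable and auditable.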

How to build highly available Kubernetes applications with Amazon EKS Auto Mode

In this post, we explore how to build highly available Kubernetes applications using Amazon EKS Auto Mode by implementing critical features like Pod Disruption Budgets, Pod Readiness Gates, and Topology Spread Constraints. Through various test scenarios including pod failures, node failures, AZ failures, and cluster upgrades, we demonstrate how these implementations maintain service continuity and maximize uptime in EKS Auto Mode environments.
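Of the features listed above, a Pod Disruption Budget is the quickest to show. The sketch below builds a minimal PDB manifest as a Python dict and prints it as JSON, which kubectl accepts alongside YAML; the name, app label, and minAvailable value are illustrative assumptions, not taken from the post.

```python
# A minimal PodDisruptionBudget of the kind the post describes, built as a
# Python dict and serialized to JSON. The "web" label and minAvailable
# value are illustrative assumptions.
import json

pdb = {
    "apiVersion": "policy/v1",
    "kind": "PodDisruptionBudget",
    "metadata": {"name": "web-pdb"},
    "spec": {
        # Keep at least 2 replicas up during voluntary disruptions,
        # such as node consolidation or cluster upgrades.
        "minAvailable": 2,
        "selector": {"matchLabels": {"app": "web"}},
    },
}

print(json.dumps(pdb, indent=2))
```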

How to run AI model inference with GPUs on Amazon EKS Auto Mode

In this post, we show you how to swiftly deploy inference workloads on EKS Auto Mode and demonstrate key features that streamline GPU management. We walk through a practical example of deploying open-weight models from OpenAI using vLLM, while highlighting best practices for model deployment and operational efficiency.
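Because vLLM exposes an OpenAI-compatible HTTP API, consuming a model deployed this way is a standard chat-completions call. The sketch below only constructs the request, since sending it requires a running endpoint; the endpoint URL and model name are assumptions for illustration, not values from the post.

```python
# vLLM serves an OpenAI-compatible HTTP API, so a client request is just a
# standard chat-completions payload. The endpoint and model name below are
# illustrative assumptions.
import json
import urllib.request

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # assumed vLLM address

payload = {
    "model": "openai/gpt-oss-20b",  # an open-weight model (assumed name)
    "messages": [
        {"role": "user", "content": "Summarize EKS Auto Mode in one line."}
    ],
    "max_tokens": 64,
}

request = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(request) would send it once the server is up.
```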


Dynamic Kubernetes request right sizing with Kubecost

In this post, we demonstrate how to utilize the Kubecost Amazon EKS add-on to reduce infrastructure costs and enhance Kubernetes efficiency through Container Request Right Sizing, which helps identify and fix inefficient container resource configurations. We explore how to review Kubecost’s right sizing recommendations and implement them through either one-time updates or scheduled automated resizing within Amazon EKS environments for continuous resource optimization.

Unlocking next-generation AI performance with Dynamic Resource Allocation on Amazon EKS and Amazon EC2 P6e-GB200

In this post, we explore how Amazon EC2 P6e-GB200 UltraServers are transforming distributed AI workloads through seamless Kubernetes integration, featuring the NVIDIA GB200 Grace Blackwell architecture that enables memory-coherent domains of up to 72 GPUs. We demonstrate how Dynamic Resource Allocation (DRA) on Amazon EKS enables sophisticated GPU topology management and cross-node GPU communication through IMEX channels, making it possible to efficiently train and deploy trillion-parameter AI models at scale.

Implementing usage and security reporting for Amazon ECR

In this post, we demonstrate how to generate comprehensive reports for Amazon ECR repositories that include cost breakdowns, usage metrics, security scan results, and compliance status across all repositories. The solution provides two types of reports: a Repository Summary report containing attributes for tracking and optimizing cost, usage, and OS vulnerabilities, and an Image-Level report for detailed analysis of specific repository images.
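The security portion of such a report comes down to rolling up scan findings per severity. The sketch below shows that aggregation on sample data so the logic stands alone; in a real report the findings would come from Amazon ECR's image scan results (for example via boto3's describe_image_scan_findings), and the field names here are illustrative assumptions.

```python
# Sketch of the image-level severity rollup a report like this needs.
# The sample findings and their field names are illustrative; real data
# would come from ECR image scan results.
from collections import Counter


def severity_summary(findings):
    """Count scan findings per severity (CRITICAL, HIGH, ...)."""
    return Counter(f["severity"] for f in findings)


sample_findings = [
    {"name": "CVE-2024-0001", "severity": "CRITICAL"},
    {"name": "CVE-2024-0002", "severity": "HIGH"},
    {"name": "CVE-2024-0003", "severity": "HIGH"},
]
print(severity_summary(sample_findings))  # → Counter({'HIGH': 2, 'CRITICAL': 1})
```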

Introducing Seekable OCI Parallel Pull mode for Amazon EKS

In this post, we explore how SOCI Parallel Pull Mode transforms container image pulls through configurable parallelization strategies, addressing performance bottlenecks in both the download and unpacking phases. The solution delivers significant improvements in pull times, showing nearly 60% faster pulls when tested with a 10 GB Deep Learning Container image, making it particularly valuable for AI/ML workloads with large, complex images.
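As a rough illustration of why parallelizing both phases pays off, here is a toy model of pull time with imperfect parallel scaling. All numbers are invented assumptions, and the parallelism and efficiency values were chosen so the toy model lands near the ~60% figure the post reports for a 10 GB image; this is not the post's methodology.

```python
# Toy model of container pull time with parallelized download and unpack.
# All parameter values are invented assumptions, tuned so the model lands
# near the ~60% improvement reported for a 10 GB image.
def pull_time(size_gb, download_gbps, unpack_gbps, parallelism=1, efficiency=0.5):
    """Estimated pull time in seconds, with imperfect parallel scaling."""
    speedup = 1 + (parallelism - 1) * efficiency
    download = size_gb * 8 / download_gbps / speedup  # network phase (Gbit/s)
    unpack = size_gb / unpack_gbps / speedup          # unpack phase (GB/s)
    return download + unpack


serial = pull_time(10, download_gbps=5, unpack_gbps=0.5)
parallel = pull_time(10, download_gbps=5, unpack_gbps=0.5, parallelism=4)
print(f"{1 - parallel / serial:.0%} faster")  # → 60% faster
```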