Containers
Using Prometheus Metrics in Amazon CloudWatch
By Imaya Kumar Jagannathan, Justin Gu, Marc Chéné, and Michael Hausenblas. Update 2020-09-08: The feature described in this post is now generally available (GA); see details in the What’s New item “Amazon CloudWatch now monitors Prometheus metrics from Container environments.” Earlier this week we announced the public beta support for monitoring Prometheus metrics in CloudWatch Container Insights. […]
Introducing multi-architecture container images for Amazon ECR
Containers are a de facto standard in cloud application development and deployment. Publishing software in container images provides developers an integrated packaging solution, bundling software and all required dependencies into a portable image format. This image can then be run anywhere, abstracting away the infrastructure-specific aspects of deployment. However, the promise of running anywhere only […]
Fault tolerant distributed machine learning training with the TorchElastic Controller for Kubernetes
Kubernetes enables machine learning teams to run training jobs distributed across fleets of powerful GPU instances like Amazon EC2 P3, reducing training time from days to hours. However, distributed training comes with limitations compared to the more traditional microservice-based applications typically associated with Kubernetes. Distributed training jobs are not fault tolerant, and a […]
Optimizing Spark performance on Kubernetes
Apache Spark is an open source project that has achieved wide popularity in the analytical space. It is used by well-known big data and machine learning workloads such as streaming, processing a wide array of datasets, and ETL, to name a few. Kubernetes is a popular open source container management system that provides basic mechanisms for […]
Under the hood: AWS Fargate data plane
Today, we launched a new platform version (1.4) for AWS Fargate, which bundles a number of new features and capabilities for our customers. You can read more about these features in this blog post. One of the changes we are introducing in platform version 1.4 is replacing Docker Engine with Containerd as Fargate’s container execution […]
AWS Fargate platform versions primer
AWS Fargate is an AWS managed service that allows users to launch containers without having to worry about the infrastructure underneath. In another blog post, we explored in detail the new features and the changes we introduced with AWS Fargate platform version 1.4.0. Let’s step back and […]
AWS Fargate launches platform version 1.4.0
AWS Fargate is a managed service to run containers. Fargate allows customers to use Amazon Elastic Container Service (ECS) and Amazon Elastic Kubernetes Service (EKS) to launch applications without the burden of having to deal with the undifferentiated heavy lifting of maintaining, patching, scaling, securing, and life-cycling the infrastructure. While Amazon EC2 abstracts away hypervisors and […]
Bottlerocket: a special-purpose container operating system
On March 10, 2020, we introduced Bottlerocket, a new special-purpose operating system designed for hosting Linux containers. In this post, I want to take you through some of the goals we started with, engineering choices we made along the way, and our vision for how the OS will continue to evolve in the future. In […]
Multi-tenant design considerations for Amazon EKS clusters
This post was contributed by Roberto Migli, AWS Solutions Architect. Amazon Elastic Kubernetes Service (Amazon EKS) is used today by thousands of customers to run container applications at scale. One of the common questions we often hear is: how do we provide a multi-tenant Amazon EKS cluster to our teams? Should I run one cluster, […]
De-mystifying cluster networking for Amazon EKS worker nodes
Running Kubernetes on AWS requires an understanding of both AWS networking configuration and Kubernetes networking requirements. When you use the default Amazon Elastic Kubernetes Service (Amazon EKS) AWS CloudFormation templates to deploy your Amazon Virtual Private Cloud (Amazon VPC) and Amazon EC2 worker nodes, everything typically just works. But small issues in your configuration can result […]







