Containers

Category: Amazon EMR

Run Spark-RAPIDS ML workloads with GPUs on Amazon EMR on EKS

Introduction Apache Spark revolutionized big data processing with its distributed computing capabilities, which enabled efficient data processing at scale. It offers the flexibility to run on traditional Central Processing Units (CPUs) as well as specialized Graphics Processing Units (GPUs), each of which provides distinct advantages for various workloads. As the demand for faster and more efficient machine […]

Introducing Data on EKS – Modernize Data Workloads on Amazon EKS

Introduction We are thrilled to introduce Data on EKS (DoEKS), a new open-source project aimed at streamlining and accelerating the process of building, deploying, and scaling data workloads on Amazon Elastic Kubernetes Service (Amazon EKS). With DoEKS, customers get access to a comprehensive range of resources including Infrastructure as Code (IaC) templates, performance benchmark reports, […]

Using Amazon EMR on Amazon EKS for transient EMR clusters

Introduction Many organizations, as part of their cloud journey to Amazon Web Services (AWS), migrate and modernize the ETL (extract, transform, load) batch processing workloads that run on their on-premises Hadoop clusters. They often start with a lift-and-shift approach, hosting their Hadoop environment on Amazon Elastic Compute Cloud (Amazon EC2), or migrate to […]