Containers

Tag: Spark on EKS

Spark on Amazon EKS networking – Part 2

This post was co-authored by James Fogel, Staff Software Engineer on the Cloud Architecture Team at Pinterest. Part 2: Spark on EKS network design at scale. Introduction: In this two-part series, my counterpart, James Fogel (Staff Cloud Architect at Pinterest), and I share Pinterest’s journey designing and implementing their networking topology for running large-scale Spark […]

Spark on Amazon EKS networking – Part 1

This post was co-authored by James Fogel, Staff Software Engineer on the Cloud Architecture Team at Pinterest. Part 1: Design process for Amazon EKS networking at scale. Introduction: Pinterest is a platform that helps inspire people to live a life they love. Big data and machine learning (ML) are core to Pinterest’s platform and product, […]

Run Spark-RAPIDS ML workloads with GPUs on Amazon EMR on EKS

Introduction: Apache Spark revolutionized big data processing with its distributed computing capabilities, which enabled efficient data processing at scale. It offers the flexibility to run on traditional Central Processing Units (CPUs) as well as specialized Graphics Processing Units (GPUs), which provide distinct advantages for various workloads. As the demand for faster and more efficient machine […]

Dynamic Spark Scaling on Amazon EKS with Argo Workflows and Events

Introduction: Kubernetes has gained widespread adoption in the field of data processing because of its ability to package and deploy applications as containers with all required dependencies, as well as its support for running data frameworks. This makes it easy for developers to run their Data Analytics/Machine Learning (ML) applications within a Kubernetes cluster and […]