Containers
Tag: Spark
Run Spark-RAPIDS ML workloads with GPUs on Amazon EMR on EKS
Apache Spark revolutionized big data processing with its distributed computing capabilities, which enabled efficient data processing at scale. It offers the flexibility to run on traditional Central Processing Units (CPUs) as well as specialized Graphics Processing Units (GPUs), each of which provides distinct advantages for various workloads. As the demand for faster and more efficient machine […]
Best practices for running Spark on Amazon EKS
Amazon EKS is becoming a popular choice among AWS customers for scheduling Spark applications on Kubernetes. It’s fully managed but still offers full Kubernetes capabilities for consolidating different workloads and getting a flexible scheduling API to optimize resource consumption. But Kubernetes is complex, and not all data engineers are familiar with how to set up […]
Advertising click-prediction modeling on Amazon EKS
In digital advertising, the ad click-through rate (CTR) model predicts the probability of a click given the ad and context x (for example, shopping query, time of day, device). The output of a CTR model can be seen as a conditional probability p(y = click | x). A precise estimation of this probability influences our ability […]
Optimizing Spark performance on Kubernetes
Apache Spark is an open source project that has achieved wide popularity in the analytical space. It is used for well-known big data and machine learning workloads such as streaming, processing a wide array of datasets, and ETL, to name a few. Kubernetes is a popular open source container management system that provides basic mechanisms for […]