Containers

Tag: Spark

Best practices for running Spark on Amazon EKS

Amazon EKS is becoming a popular choice among AWS customers for scheduling Spark applications on Kubernetes. It’s fully managed but still offers full Kubernetes capabilities for consolidating different workloads and getting a flexible scheduling API to optimize resource consumption. But Kubernetes is complex, and not all data engineers are familiar with how to set up […]

Advertising click-prediction modeling on Amazon EKS

In digital advertising, the ad click-through rate (CTR) model predicts the probability of a click given the ads and context x (for example, shopping query, time of the day, device). The output of a CTR model can be seen as a conditional probability p(y = click|x). A precise estimation of this probability influences our ability […]

Optimizing Spark performance on Kubernetes

Apache Spark is an open source project that has achieved wide popularity in the analytical space. It is used for well-known big data and machine learning workloads such as streaming, processing a wide array of datasets, and ETL, to name a few. Kubernetes is a popular open source container management system that provides basic mechanisms for […]