Containers
Tag: Spark
Best practices for running Spark on Amazon EKS
Amazon EKS is becoming a popular choice among AWS customers for scheduling Spark applications on Kubernetes. It’s fully managed but still offers full Kubernetes capabilities for consolidating different workloads and getting a flexible scheduling API to optimize resources consumption. But Kubernetes is complex, and not all data engineers are familiar with how to set up […]
Advertising click-prediction modeling on Amazon EKS
In digital advertising, the ad click-through rate (CTR) model predicts the probability of a click given the ads and context x (for example, shopping query, time of the day, device). The output of a CTR model can be seen as a conditional probability p(y = click|x). A precise estimation of this probability influences our ability […]
Optimizing Spark performance on Kubernetes
Apache Spark is an open source project that has achieved wide popularity in the analytical space. It is used by well-known big data and machine learning workloads such as streaming, processing wide array of datasets, and ETL, to name a few. Kubernetes is a popular open source container management system that provides basic mechanisms for […]