AWS Big Data Blog

Author: Suvojit Dasgupta

Suvojit Dasgupta is a Principal Architect at Amazon Web Services. He works with customers to design and build complex data solutions on AWS.

High-performance Remote Shuffle Service on Amazon EMR with Apache Celeborn

In this post, we show how Apache Celeborn resolves this trade-off for Amazon EMR on EKS and Amazon EMR on EC2, improving job reliability while unlocking additional cost savings.

Deploy Apache YuniKorn batch scheduler for Amazon EMR on EKS

This post explores Kubernetes scheduling fundamentals, examines the limitations of the default kube-scheduler for batch workloads, and demonstrates how YuniKorn addresses these challenges. We discuss how to deploy YuniKorn as a custom scheduler for Amazon EMR on EKS, how to integrate it with job submissions, how to configure queues and placement rules, and how to establish resource quotas. We also show these features in action through practical Spark job examples.

Configure Hadoop YARN CapacityScheduler on Amazon EMR on Amazon EC2 for multi-tenant heterogeneous workloads

Apache Hadoop YARN (Yet Another Resource Negotiator) is a cluster resource manager responsible for assigning computational resources (CPU, memory, I/O), and scheduling and monitoring jobs submitted to a Hadoop cluster. This generic framework allows for effective management of cluster resources for distributed data processing frameworks, such as Apache Spark, Apache MapReduce, and Apache Hive. When […]
