AWS Cloud Operations & Migrations Blog

Category: Amazon EMR

Monitoring Amazon EMR on EKS with Amazon Managed Prometheus and Amazon Managed Grafana

Apache Spark is an open-source lightning-fast cluster computing framework built for distributed data processing. With the combination of Cloud, Spark delivers high performance for both batch and real-time data processing at a petabyte scale. Spark on Kubernetes is supported from Spark 2.3 onwards, and it gained a lot of traction among enterprises for high performance and […]

Collecting Apache Flink metrics in the Amazon CloudWatch agent

Collecting Apache Flink metrics in the Amazon CloudWatch agent

Apache Flink is a distributed stream processing engine. You can run Flink on Amazon EMR as a YARN application. You can view Flink metrics through its web UI, but what if you want to react to them? In this blog post, I’ll show you how to use the CloudWatch agent to collect Flink metrics into […]

EMR Cluster

Using AWS Systems Manager Run Command to submit Spark/Hadoop jobs on Amazon EMR

Many customers use Amazon EMR with Apache Spark to build scalable big data pipelines. For large-scale production pipelines, a common use case is to read complex data from a variety of sources. This data must be transformed to make it useful to downstream applications, such as machine learning pipelines, analytics dashboards, and business reports. Such […]