Amazon EMR on AWS Outposts

Why EMR on Outposts?

AWS Outposts brings AWS services, infrastructure, and operating models to virtually any data center, co-location space, or on-premises facility. Amazon EMR is available on AWS Outposts, so you can set up, deploy, manage, and scale Apache Hadoop, Apache Hive, Apache Spark, and Presto clusters in your on-premises environments just as you would in the cloud. Amazon EMR provides cost-efficient capacity in Outposts and automates time-consuming administration tasks, including infrastructure provisioning, cluster setup, configuration, and tuning, freeing you to focus on your applications.

You can quickly create managed EMR clusters on-premises using the same AWS Management Console, APIs, and CLI that you already use for EMR. EMR clusters launched in an Outpost appear in the AWS console just like any other cluster, but run in your Outpost.
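
As a rough illustration of launching through the same API, the sketch below builds a request for the boto3 EMR client's `run_job_flow` call and points the cluster at a subnet associated with an Outpost. The helper names, the subnet ID, the release label, the instance type, and the default role names are all assumptions chosen for the example; check them against your own Outpost configuration before use.

```python
from typing import Any, Dict


def build_outpost_cluster_request(
    subnet_id: str,
    name: str = "emr-on-outpost-example",   # hypothetical cluster name
    release_label: str = "emr-5.30.0",      # assumption: a release your Outpost supports
    instance_type: str = "m5.xlarge",       # assumption: a type provisioned on your Outpost
    core_count: int = 2,
) -> Dict[str, Any]:
    """Build keyword arguments for the boto3 EMR client's run_job_flow call.

    The only Outposts-specific part is Ec2SubnetId, which is assumed to be
    a subnet associated with your Outpost.
    """
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        "Applications": [{"Name": "Hadoop"}, {"Name": "Spark"}],
        "Instances": {
            "Ec2SubnetId": subnet_id,
            "KeepJobFlowAliveWhenNoSteps": True,
            "InstanceGroups": [
                {
                    "Name": "Primary",
                    "InstanceRole": "MASTER",
                    "InstanceType": instance_type,
                    "InstanceCount": 1,
                },
                {
                    "Name": "Core",
                    "InstanceRole": "CORE",
                    "InstanceType": instance_type,
                    "InstanceCount": core_count,
                },
            ],
        },
        # Assumption: the default EMR roles already exist in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def launch_cluster(request: Dict[str, Any]) -> str:
    """Submit the request and return the new cluster ID (needs AWS credentials)."""
    import boto3  # imported here so the request can be built without boto3 installed

    emr = boto3.client("emr")
    return emr.run_job_flow(**request)["JobFlowId"]


if __name__ == "__main__":
    # Placeholder subnet ID; replace with a subnet associated with your Outpost.
    print(build_outpost_cluster_request("subnet-0123456789abcdef0"))
```

Keeping request construction separate from submission lets you inspect or version the request before anything is launched.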

Benefits

Once your Outpost is set up, you can launch a new EMR cluster on-premises and connect to existing HDFS storage in minutes. This allows you to respond quickly when on-premises systems need additional processing capacity. Adding capacity to on-premises Hadoop and Spark clusters helps you meet workload demands in periods of high utilization and maintain SLAs.

If you’re in the process of migrating data and Apache Hadoop workloads to the cloud and want to start using EMR before your migration is complete, you can use AWS Outposts to launch EMR clusters on-premises that connect to your existing HDFS storage. You can then gradually migrate your data to Amazon S3 as you move toward a cloud architecture.

Apache Hadoop, Apache Hive, Apache Spark, and Presto are commonly used to process, transform, and analyze data that is part of a larger data architecture. For data that needs to remain on-premises for governance, compliance, or other reasons, you can use EMR to deploy and run applications like Apache Hadoop and Apache Spark on-premises, close to your data. This reduces the need to move large amounts of on-premises data to the cloud, which shortens the overall time needed to process that data.
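
The gradual migration of HDFS data to Amazon S3 described above could be scripted as an EMR step. The text does not name a copy tool, so the following is only a sketch that assumes S3DistCp (`s3-dist-cp`) run through `command-runner.jar`. The helper names, the HDFS path, the bucket, and the cluster ID are hypothetical.

```python
from typing import Any, Dict


def build_s3distcp_step(
    hdfs_src: str,
    s3_dest: str,
    name: str = "Copy HDFS data to S3",  # hypothetical step name
) -> Dict[str, Any]:
    """Build an EMR step definition that copies one HDFS path to S3.

    Assumes S3DistCp is available on the cluster; the source document
    does not prescribe a specific migration tool.
    """
    if not hdfs_src.startswith("hdfs://"):
        raise ValueError("hdfs_src must be an hdfs:// URI")
    if not s3_dest.startswith("s3://"):
        raise ValueError("s3_dest must be an s3:// URI")
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",  # keep the cluster running if the copy fails
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["s3-dist-cp", "--src", hdfs_src, "--dest", s3_dest],
        },
    }


def submit_step(cluster_id: str, step: Dict[str, Any]) -> str:
    """Add the step to a running cluster and return its step ID (needs AWS credentials)."""
    import boto3  # imported here so the step can be built without boto3 installed

    emr = boto3.client("emr")
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]


if __name__ == "__main__":
    # Placeholder paths; migrate one dataset at a time by varying them.
    print(build_s3distcp_step("hdfs:///data/events/2020", "s3://example-bucket/events/2020"))
```

Submitting one step per dataset keeps each copy independently retryable, which suits a migration that proceeds in stages.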