How can I access an EMR cluster through an application if the cluster is in a private subnet?

Last updated: 2020-05-11

I want to use an application, such as Apache Livy, to access and submit work to an Amazon EMR cluster that's in a private subnet.

Short Description

Create an Application Load Balancer in a public subnet. Set the target of the Application Load Balancer to the private IP address of the master node (IP targets must be private addresses in the VPC). This allows you to reach the EMR cluster in the private subnet through the load balancer and then submit jobs to the client using REST APIs.

Resolution

Note: The following might not work if you launch the cluster with Kerberos, or if you enable SSL for Livy.

  1. Open the Amazon Elastic Compute Cloud (Amazon EC2) console.
  2. In the navigation pane, under LOAD BALANCING, choose Load Balancers.
  3. Choose Create Load Balancer.
  4. On the Select load balancer type page, under Application Load Balancer, choose Create.
  5. On the Step 1: Configure Load Balancer page:
    For Scheme, choose internet-facing.
    For Listeners, use the default options (HTTP and port 80).
    For VPC, choose the VPC that the EMR cluster is in.
    For Availability Zones, choose two public subnets in different Availability Zones. Be sure that one of them is in the same Availability Zone as the private subnet that the EMR cluster is in.
  6. Choose Next: Configure Security Settings.
  7. If you created a secure listener in the previous step, complete the Configure Security Settings page. Otherwise, choose Next: Configure Security Groups.
  8. Select the security group or groups for the Application Load Balancer. Remember, this is an internet-facing Application Load Balancer. It's a best practice to use a security group that limits incoming requests to a specific IP address or IP address range.
  9. Choose Next: Configure Routing.
  10. On the Step 4: Configure Routing page:
    For Target type, choose ip.
    For Protocol, choose HTTP.
    For Port, enter the port of the client's web UI. For example, for Livy, enter 8998. For more information, see View Web Interfaces Hosted on Amazon EMR Clusters.
    In the Health checks section, for Protocol, choose HTTP.
    For Path, enter /sessions.
  11. Choose Next: Register Targets.
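  12. On the Step 5: Register Targets page, for IP, enter the private IP address of the master node. An IP target must be a private address in the VPC; publicly routable addresses can't be registered. You can find the private IP address of the master node on the Hardware tab of the cluster details page.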
  13. Choose Add to list to add the IP address to the To be registered list.
  14. Choose Next: Review, and then choose Create.
  15. When the State changes to active, choose the Listeners tab.
  16. In the Rules column, choose the target group link.
  17. On the Targets tab, confirm that the Registered targets and Availability Zones are healthy.
  18. On the Description tab, choose the link next to Load balancer (the link that's the name of the load balancer). If the Livy web UI appears, the configuration is working—requests are able to reach Livy on the EMR cluster in the private subnet.
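
For readers who prefer to script the console steps above, the following is a minimal sketch using the boto3 elbv2 client. It is an illustration under stated assumptions, not the article's method: the function name create_livy_alb, the resource names livy-alb and livy-targets, and all IDs in the usage comment are hypothetical. The client is passed in as a parameter so that the function can be exercised without AWS credentials. Note that an IP-type target has to be a private address in the VPC.

```python
def create_livy_alb(elbv2, vpc_id, public_subnet_ids, security_group_ids,
                    master_ip, livy_port=8998):
    """Create an internet-facing ALB that forwards HTTP:80 to Livy.

    elbv2 is assumed to be a boto3 "elbv2" client (or a compatible object).
    master_ip is the private IP address of the EMR master node.
    Returns the DNS name of the new load balancer.
    """
    # Step 5: internet-facing Application Load Balancer in two public subnets.
    lb = elbv2.create_load_balancer(
        Name="livy-alb",  # hypothetical name
        Subnets=public_subnet_ids,
        SecurityGroups=security_group_ids,
        Scheme="internet-facing",
        Type="application",
    )["LoadBalancers"][0]

    # Step 10: IP target group on the Livy port, health check on /sessions.
    tg = elbv2.create_target_group(
        Name="livy-targets",  # hypothetical name
        Protocol="HTTP",
        Port=livy_port,
        VpcId=vpc_id,
        TargetType="ip",
        HealthCheckProtocol="HTTP",
        HealthCheckPath="/sessions",
    )["TargetGroups"][0]

    # Steps 12-13: register the master node as the only target.
    elbv2.register_targets(
        TargetGroupArn=tg["TargetGroupArn"],
        Targets=[{"Id": master_ip, "Port": livy_port}],
    )

    # Default listener (HTTP, port 80) that forwards to the target group.
    elbv2.create_listener(
        LoadBalancerArn=lb["LoadBalancerArn"],
        Protocol="HTTP",
        Port=80,
        DefaultActions=[{"Type": "forward",
                         "TargetGroupArn": tg["TargetGroupArn"]}],
    )
    return lb["DNSName"]


# Example usage (all IDs are placeholders, replace them with your own):
#   import boto3
#   dns = create_livy_alb(boto3.client("elbv2"), "vpc-EXAMPLE",
#                         ["subnet-EXAMPLE1", "subnet-EXAMPLE2"],
#                         ["sg-EXAMPLE"], "10.0.1.25")
```

The sketch does not wait for the load balancer to become active or for the target to pass its health check, so those checks from steps 15 to 17 still apply.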

You can now submit jobs to the client. For example, the following command submits an Apache Spark job to the Livy server. Replace livyALB-2103017743.us-east-1.elb.amazonaws.com with the DNS name of your Application Load Balancer. You can find the DNS name on the Description tab for the Application Load Balancer.

curl -X POST --data '{"kind": "pyspark"}' -H "Content-Type: application/json" livyALB-2103017743.us-east-1.elb.amazonaws.com/sessions
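
The same request can be sent from application code. The following is a rough Python equivalent of the curl command, using only the standard library. The function names build_session_request and create_session are hypothetical, and the sketch assumes the default HTTP listener on port 80 with no authentication in front of Livy.

```python
import json
import urllib.request


def build_session_request(alb_dns_name, kind="pyspark"):
    """Build (but do not send) a POST /sessions request for Livy."""
    url = f"http://{alb_dns_name}/sessions"
    body = json.dumps({"kind": kind}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_session(alb_dns_name, kind="pyspark", timeout=30):
    """Send the request and return Livy's JSON response as a dict."""
    request = build_session_request(alb_dns_name, kind)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example usage (replace the DNS name with that of your load balancer):
#   session = create_session("livyALB-2103017743.us-east-1.elb.amazonaws.com")
#   print(session)
```

Separating the request construction from the network call makes it possible to inspect the URL, headers, and body before anything is sent through the load balancer.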
