How do I link an Amazon EMR notebook to a Git repository?


I want to link my Amazon EMR notebook with a Git repository.

Resolution

Associating Git repositories with Amazon EMR notebooks allows you to save your notebooks in a version-controlled environment. You can associate up to three repositories with a notebook.

To create a new EMR notebook and then associate it with an existing Git repository, do the following:

1.    Create a private subnet in a virtual private cloud (VPC).

2.    Create a NAT gateway.

3.    Update the route table to point to the NAT gateway.

4.    Launch an Amazon EMR cluster in the private subnet. In the Software configuration section, be sure that you select a configuration that includes Apache Spark, Apache Hadoop, and Apache Livy.

5.    While you wait for the EMR cluster to reach the WAITING state, add the Git repository. For Git credentials, choose Create a new secret. Be sure that the Username is the alias of the Git account and not the email address. For more information, see Working with aliases.

6.    Create a security group with the following outbound rules:
Rule 1
Type: Custom TCP rule
Protocol: TCP
Port Range: 18888
Destination: ElasticMapReduceEditors-Livy

Rule 2
Type: HTTPS
Protocol: TCP
Port Range: 443
Destination: 0.0.0.0/0

Rule 1 allows the notebook to reach Livy on the cluster, and Rule 2 allows the notebook to route traffic to the internet through the cluster. For more information, see Custom EC2 security group for EMR notebooks when associating notebooks with Git repositories.

7.    Add an inbound rule to the ElasticMapReduceEditors-Livy security group:
Type: Custom TCP rule
Protocol: TCP
Port Range: 18888
Source: Enter the name of the security group that you created in the previous step.

8.    Modify the service role for EMR notebooks (EMR_Notebooks_DefaultRole) to allow the secretsmanager:GetSecretValue action.

9.    Create an EMR notebook with the following security group settings:
In the Security groups section, select Choose security groups.
For Security groups for master instance, choose ElasticMapReduceEditors-Livy.
For Security groups for notebook instance, choose the security group that you created in step 6.

The Git repository status changes to Linked. You can now use the Git repository in the notebook.
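
If you prefer to script the security group rules and the service role change from steps 6 through 8, the following is a minimal sketch using the AWS SDK for Python (boto3). It is not part of the original procedure. The function names, the inline policy name AllowGitSecretRead, and the choice to scope the permission to a single secret ARN are illustrative assumptions. The rule builders are plain Python, and boto3 is imported only when the rules are applied.

```python
import json

LIVY_PORT = 18888   # port used for Livy in the rules above
HTTPS_PORT = 443


def notebook_egress_rules(livy_group_id):
    """Outbound rules for the notebook security group (step 6)."""
    return [
        {   # Rule 1: notebook to ElasticMapReduceEditors-Livy on 18888
            "IpProtocol": "tcp",
            "FromPort": LIVY_PORT,
            "ToPort": LIVY_PORT,
            "UserIdGroupPairs": [{"GroupId": livy_group_id}],
        },
        {   # Rule 2: HTTPS to any destination
            "IpProtocol": "tcp",
            "FromPort": HTTPS_PORT,
            "ToPort": HTTPS_PORT,
            "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
        },
    ]


def livy_ingress_rules(notebook_group_id):
    """Inbound rule for ElasticMapReduceEditors-Livy (step 7)."""
    return [{
        "IpProtocol": "tcp",
        "FromPort": LIVY_PORT,
        "ToPort": LIVY_PORT,
        "UserIdGroupPairs": [{"GroupId": notebook_group_id}],
    }]


def secret_read_policy(secret_arn):
    """Policy document that allows reading the Git secret (step 8).

    Assumption: access is limited to one secret ARN; the article only
    says to allow the secretsmanager:GetSecretValue action.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "secretsmanager:GetSecretValue",
            "Resource": secret_arn,
        }],
    }


def apply_settings(notebook_group_id, livy_group_id, secret_arn):
    """Apply steps 6 through 8. Requires boto3 and AWS credentials."""
    import boto3  # imported here so the builders above work without it

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_egress(
        GroupId=notebook_group_id,
        IpPermissions=notebook_egress_rules(livy_group_id),
    )
    ec2.authorize_security_group_ingress(
        GroupId=livy_group_id,
        IpPermissions=livy_ingress_rules(notebook_group_id),
    )
    boto3.client("iam").put_role_policy(
        RoleName="EMR_Notebooks_DefaultRole",
        PolicyName="AllowGitSecretRead",  # hypothetical policy name
        PolicyDocument=json.dumps(secret_read_policy(secret_arn)),
    )
```

To use the sketch, call apply_settings with the ID of the security group from step 6, the ID of the ElasticMapReduceEditors-Livy security group, and the ARN of the secret created in step 5.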


Related information

Associating Git-based repositories with EMR notebooks

EMR notebooks

AWS OFFICIAL, updated 2 years ago
2 Comments

It is NOT necessary to use an EMR Cluster in a private subnet in order to use git from EMR Studio. It may however be (probably is?) necessary for the EMR Studio itself to be created in a private subnet (and those two subnets would need to be in the same VPC and be able to talk to each other).

justin
replied a month ago

Thank you for your comment. We'll review and update the Knowledge Center article as needed.

AWS
MODERATOR
replied a month ago