How do I link an Amazon EMR notebook to a Git repository?

Last updated: 2020-09-03

I want to link an Amazon EMR notebook with a Git repository.

Resolution

Associating Git repositories with Amazon EMR notebooks allows you to save your notebooks in a version-controlled environment. You can associate up to three repositories with a notebook.

To create a new EMR notebook and associate it with an existing Git repository:

  1. Create a private subnet in a virtual private cloud (VPC).
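  2. Create a NAT gateway in a public subnet of the same VPC.
  3. Update the private subnet's route table so that internet-bound traffic (destination 0.0.0.0/0) is routed to the NAT gateway.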
  4. Launch an Amazon EMR cluster in the private subnet. In the Software configuration section, be sure that you select a configuration that includes Apache Spark, Apache Hadoop, and Apache Livy.
  5. While you're waiting for the EMR cluster to reach the WAITING state, add the Git repository. For Git credentials, choose Create a new secret. Be sure that the Username is the alias of the Git account, not the email address. For more information, see Working with aliases.
  6. Create a security group with the following two outbound rules:
    Rule 1
      Type: Custom TCP rule
      Protocol: TCP
      Port Range: 18888
      Destination: ElasticMapReduceEditors-Livy
    Rule 2
      Type: HTTPS
      Protocol: TCP
      Port Range: 443
      Destination: 0.0.0.0/0
    Rule 1 lets the notebook instance communicate with the cluster's master node on port 18888. Rule 2 lets the notebook instance reach the internet over HTTPS, which it needs in order to connect to the remote Git repository. For more information, see Custom EC2 security group for EMR notebooks when associating notebooks with Git repositories.
  7. Add an inbound rule to the ElasticMapReduceEditors-Livy security group:
    Type: Custom TCP rule
    Protocol: TCP
    Port Range: 18888
    Source: Enter the security group that you created in the previous step.
  8. Modify the service role for EMR notebooks (EMR_Notebooks_DefaultRole) to allow the secretsmanager:GetSecretValue action.
  9. Create an EMR notebook with the following security group settings:
    In the Security groups section, select Choose security groups.
    For Security groups for master instance, choose ElasticMapReduceEditors-Livy.
    For Security groups for notebook instance, choose the security group that you created in step 6.

The Git repository status changes to Linked. You can now use the Git repository in the notebook.
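Steps 1 through 3 can also be scripted. The following Python sketch is an illustration, not part of the original procedure: it builds the keyword arguments for the boto3 EC2 `create_route` call that sends internet-bound traffic from the private subnet's route table to the NAT gateway, and it checks a route table description (one entry of the `RouteTables` list that `describe_route_tables` returns) for an active route of that kind. The function names and the route table and NAT gateway IDs are hypothetical.

```python
def nat_default_route_params(route_table_id: str, nat_gateway_id: str) -> dict:
    """Keyword arguments for boto3's ec2.create_route (step 3).

    Routes all IPv4 traffic without a more specific route (0.0.0.0/0)
    from the private subnet's route table to the NAT gateway.
    """
    return {
        "RouteTableId": route_table_id,
        "DestinationCidrBlock": "0.0.0.0/0",
        "NatGatewayId": nat_gateway_id,
    }


def has_active_nat_default_route(route_table: dict) -> bool:
    """Return True if the route table has an active 0.0.0.0/0 route to a NAT gateway."""
    for route in route_table.get("Routes", []):
        if (
            route.get("DestinationCidrBlock") == "0.0.0.0/0"
            and route.get("NatGatewayId")
            and route.get("State") == "active"
        ):
            return True
    return False


if __name__ == "__main__":
    # Hypothetical IDs, for illustration only.
    params = nat_default_route_params("rtb-0123456789abcdef0", "nat-0123456789abcdef0")
    print(params)
    # With boto3 installed and credentials configured, you could then run:
    #   boto3.client("ec2").create_route(**params)
```

The check function only inspects data, so you can run it against the output of `describe_route_tables` before launching the cluster to confirm that the private subnet has a path to the internet.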
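For step 4, the following Python sketch builds a request for the boto3 EMR `run_job_flow` call that launches the cluster in the private subnet with Hadoop, Spark, and Livy installed. It is an illustration under assumptions, not a recommended configuration: the function name, the instance type, and the instance counts are hypothetical, the release label is left for you to choose, and the role names are the default EMR roles, which your account might not use.

```python
REQUIRED_APPLICATIONS = ("Hadoop", "Spark", "Livy")


def emr_cluster_request(name: str, release_label: str, private_subnet_id: str,
                        instance_type: str = "m5.xlarge") -> dict:
    """Keyword arguments for boto3's emr.run_job_flow (step 4).

    The instance type and instance counts are illustrative assumptions.
    """
    return {
        "Name": name,
        "ReleaseLabel": release_label,
        # The notebook needs Spark, Hadoop, and Livy on the cluster.
        "Applications": [{"Name": app} for app in REQUIRED_APPLICATIONS],
        "Instances": {
            # Launch the cluster in the private subnet from step 1.
            "Ec2SubnetId": private_subnet_id,
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": instance_type, "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": instance_type, "InstanceCount": 2},
            ],
            # Keep the cluster running with no steps so that it reaches WAITING.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Default EMR roles; replace them if your account uses custom roles.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
```

With boto3 configured, you could pass the result as `boto3.client("emr").run_job_flow(**request)`. Setting `KeepJobFlowAliveWhenNoSteps` matters here because a cluster that has no steps and is not kept alive terminates instead of entering the WAITING state that the notebook attaches to.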
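The security group rules in steps 6 and 7 can be expressed as the `IpPermissions` structures that the boto3 EC2 calls `authorize_security_group_egress` and `authorize_security_group_ingress` accept. The following Python sketch only builds those structures. The function names are hypothetical, and it assumes that you already know the IDs of the ElasticMapReduceEditors-Livy group and of the notebook security group.

```python
def notebook_egress_rules(livy_sg_id: str) -> list:
    """IpPermissions for the two outbound rules of the notebook security group (step 6)."""
    return [
        # Rule 1: TCP 18888 to the ElasticMapReduceEditors-Livy security group.
        {
            "IpProtocol": "tcp",
            "FromPort": 18888,
            "ToPort": 18888,
            "UserIdGroupPairs": [{"GroupId": livy_sg_id}],
        },
        # Rule 2: HTTPS (TCP 443) to any IPv4 address.
        {
            "IpProtocol": "tcp",
            "FromPort": 443,
            "ToPort": 443,
            "IpRanges": [{"CidrIp": "0.0.0.0/0"}],
        },
    ]


def livy_ingress_rule(notebook_sg_id: str) -> list:
    """IpPermissions for the inbound rule on ElasticMapReduceEditors-Livy (step 7)."""
    return [
        # Allow TCP 18888 from the notebook security group created in step 6.
        {
            "IpProtocol": "tcp",
            "FromPort": 18888,
            "ToPort": 18888,
            "UserIdGroupPairs": [{"GroupId": notebook_sg_id}],
        }
    ]
```

With boto3 configured, you could apply them with `ec2.authorize_security_group_egress(GroupId=notebook_sg_id, IpPermissions=notebook_egress_rules(livy_sg_id))` and `ec2.authorize_security_group_ingress(GroupId=livy_sg_id, IpPermissions=livy_ingress_rule(notebook_sg_id))`. Referencing the other group by ID (`UserIdGroupPairs`) rather than by CIDR range keeps the port 18888 rules valid even if instance IP addresses change.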
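For step 8, one way to allow the `secretsmanager:GetSecretValue` action is an inline IAM policy on EMR_Notebooks_DefaultRole. The following Python sketch generates such a policy document. Scoping `Resource` to the ARN of the secret that you created in step 5 is a least-privilege assumption made for this sketch, because the article names only the action. The function name and the example ARN are hypothetical.

```python
import json


def get_secret_value_policy(secret_arn: str) -> str:
    """IAM policy document that allows secretsmanager:GetSecretValue (step 8).

    Scoping Resource to a single secret ARN is an assumption of this sketch;
    the article only specifies the action.
    """
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "secretsmanager:GetSecretValue",
                "Resource": secret_arn,
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    # Hypothetical secret ARN, for illustration only.
    print(get_secret_value_policy(
        "arn:aws:secretsmanager:us-east-1:111122223333:secret:example-git-secret"
    ))
```

With boto3 configured, you could attach the document with `boto3.client("iam").put_role_policy(RoleName="EMR_Notebooks_DefaultRole", PolicyName=..., PolicyDocument=document)`, choosing your own policy name.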