How can I install third-party libraries on all nodes of a running Amazon EMR cluster?

Use AWS Systems Manager to connect to the core and task nodes, and then run a bash script that installs the third-party libraries.

Note: You can't use bootstrap actions to change a running cluster, because bootstrap actions run only when nodes are launched. Steps aren't an option either, because steps run only on the master node. You can log in to each node and install the libraries manually, but that approach is slow and error-prone when the cluster has many nodes.

Create an instance profile for Systems Manager

1.    Open the IAM console.

2.    In the navigation pane, choose Roles, and then choose the role that's associated with the EC2 instances in your EMR cluster. By default, this role is named EMR_EC2_DefaultRole.

3.    On the Permissions tab, choose Attach policy.

4.    On the Attach policy page, select the check box next to AmazonEC2RoleforSSM, and then choose Attach policy.
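If you prefer to script the console steps above, the following is a minimal sketch using the boto3 IAM client. It assumes the default role name EMR_EC2_DefaultRole and the standard ARN of the AmazonEC2RoleforSSM managed policy. The helper name attach_ssm_policy is hypothetical and used only for illustration.

```python
# ARN of the AWS managed policy named in step 4 (assumed to be the standard one).
SSM_POLICY_ARN = "arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforSSM"


def attach_ssm_policy(iam_client, role_name="EMR_EC2_DefaultRole"):
    """Attach the Systems Manager policy to the role unless it is already attached.

    iam_client is expected to behave like boto3.client("iam").
    Returns True if the policy was attached, False if it was already present.
    """
    attached = iam_client.list_attached_role_policies(RoleName=role_name)["AttachedPolicies"]
    if any(policy["PolicyArn"] == SSM_POLICY_ARN for policy in attached):
        return False  # nothing to change
    iam_client.attach_role_policy(RoleName=role_name, PolicyArn=SSM_POLICY_ARN)
    return True
```

With boto3 installed and credentials configured, a call such as attach_ssm_policy(boto3.client("iam")) should have the same effect as steps 1 through 4. If your cluster uses a custom role, pass its name as role_name.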

Install libraries on the master node

1.    Connect to the master node using SSH.

2.    Use a bash script saved to Amazon Simple Storage Service (Amazon S3) to install libraries on the master node. The script in the following example uses easy_install-3.4 to install pip. Then the script uses pip to install paramiko, nltk, scipy, scikit-learn, and pandas for the Python 3 kernel:

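#!/bin/bash

# Install pip for Python 3. easy_install-3.4 is tied to Python 3.4, so use the name that matches the Python 3 version on your cluster.
sudo easy_install-3.4 pip
# Install the libraries for the Python 3 kernel.
sudo /usr/local/bin/pip3 install paramiko nltk scipy scikit-learn pandas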
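To confirm that the script worked, a check along the following lines could be run with the Python 3 interpreter on the node. This is a sketch, not part of the original procedure. The helper name missing_modules is hypothetical, and the module list assumes the packages from the example script (scikit-learn is imported as sklearn).

```python
import importlib.util

# Import names for the packages that the example script installs.
# scikit-learn is imported under the name "sklearn".
EXPECTED_MODULES = ["paramiko", "nltk", "scipy", "sklearn", "pandas"]


def missing_modules(names=EXPECTED_MODULES):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("All libraries found" if not missing else "Missing: " + ", ".join(missing))
```

The check only looks up each module without importing it, so it is quick to run but doesn't prove that a library loads without errors.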

Install libraries on the core and task nodes

1.    Create a Python script to install libraries on the core and task nodes. For an example script, see Example Installing Libraries on Core Nodes of a Running Cluster in Using Libraries and Installing Additional Libraries.

2.    Save the script to your local machine or an Amazon Elastic Compute Cloud (Amazon EC2) instance.

3.    Run a command similar to the following to execute the script. The script takes two arguments: your cluster ID and the S3 location of the bash script that you created earlier.

python sample.py j-1K48XXXXXXHCB s3://mybucket/script-ssm.sh

Note: The IAM user or role that executes the script must have appropriate permissions for Systems Manager. For more information, see Configure User Access for Systems Manager.
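For orientation, the following is a rough sketch of what such a Python script might do. It is not the example script from the AWS documentation. It assumes boto3-style EMR and Systems Manager clients, a cluster that uses instance groups (not instance fleets), and the AWS-RunShellScript document. The function names and the /home/hadoop working directory are illustrative assumptions.

```python
import shlex
import time


def core_and_task_instance_ids(emr_client, cluster_id):
    """Collect the EC2 instance IDs of the running core and task nodes."""
    ids, extra = [], {}
    while True:
        page = emr_client.list_instances(
            ClusterId=cluster_id,
            InstanceGroupTypes=["CORE", "TASK"],  # applies to instance-group clusters
            InstanceStates=["RUNNING"],
            **extra,
        )
        ids += [instance["Ec2InstanceId"] for instance in page["Instances"]]
        if "Marker" not in page:  # no further pages
            return ids
        extra = {"Marker": page["Marker"]}


def build_commands(script_s3_uri, work_dir="/home/hadoop"):
    """Build the shell commands that copy the script from S3 and then run it."""
    target = work_dir + "/" + script_s3_uri.rsplit("/", 1)[-1]
    return [
        "aws s3 cp " + shlex.quote(script_s3_uri) + " " + shlex.quote(target),
        "bash " + shlex.quote(target),
    ]


def run_script_on_nodes(emr_client, ssm_client, cluster_id, script_s3_uri,
                        poll_seconds=5, sleep=time.sleep):
    """Send the commands through Systems Manager and return the final command status."""
    instance_ids = core_and_task_instance_ids(emr_client, cluster_id)
    if not instance_ids:
        raise RuntimeError("No running core or task nodes found")
    # Note: send_command accepts a limited number of instance IDs per call,
    # so a large cluster would need batching (not shown in this sketch).
    command_id = ssm_client.send_command(
        InstanceIds=instance_ids,
        DocumentName="AWS-RunShellScript",
        Parameters={"commands": build_commands(script_s3_uri)},
        TimeoutSeconds=3600,
    )["Command"]["CommandId"]
    while True:
        status = ssm_client.list_commands(CommandId=command_id)["Commands"][0]["Status"]
        if status not in ("Pending", "InProgress", "Cancelling"):
            return status  # for example "Success", "Failed", or "TimedOut"
        sleep(poll_seconds)
```

With boto3 installed and credentials configured, a call such as run_script_on_nodes(boto3.client("emr"), boto3.client("ssm"), "j-1K48XXXXXXHCB", "s3://mybucket/script-ssm.sh") would mirror the command shown in step 3. The clients are passed in as arguments so that the logic can be exercised with stand-in objects before it touches a real cluster.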



Published: 2019-01-30