How can I install Python packages to a Conda environment on an Amazon SageMaker notebook instance?

Last updated: 2020-10-08

I want to install Python packages to a specific Conda environment, check installed package versions, or create a persistent Conda environment.


Install Python packages to a specific Conda environment

If you use pip or Conda to install Python libraries from the terminal without first activating the target Conda environment, the packages are installed into a different environment than the one that your notebook kernel uses. Importing such a package in your running notebook then raises a ModuleNotFoundError. To install packages in the correct Conda environment, activate that environment before you run pip install or conda install from the terminal. Example:

sh-4.2$ source activate python3
(python3) sh-4.2$ pip install theano
(python3) sh-4.2$ source deactivate
(JupyterSystemEnv) sh-4.2$

To install a package from a notebook cell instead, add an exclamation point ("!") at the beginning of the command. This runs the command as a shell command from the notebook. Passing sys.prefix to Conda (or calling pip through sys.executable) ensures that the package is installed in the environment of the current Jupyter kernel. Example:

import sys
!conda install -y --prefix {sys.prefix} theano

Note: When you run conda install in a notebook cell, you can't enter an interactive response. To install packages in a notebook cell using Conda, you must explicitly pass -y. Otherwise, the command hangs and waits for user confirmation.

Or, use pip install:

import sys
!{sys.executable} -m pip install theano
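
The "!" prefix is notebook syntax and isn't valid in a plain Python script. As a rough equivalent, the same pip command can be built and run with the standard library's subprocess module. This is a minimal sketch; pip_install_command and pip_install are illustrative helper names, not part of pip or SageMaker.

```python
import subprocess
import sys


def pip_install_command(package: str) -> list:
    """Build a pip command bound to the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", package]


def pip_install(package: str) -> None:
    """Install a package into the current interpreter's environment."""
    # check=True raises CalledProcessError if pip exits with an error.
    subprocess.run(pip_install_command(package), check=True)


# Show the command without running it.
print(pip_install_command("theano"))
```

Calling pip_install("theano") would then perform the installation in the kernel's own environment.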

Sometimes, pip might fail to install some of a package's dependencies. When this happens, use Conda to install the package instead of pip. Conda checks that all of the package's requirements are satisfied before it installs the package. For more information, see Understanding Conda and pip in the Conda documentation.
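
Before reinstalling a package, it can help to confirm which environment the current kernel uses and whether the package is already importable there. The following is a minimal sketch that uses only the Python standard library; is_importable is an illustrative helper name, and it is intended for top-level module names.

```python
import importlib.util
import sys


def is_importable(module_name: str) -> bool:
    """Return True if the current kernel's interpreter can find the module."""
    # find_spec returns None when a top-level module can't be located.
    return importlib.util.find_spec(module_name) is not None


# The environment and interpreter that back the running kernel.
print("Kernel environment:", sys.prefix)
print("Kernel interpreter:", sys.executable)

# "theano" is the example package used in this article; substitute your own.
print("theano importable:", is_importable("theano"))
```

If the printed environment path isn't the one that you installed into, the package was likely installed in a different Conda environment.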

Other useful commands

To see the pre-built Conda environments, run either of the following commands in the notebook instance terminal:

$ conda env list
$ conda info --envs

Example output:

# conda environments:
base                     /home/ec2-user/anaconda3
JupyterSystemEnv      *  /home/ec2-user/anaconda3/envs/JupyterSystemEnv
R                        /home/ec2-user/anaconda3/envs/R
amazonei_mxnet_p27       /home/ec2-user/anaconda3/envs/amazonei_mxnet_p27
amazonei_mxnet_p36       /home/ec2-user/anaconda3/envs/amazonei_mxnet_p36
amazonei_tensorflow_p27     /home/ec2-user/anaconda3/envs/amazonei_tensorflow_p27
amazonei_tensorflow_p36     /home/ec2-user/anaconda3/envs/amazonei_tensorflow_p36
chainer_p27              /home/ec2-user/anaconda3/envs/chainer_p27
chainer_p36              /home/ec2-user/anaconda3/envs/chainer_p36
mxnet_p27                /home/ec2-user/anaconda3/envs/mxnet_p27
mxnet_p36                /home/ec2-user/anaconda3/envs/mxnet_p36
python2                  /home/ec2-user/anaconda3/envs/python2
python3                  /home/ec2-user/anaconda3/envs/python3
pytorch_p27              /home/ec2-user/anaconda3/envs/pytorch_p27
pytorch_p36              /home/ec2-user/anaconda3/envs/pytorch_p36
tensorflow_p27           /home/ec2-user/anaconda3/envs/tensorflow_p27
tensorflow_p36           /home/ec2-user/anaconda3/envs/tensorflow_p36
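
To work with this list from a notebook cell, for example to confirm that an environment exists before installing into it, the text output can be parsed into a name-to-path mapping. This is a sketch that assumes the column layout shown in the example output and paths without spaces. parse_conda_envs is a hypothetical helper, and the sample text is copied from the example output rather than read from a live instance.

```python
def parse_conda_envs(output: str) -> dict:
    """Map environment names to paths from `conda env list` text output."""
    envs = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "# conda environments:" header.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        # The active environment has a "*" between its name and its path,
        # so take the first field as the name and the last as the path.
        name, path = parts[0], parts[-1]
        envs[name] = path
    return envs


sample = """\
# conda environments:
base                     /home/ec2-user/anaconda3
JupyterSystemEnv      *  /home/ec2-user/anaconda3/envs/JupyterSystemEnv
python3                  /home/ec2-user/anaconda3/envs/python3
"""

envs = parse_conda_envs(sample)
print(envs["python3"])
```

On a live instance, the input string could come from subprocess.run(["conda", "env", "list"], capture_output=True, text=True).stdout.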

To see the kernels that are installed on the notebook instance, run this command in the terminal:

sh-4.2$ ipython kernelspec list

To check the version of a package that's installed in a Conda environment, run this command in the notebook instance terminal:

(python3) sh-4.2$ pip freeze | grep pandas

Or, check the package version in a notebook cell:

import pandas as pd
print(pd.__version__)
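
Some packages don't define a __version__ attribute. As an alternative, the standard library's importlib.metadata module reads the version from the installed package metadata without importing the package. The sketch below assumes that the kernel runs Python 3.8 or later; installed_version is an illustrative helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name: str):
    """Return the installed version string, or None if not installed."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        # The distribution isn't installed in the kernel's environment.
        return None


print("pandas:", installed_version("pandas"))
```

A result of None suggests that the package isn't installed in the environment that backs the current kernel.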

Create a persistent Conda environment

When you stop a notebook instance, SageMaker terminates the underlying Amazon Elastic Compute Cloud (Amazon EC2) instance, so packages that you installed in a Conda environment don't persist between sessions. The /home/ec2-user/SageMaker directory is the only path that persists between notebook instance sessions. This is the directory for the notebook instance's Amazon Elastic Block Store (Amazon EBS) volume. If you want your libraries to persist between sessions, see How can I be sure that manually installed libraries persist in Amazon SageMaker if my lifecycle configuration times out when I try to install the libraries?
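
One possible approach is to create the environment under the persistent directory with Conda's --prefix option, so that its packages live on the EBS volume. The sketch below only builds and prints the command. The envs subdirectory, the environment name, and the Python version are placeholders chosen for illustration, not values from SageMaker documentation. An environment created this way might still need to be registered with Jupyter again after each restart, for example from a lifecycle configuration script.

```python
import shlex
from pathlib import PurePosixPath

# The only path that persists between notebook instance sessions.
PERSISTENT_DIR = PurePosixPath("/home/ec2-user/SageMaker")


def conda_create_command(env_name: str, python_version: str) -> list:
    """Build a `conda create` command that targets the persistent volume."""
    # "envs" is an assumed subdirectory name, not a SageMaker convention.
    prefix = PERSISTENT_DIR / "envs" / env_name
    return [
        "conda", "create", "--yes",
        "--prefix", str(prefix),
        f"python={python_version}",
    ]


# "my_env" and "3.8" are hypothetical example values.
cmd = conda_create_command("my_env", "3.8")
print(shlex.join(cmd))
```

To run the command from a notebook cell, cmd could be passed to subprocess.run(cmd, check=True).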