How can I be sure that manually installed libraries persist in Amazon SageMaker if my lifecycle configuration times out when I try to install the libraries?

Last updated: 2019-10-28

When I try to install additional libraries, my lifecycle configuration scripts run for more than five minutes, which causes the Amazon SageMaker notebook instance to time out. How can I resolve this and be sure that my manually installed libraries persist between Amazon SageMaker notebook instance sessions?

Short Description

If a lifecycle configuration script runs for longer than five minutes, it fails, and the notebook instance is not created or started. There are two ways to resolve this issue:

  • nohup: The nohup command forces the lifecycle configuration script to continue running in the background until the packages are installed. This method is recommended for less technical users, and is more appropriate as a short-term solution.
  • Create a custom, persistent Conda installation on the notebook instance's Amazon Elastic Block Store (Amazon EBS) volume: Run the on-create lifecycle configuration script from the AWS Samples GitHub repository. This script uses Miniconda to create a separate Conda installation on the EBS volume (/home/ec2-user/SageMaker/). Then, run the on-start script to make the custom environments available as a kernel in Jupyter. This method is recommended for more technical users, and it is a better long-term solution.

Resolution

Use one of the following methods to resolve lifecycle configuration timeouts.

nohup

Use the nohup command to force the lifecycle configuration script to continue running in the background even after the five-minute timeout period expires. Example:

#!/bin/bash
set -e
# Run the install in the background so that the script returns before the five-minute timeout.
nohup pip install xgboost &

The background process exits after the libraries are installed. You aren't notified when this happens, but you can use the ps command (for example, from a terminal on the notebook instance) to check whether the installation is still running.
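Because the background install finishes silently, it can help to keep a log and a record of the process ID. The following is a minimal sketch, not part of the original procedure: the function names and the LOG_FILE and PID_FILE paths are hypothetical, and the check relies on the Linux /proc file system rather than on ps so that it also works from a different user account.

```shell
#!/bin/bash
set -e

# Hypothetical locations on the persistent volume; change them as needed.
LOG_FILE="${LOG_FILE:-/home/ec2-user/SageMaker/install.log}"
PID_FILE="${PID_FILE:-/home/ec2-user/SageMaker/install.pid}"

# Start any command in the background with nohup, send its output to
# LOG_FILE, and record its process ID in PID_FILE.
start_in_background() {
    nohup "$@" > "$LOG_FILE" 2>&1 &
    echo $! > "$PID_FILE"
}

# Return 0 if the recorded process still exists, non-zero otherwise.
is_still_running() {
    [ -f "$PID_FILE" ] && [ -d "/proc/$(cat "$PID_FILE")" ]
}

# In a lifecycle configuration script you might call, for example:
#   start_in_background pip install xgboost
```

Later, from a terminal on the notebook instance, you could source the same functions and run is_still_running, or read the log file to see how far the installation has progressed.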

Note: You can also use the nohup command if your lifecycle configuration script times out in other scenarios, such as when you download large Amazon Simple Storage Service (Amazon S3) objects.
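The same pattern applies to a large Amazon S3 download. The following sketch assumes that the AWS CLI is available on the notebook instance; the bucket name, object key, destination directory, and function name are placeholders for illustration only.

```shell
#!/bin/bash
set -e

# Placeholder values: replace them with your own bucket, key, and directory.
S3_URI="s3://my-example-bucket/datasets/large-file.csv"
DEST_DIR="/home/ec2-user/SageMaker/data"

# Copy one S3 object into a directory in the background and keep a log
# of the transfer next to the downloaded file.
download_in_background() {
    local s3_uri="$1" dest_dir="$2"
    mkdir -p "$dest_dir"
    nohup aws s3 cp "$s3_uri" "$dest_dir/" > "$dest_dir/download.log" 2>&1 &
}

# In a lifecycle configuration script you might call:
#   download_in_background "$S3_URI" "$DEST_DIR"
```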

Create a custom, persistent Conda installation on the notebook instance's EBS volume

1.    Download and run the on-create script to create a custom Conda installation. This script also creates a new Conda environment in the custom Conda installation. You can customize the script to install other packages, such as NumPy or Boto3, in the new Conda environment.

Note: The notebook instance must have internet connectivity to download the Miniconda installer, ipykernel, and any additional custom packages that you specify.

2.    Download and run the on-start script to make the custom environment available as a kernel in Jupyter.

Note: For the custom environment to persist and to appear as a kernel in Jupyter, you must include both scripts in the lifecycle configuration.

If you stop and then start your notebook instance, your custom Conda environment and additional packages are still available. You don't have to install them again.
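To illustrate the two customization points described in the steps above (installing extra packages into the custom environment, and exposing environments as Jupyter kernels), here is a hedged sketch. It is not the AWS sample script. The CONDA_DIR path, the custom_python environment name, and both function names are assumptions, and the sketch calls each environment's own python and pip directly instead of activating the environment.

```shell
#!/bin/bash
set -e

# Assumed location of the custom Conda installation on the EBS volume.
CONDA_DIR="${CONDA_DIR:-/home/ec2-user/SageMaker/custom-miniconda/miniconda}"

# Install extra packages into one existing environment by calling that
# environment's own pip. The first argument is the environment name.
install_extra_packages() {
    local env_name="$1"
    shift
    "$CONDA_DIR/envs/$env_name/bin/pip" install --quiet "$@"
}

# Register every environment under CONDA_DIR as a Jupyter kernel.
# ipykernel must already be installed in each environment.
register_kernels() {
    local env name
    for env in "$CONDA_DIR"/envs/*; do
        # Skip anything that is not a usable environment.
        [ -x "$env/bin/python" ] || continue
        name="$(basename "$env")"
        "$env/bin/python" -m ipykernel install --user \
            --name "$name" --display-name "Custom ($name)"
    done
}

# Example calls (run as ec2-user so that --user targets that home directory):
#   install_extra_packages custom_python numpy boto3
#   register_kernels
```

Because the --user option writes the kernel specification to the calling user's home directory, which is not on the persistent volume, a step like register_kernels belongs in the on-start script so that it runs each time the notebook instance starts.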