How can I specify proxy settings for Jupyter notebooks running on an Amazon SageMaker notebook instance?

Last updated: 2019-09-24

I'm trying to use a lifecycle configuration to configure proxy settings on a new Jupyter notebook or on the shell environment of an existing notebook. However, Jupyter is not picking up the settings. How can I specify proxy settings for Jupyter notebooks running on my Amazon SageMaker notebook instance and make sure that Jupyter uses the settings?

Short Description

Jupyter notebooks don't pick up proxy settings from a lifecycle configuration unless you also specify the proxy settings in the IPython profile that the notebook kernels load when they start.

To configure the IPython profile, create a Python script named 00-startup.py in /home/ec2-user/.ipython/profile_default/startup. The /home/ec2-user/.ipython/profile_default/startup path is created when you open a Jupyter notebook on an Amazon SageMaker notebook instance. IPython runs the scripts in this directory in lexicographical order, so a script whose file name starts with "00" runs first.
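For reference, a 00-startup.py along the following lines sets the proxy variables in each notebook kernel's environment. This is a minimal sketch, not the contents of the AWS Labs script; the proxy URL and the NO_PROXY list are placeholders taken from the example %env output in this article, so replace them with your own values.

```python
# 00-startup.py
# Minimal sketch of an IPython startup script that exports proxy settings
# to the notebook kernel. The values below are placeholders (assumptions);
# replace them with your own proxy URL and exclusion list.
import os

PROXY_URL = "http://proxy.local:3128"
NO_PROXY = "s3.amazonaws.com,127.0.0.1,localhost"

# HTTP clients such as requests and urllib read these variables from the
# process environment, so setting them here affects code run in the notebook.
os.environ["HTTP_PROXY"] = PROXY_URL
os.environ["HTTPS_PROXY"] = PROXY_URL
os.environ["NO_PROXY"] = NO_PROXY
```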

Note: Jupyter notebooks and the terminal shell run as separate processes, so proxy settings applied through 00-startup.py aren't passed to the shell. Additionally, the terminal that Jupyter opens by default starts a non-login shell, which doesn't read /home/ec2-user/.profile. To apply proxy settings to the shell environment with a lifecycle configuration, configure the Jupyter terminal to start a login shell (for example, bash) and put the proxy settings in /home/ec2-user/.profile.
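The shell side of the configuration comes down to a few export lines in /home/ec2-user/.profile. The following Python sketch shows the idea; it is not the AWS Labs script, and the helper names profile_exports and append_exports are hypothetical. It uses the lowercase variable names, which are the ones wget documents, and appends a line only if it isn't already in the file, so running it more than once doesn't duplicate the settings.

```python
import os
from typing import List

# Placeholder values (assumptions); replace with your own proxy settings.
PROXY_URL = "http://proxy.local:3128"
NO_PROXY = "s3.amazonaws.com,127.0.0.1,localhost"


def profile_exports(proxy_url: str, no_proxy: str) -> List[str]:
    """Build the export lines that a login shell reads from ~/.profile."""
    return [
        f"export http_proxy='{proxy_url}'",
        f"export https_proxy='{proxy_url}'",
        f"export no_proxy='{no_proxy}'",
    ]


def append_exports(profile_path: str, lines: List[str]) -> None:
    """Append each line to profile_path unless an identical line already exists."""
    content = ""
    if os.path.exists(profile_path):
        with open(profile_path) as f:
            content = f.read()
    existing = set(content.splitlines())
    missing = [line for line in lines if line not in existing]
    if not missing:
        return
    with open(profile_path, "a") as f:
        # Keep the new lines from being glued onto an unterminated last line.
        if content and not content.endswith("\n"):
            f.write("\n")
        for line in missing:
            f.write(line + "\n")
```

For example, append_exports("/home/ec2-user/.profile", profile_exports(PROXY_URL, NO_PROXY)) writes the settings; they apply to login shells started afterward.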

Resolution

Download the proxy-for-jupyter lifecycle configuration script from the AWS Labs GitHub repository, edit it as follows, and then run it as the lifecycle configuration for your notebook instance:

  • To set up a proxy for both a Jupyter notebook and the shell, leave all echo statements in the script.
  • To set up a proxy for a Jupyter notebook only, remove the first four echo statements from the script.
  • To set up a proxy for the shell only, remove the last four echo statements from the script.

After the script executes, you can see the proxy settings in the Jupyter notebook and the terminal. To confirm that the proxy works as expected:

1.    Open a terminal window on your Amazon SageMaker notebook instance and then run the following command to test the proxy connection:

wget google.com

If you get output like the following, the proxy is configured correctly for the terminal:

--2019-09-24 07:46:05--  http://google.com/
Resolving proxy.local (proxy.local)...

2.    Open a Jupyter notebook on your Amazon SageMaker notebook instance and then run the following command to list the environment variables:

%env

Example output:

{...'GIT_PAGER': 'cat',
 'MPLBACKEND': 'module://ipykernel.pylab.backend_inline',
 'HTTP_PROXY': 'http://proxy.local:3128',
 'HTTPS_PROXY': 'http://proxy.local:3128',
 'NO_PROXY': 's3.amazonaws.com,127.0.0.1,localhost'}

Note: The specific values, such as the proxy URLs, will be unique to your configuration.
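If the full %env listing is long, you can narrow it down. Running %env HTTP_PROXY prints a single variable; alternatively, a short Python sketch like the following prints only the proxy variables, in either upper or lower case. The helper name proxy_env is hypothetical.

```python
import os
from typing import Dict

PROXY_NAMES = {"HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY"}


def proxy_env() -> Dict[str, str]:
    """Return the proxy-related environment variables, matched case-insensitively."""
    return {
        name: value
        for name, value in sorted(os.environ.items())
        if name.upper() in PROXY_NAMES
    }


for name, value in proxy_env().items():
    print(f"{name}={value}")
```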

3.    Make a request to confirm that the notebook instance is using the proxy settings:

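import requests
# requests reads HTTP_PROXY, HTTPS_PROXY, and NO_PROXY from the environment by default
response = requests.get("http://google.com", timeout=10)
print(response.status_code)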
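A successful response shows that the request went out, but not which route it took. As a rough cross-check, the following sketch uses the standard library's urllib.request.getproxies(), which collects the *_proxy environment variables, to report which proxy a given URL would use. The helper proxy_for is hypothetical, and its NO_PROXY handling is simplified to exact host or domain-suffix matches; requests and urllib apply their own, more complete matching rules.

```python
import urllib.request
from typing import Optional
from urllib.parse import urlsplit


def proxy_for(url: str) -> Optional[str]:
    """Return the proxy URL that would handle `url`, or None if it goes direct.

    Simplified sketch: NO_PROXY entries are compared as exact hostnames or
    domain suffixes only (no IP ranges or port handling).
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    # getproxies() maps "http", "https", "no", ... to values of *_proxy variables.
    proxies = urllib.request.getproxies()
    for entry in proxies.get("no", "").split(","):
        entry = entry.strip().lstrip(".")
        if entry and (host == entry or host.endswith("." + entry)):
            return None  # host is excluded by NO_PROXY, so it is contacted directly
    return proxies.get(parts.scheme)
```

With the example settings shown in this article, proxy_for("http://google.com") returns "http://proxy.local:3128", while a URL on s3.amazonaws.com returns None because that domain is listed in NO_PROXY.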
