How do I troubleshoot SageMaker notebook instance connectivity issues?


I'm unable to launch an Amazon SageMaker notebook and I see intermittent errors.

Short description

When you open a SageMaker Jupyter notebook, the notebook might become unresponsive or display errors.

Common causes include the following:

  • The browser can't establish a connection to Jupyter.
  • The notebook kernel reached its defined timeout period.
  • The notebook instance reached the limit of its resource utilization.

Resolution

Note: If you receive errors when running AWS Command Line Interface (AWS CLI) commands, make sure that you’re using the most recent AWS CLI version.

Can't establish connection between Jupyter and the browser

When you open a Jupyter notebook, you might receive the following error:

"A connection to the notebook server could not be established. The notebook will continue trying to reconnect. Check your network connection or notebook server configuration."

This message occurs when there's a connection issue between Jupyter and the browser. To troubleshoot the connection failed message, do the following:

  • Restart your notebook instance. It's a best practice to regularly restart notebook instances. Restarting helps keep notebook instance software updated. When you restart, the notebook instance moves to a new host. Restarting the notebook also helps to resolve HTTP 503 and 504 errors in the browser.
  • Restart your browser, clear your browser cache, or try a different browser.
  • Use a different network connection.
  • Check if the firewall, proxy, or antivirus software is blocking the connection.
  • Check the WebSocket log in your browser. You can typically find it in the browser's developer tools.
  • Temporarily turn off all browser extensions, and then try again.
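
If you prefer to restart the notebook instance from the AWS CLI rather than from the console, the restart step in the list above can be sketched as a small shell function. This is a minimal sketch, not an official procedure: restart_notebook_instance is a hypothetical helper name, and it assumes that the AWS CLI is configured with permissions to stop and start the instance.

```shell
# Hypothetical helper: stop a notebook instance, wait until it is stopped,
# start it again, and wait until it is back in service.
restart_notebook_instance() {
  name="$1"
  aws sagemaker stop-notebook-instance --notebook-instance-name "$name" &&
  aws sagemaker wait notebook-instance-stopped --notebook-instance-name "$name" &&
  aws sagemaker start-notebook-instance --notebook-instance-name "$name" &&
  aws sagemaker wait notebook-instance-in-service --notebook-instance-name "$name"
}

# Usage (assumes configured AWS credentials; the instance name is a placeholder):
# restart_notebook_instance my-notebook-instance
```

Each step runs only if the previous one succeeded, so a failed stop doesn't trigger a start.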

Notebook kernel reached its defined timeout period

The Jupyter notebook session token has a maximum validity of 12 hours. After the token expires, the session times out and must be refreshed to reset the timeout token. However, the Jupyter kernel continues to run even if the browser disconnects.

To mitigate the effects of the 12-hour token, do the following:

  • Write the results of your program to a file instead of to stdout.
  • Convert your program to a Python script, and then run it.
  • Make a call to CreatePresignedNotebookInstanceUrl to generate a new URL with AuthToken. Next, paste the new URL in your browser before the session expires. This generates a new 12-hour session token:
    aws sagemaker create-presigned-notebook-instance-url --notebook-instance-name <instance name>
    Example output:
    {
      "AuthorizedUrl": "https://<instance name>.notebook.<region>.sagemaker.aws?AuthToken=<authToken>"
    }
  • Go directly to AuthorizedUrl. This is the same as choosing Open Jupyter from the SageMaker console.
  • To open JupyterLab instead, add "view=Lab&" to the URL in the following form:
    "https://<instance name>.notebook.<region>.sagemaker.aws?view=Lab&AuthToken=<authToken>"

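The JupyterLab URL rewrite described above can be sketched in the shell. This is an illustration under assumptions: to_lab_url is a hypothetical helper, the instance name and token in the example are placeholders, and the commented-out command relies on the AWS CLI's global --query and --output options to print only the AuthorizedUrl value.

```shell
# Hypothetical helper: rewrite a presigned AuthorizedUrl so that it opens JupyterLab.
to_lab_url() {
  # Insert "view=Lab&" immediately after the first "?" in the URL.
  printf '%s\n' "$1" | sed 's/?/?view=Lab\&/'
}

# Example with placeholder values (not a real instance or token).
to_lab_url 'https://my-notebook.notebook.us-east-1.sagemaker.aws?AuthToken=EXAMPLE'

# With a real instance, the URL could come from the AWS CLI (assumes configured credentials):
# url="$(aws sagemaker create-presigned-notebook-instance-url \
#   --notebook-instance-name my-notebook --query AuthorizedUrl --output text)"
# to_lab_url "$url"
```

The example call prints the placeholder URL with view=Lab& inserted directly after the question mark.
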
Reached the limit of resource utilization

Check the system resources for your SageMaker notebook instance to make sure that utilization is at acceptable levels. To check the notebook instance's resources, run the following commands in the notebook terminal:

To check memory utilization:

free -h

To check CPU utilization:

top

To check disk utilization:

df -h
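
To avoid reading the df output by eye, the disk check can be sketched as a small filter that flags filesystems at or above a usage threshold. flag_full_filesystems and the 80 percent threshold are illustrative assumptions, not SageMaker defaults. The sketch uses df -P so that each filesystem is reported on a single line in the POSIX column layout.

```shell
# Hypothetical helper: read `df -P` output on stdin and print every
# filesystem whose capacity is at or above the given percentage.
# Assumes mount points without spaces.
flag_full_filesystems() {
  awk -v t="${1:-80}" '
    NR > 1 {                      # skip the header line
      use = $5                    # capacity column, for example "90%"
      sub(/%/, "", use)           # strip the percent sign
      if (use + 0 >= t) print $6 " is at " $5
    }'
}

# Flag filesystems that are at least 80% full (threshold is an assumption).
df -P | flag_full_filesystems 80
```

If the command prints nothing, no mounted filesystem has reached the threshold.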

If CPU, memory, or disk utilization is high, then try these solutions:


Related information

Troubleshoot problems with opening an Amazon SageMaker Jupyter notebook

Troubleshoot insufficient capacity error in Amazon SageMaker

AWS OFFICIAL
Updated a year ago