Write and debug Apache Spark applications in real time

Amazon EMR Notebooks, a managed environment based on Jupyter and JupyterLab notebooks, enables users to interactively analyze and visualize data, collaborate with peers, and build applications using EMR clusters. EMR Notebooks is designed for Apache Spark. It supports Sparkmagic kernels, which allow you to run queries and code remotely on your EMR cluster in languages such as PySpark, Spark SQL, SparkR, and Scala.

With EMR Notebooks, you have no software or instances to manage. You can either attach a notebook to an existing cluster or provision a new cluster directly from the console. You can attach multiple notebooks to a single cluster, and you can detach notebooks and reattach them to new clusters.

EMR Notebooks allows you to:

  1. Monitor and debug Spark jobs directly from your notebook.
  2. Install notebook-scoped libraries on a running EMR cluster.
  3. Associate Git repositories with your notebook for version control and simplified code collaboration and reuse.
  4. Compare and merge two notebooks using the nbdime utility.
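nbdime is a separate tool with its own command-line interface and merge logic, and the sketch below does not reproduce it. As a minimal illustration of why a notebook-aware diff is useful, assuming only that an .ipynb file is JSON with a top-level "cells" list whose entries carry "cell_type" and "source", this stdlib-only Python compares just the code cells of two notebooks and ignores outputs and metadata. The function names are hypothetical.

```python
import difflib
import json
from typing import List


def load_notebook(path: str) -> dict:
    """Read an .ipynb file; notebooks are stored as plain JSON."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def code_cells(notebook: dict) -> List[str]:
    """Return the source text of each code cell, skipping markdown and raw cells."""
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # In the .ipynb format, "source" may be a single string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        sources.append(source)
    return sources


def flatten(notebook: dict) -> List[str]:
    """Turn the code cells into newline-terminated lines, with a marker per cell."""
    lines = []
    for i, source in enumerate(code_cells(notebook)):
        lines.append(f"# --- code cell {i} ---\n")
        lines.extend(line + "\n" for line in source.splitlines())
    return lines


def diff_code_cells(old: dict, new: dict) -> List[str]:
    """Unified diff of code-cell sources only; outputs and metadata are ignored."""
    return list(
        difflib.unified_diff(
            flatten(old), flatten(new), fromfile="old.ipynb", tofile="new.ipynb"
        )
    )


if __name__ == "__main__":
    # Two tiny in-memory notebooks whose outputs differ but whose code changes by one line.
    old_nb = {"cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "source": ["df = spark.read.csv(path)\n", "df.count()"],
         "outputs": [{"output_type": "execute_result"}]},
    ]}
    new_nb = {"cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "source": ["df = spark.read.parquet(path)\n", "df.count()"],
         "outputs": []},
    ]}
    print("".join(diff_code_cells(old_nb, new_nb)))
```

Running the file prints a diff that shows only the changed read call; the differing outputs and the unchanged markdown cell do not appear. A plain text diff of the raw JSON would also report the output change, which is the kind of noise a notebook-aware tool such as nbdime is meant to reduce.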

There is no additional cost for using EMR Notebooks; you pay only for the EMR cluster attached to the notebook. You can create multiple notebooks directly from the EMR console. Follow this step-by-step tutorial to get started.


Resources

Blog

EMR Notebooks: A managed analytics environment based on Jupyter notebooks

Tutorial

Associate Git repositories with EMR Notebooks

Blog

Install Python libraries on a running cluster with EMR Notebooks
