Amazon EMR Notebooks

Write and debug Apache Spark applications in real time

Why EMR Notebooks?

Amazon EMR Notebooks, a managed environment based on Jupyter and JupyterLab notebooks, enables users to interactively analyze and visualize data, collaborate with peers, and build applications using EMR clusters. EMR Notebooks is designed for Apache Spark. It supports Sparkmagic kernels, which allow you to remotely run queries and code on your EMR cluster using languages such as PySpark, Spark SQL, SparkR, and Scala.

With EMR Notebooks, you have no software or instances to manage. You can either attach the notebook to an existing cluster or provision a new cluster directly from the console. You can attach multiple notebooks to a single cluster, and you can detach notebooks and re-attach them to new clusters.

EMR Notebooks allows you to:

  1. Monitor and debug Spark jobs directly from your notebook.
  2. Install notebook-scoped libraries on a running EMR cluster.
  3. Associate Git repositories with your notebook for version control and for simplified code collaboration and reuse.
  4. Compare and merge two notebooks using the nbdime utility.

There is no additional cost for using EMR Notebooks. You pay only for the EMR cluster attached to the notebook. It’s easy to create multiple notebooks directly from the EMR console. Follow this step-by-step tutorial to get started.