Amazon EMR Notebooks is a managed environment based on Jupyter and JupyterLab notebooks that lets you interactively analyze and visualize data, collaborate with peers, and build applications using EMR clusters. EMR Notebooks is designed for Apache Spark. It supports Sparkmagic kernels, which allow you to remotely run queries and code on your EMR cluster using languages such as PySpark, Spark SQL, SparkR, and Scala.
With EMR Notebooks, there is no software to install and there are no instances to manage. You can either attach the notebook to an existing cluster or provision a new cluster directly from the console. You can attach multiple notebooks to a single cluster, and you can detach notebooks and re-attach them to new clusters.
EMR Notebooks allows you to:
- Monitor and debug Spark jobs directly from your notebook
- Install notebook-scoped libraries on a running EMR cluster
- Associate Git repositories with your notebook for version control and for simpler code collaboration and reuse
- Compare and merge two notebooks using the nbdime utility
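As an illustration of the notebook-scoped library item above, here is a minimal sketch for a PySpark notebook cell. It assumes the notebook is attached to a cluster whose EMR release supports notebook-scoped libraries, where the kernel's SparkContext `sc` provides `install_pypi_package`. The helper name `install_notebook_scoped` and the package versions are hypothetical, chosen only for the example.

```python
def install_notebook_scoped(sc, requirements):
    """Install pip-style requirement strings through the notebook's SparkContext.

    `sc` is assumed to be the SparkContext that the EMR Notebooks PySpark
    kernel provides. Returns the requirement strings that were submitted.
    """
    installed = []
    for spec in requirements:
        spec = spec.strip()
        if not spec:
            continue  # ignore blank entries
        # Assumption: install_pypi_package is available on `sc` in this kernel.
        sc.install_pypi_package(spec)
        installed.append(spec)
    return installed
```

In a notebook cell this could be called as `install_notebook_scoped(sc, ["pandas==0.25.1", "matplotlib"])`. Libraries installed this way are scoped to the notebook session, so they would need to be installed again in a new session.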
There is no additional cost for using EMR Notebooks. You only pay for the EMR cluster attached to the notebook. It’s easy to create multiple notebooks directly from the EMR console. Follow this step-by-step tutorial to get started.
EMR Notebooks: A managed analytics environment based on Jupyter notebooks
Associate Git repositories with EMR Notebooks
Install Python libraries on a running cluster with EMR Notebooks