Orchestrate and parameterize EMR Notebook Executions without graphical user interface access

Posted on: Aug 31, 2020

EMR Notebooks provides fully managed, Jupyter-based notebooks for data scientists and engineers who write and experiment with ad hoc jobs. You can now orchestrate EMR Notebooks non-interactively to run ETL workloads, including in production. Before this feature, executing a notebook required access to the Jupyter user interface through the AWS Management Console.

The EMR Notebooks APIs enable AWS CLI and SDK access to notebooks, so you can run notebook-based ETL workloads in an automated fashion. You can use orchestration services such as AWS Step Functions and Apache Airflow to build resilient workflows, and you can run notebooks on a schedule, non-interactively, using cron scripts. You can also pass input parameters to a notebook and debug its executions by accessing the historical output of each one. Before this feature, you had to create and modify a new copy of the notebook for every new combination of input values.
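
As a rough illustration of SDK access with input parameters, the Python sketch below assembles a request for the EMR StartNotebookExecution API and submits it through a client object. It assumes the AWS SDK for Python (boto3) and its `start_notebook_execution` call. The helper names `build_execution_request` and `start_execution` are hypothetical, as are the notebook ID, cluster ID, role name, and parameter names in the usage comments.

```python
import json


def build_execution_request(editor_id, relative_path, cluster_id, service_role, params):
    """Assemble keyword arguments for a StartNotebookExecution call (illustrative helper)."""
    return {
        "EditorId": editor_id,            # ID of the EMR notebook to run
        "RelativePath": relative_path,    # notebook file path relative to the editor
        "ExecutionEngine": {"Id": cluster_id, "Type": "EMR"},  # cluster that runs it
        "ServiceRole": service_role,      # IAM role used for the execution
        "NotebookParams": json.dumps(params),  # input parameters, sent as a JSON string
    }


def start_execution(emr_client, request):
    """Submit the request and return the ID of the new notebook execution."""
    response = emr_client.start_notebook_execution(**request)
    return response["NotebookExecutionId"]


# Usage sketch (requires boto3 and AWS credentials; all values are placeholders):
#   import boto3
#   emr = boto3.client("emr")
#   request = build_execution_request(
#       "e-EXAMPLE", "etl/daily.ipynb", "j-EXAMPLE",
#       "EMR_Notebooks_DefaultRole", {"run_date": "2020-08-31"},
#   )
#   execution_id = start_execution(emr, request)
#   status = emr.describe_notebook_execution(
#       NotebookExecutionId=execution_id
#   )["NotebookExecution"]["Status"]
```

Passing the client in as an argument keeps the helpers easy to call from a cron script or an orchestrator task, and a different set of parameter values can be supplied on each run without copying the notebook.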

To get started with EMR Notebooks, visit the EMR Notebooks page.

This feature is available on EMR release version 5.18.0 or later, in the regions where EMR Notebooks is available.