Support for JupyterHub on Amazon EMR release 5.14.0

Posted on: Jun 14, 2018

You can now use JupyterHub on Amazon EMR with EMR release 5.14.0. JupyterHub is a multi-user Jupyter notebook server that provides each user with their own Jupyter notebook interface. Multiple users can work in their notebooks concurrently to write and execute code and perform exploratory data analysis. JupyterHub on EMR is integrated with the Spark framework, allowing you to run interactive Spark queries on EMR clusters using Scala, PySpark, SparkR, and Spark SQL kernels. You can also run Python jobs locally and take advantage of the many popular data-science libraries that are pre-installed in your notebook.

In addition, with EMR release 5.14.0, EMRFS, Amazon EMR’s connector for Amazon S3, supports auditing of the users who ran queries that accessed data in S3 through EMRFS. This feature is turned on by default and passes user and group information to audit logs such as AWS CloudTrail, giving you comprehensive request tracking. Besides auditing, EMRFS provides features such as consistent view, S3 server-side and client-side encryption, and fine-grained authorization for S3.

You can launch JupyterHub by selecting “JupyterHub” from the list of applications to install when you configure and launch your cluster. Please visit the Amazon EMR documentation for more information about EMR release 5.14.0, JupyterHub, and EMRFS.
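
The same application selection can also be made programmatically. The following is a minimal sketch using the run_job_flow call of the AWS SDK for Python (boto3). The cluster name, instance type and count, the default IAM role names EMR_EC2_DefaultRole and EMR_DefaultRole (assumed to already exist in your account), and the choice to list Spark alongside JupyterHub are all illustrative assumptions, not required settings.

```python
"""Sketch: request an EMR 5.14.0 cluster with JupyterHub installed (assumptions labeled)."""
from typing import Any, Dict, Optional


def build_cluster_request(
    name: str,
    instance_type: str = "m4.xlarge",  # assumption: any supported instance type works here
    instance_count: int = 3,           # assumption: 1 master + 2 core nodes
    key_name: Optional[str] = None,    # optional EC2 key pair for SSH access
) -> Dict[str, Any]:
    """Build keyword arguments for boto3's EMR run_job_flow call."""
    request: Dict[str, Any] = {
        "Name": name,
        "ReleaseLabel": "emr-5.14.0",
        # JupyterHub is the application this announcement covers; Spark is listed
        # explicitly here as an assumption so the Spark kernels have Spark available.
        "Applications": [{"Name": "JupyterHub"}, {"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            "InstanceCount": instance_count,
            # Keep the cluster running so users can keep working in their notebooks.
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Assumption: default EMR roles were created beforehand.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
    if key_name:
        request["Instances"]["Ec2KeyName"] = key_name
    return request


def launch_cluster(request: Dict[str, Any], region_name: str = "us-east-1") -> str:
    """Submit the request and return the new cluster ID (creates billable resources)."""
    import boto3  # imported lazily so building the request does not require boto3

    emr = boto3.client("emr", region_name=region_name)
    response = emr.run_job_flow(**request)
    return response["JobFlowId"]


if __name__ == "__main__":
    req = build_cluster_request("jupyterhub-demo")
    print(req["ReleaseLabel"], [app["Name"] for app in req["Applications"]])
    # launch_cluster(req)  # uncomment to actually create the cluster
```

Keeping request construction separate from submission lets you inspect or adjust the configuration before any AWS resources are created.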

Amazon EMR release 5.14.0 is available in all supported regions for Amazon EMR.