Posted On: Nov 24, 2021

EMR Studio is an integrated development environment (IDE) that makes it easy for data scientists and data engineers to develop, visualize, and debug big data and analytics applications written in R, Python, Scala, and PySpark. Today, we are excited to announce two new capabilities in EMR Studio. First, you can now execute Python scripts directly from EMR Studio notebooks. Second, you can execute other dependent Jupyter notebooks directly from a notebook in EMR Studio. Previously, both capabilities required manually copying the files from EMR Studio to the EMR cluster.

An EMR Studio Workspace provides a fully-managed serverless Jupyter instance in the cloud, which comes with a local file system where you can author, store, and organize your notebooks and files. Data scientists often have Python scripts and notebooks that need to be invoked from other notebooks. For example, a Python script that performs generic data quality checks may be used across multiple notebooks. Previously, you needed to manually copy these files from the EMR Studio Workspace's local storage to the cluster in order to execute them. You can now use the %mount_workspace_dir Jupyter magic command to mount your EMR Studio Workspace directory to an EMR cluster. This allows notebooks running on EMR clusters to execute Python files or invoke other notebooks in your local Workspace without manually copying these files or logging into the cluster. In addition, we have added the %generate_s3_download_url command to download files from Amazon S3. You can use this capability to download a data file from a notebook and analyze it locally, for example in Excel. Without this capability, you had to navigate to the Amazon S3 console to download files from your S3 bucket. Both of these Jupyter magic commands are available in the EMR Notebooks iPython Magics package.
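As an illustration of the kind of shared script described above, here is a minimal sketch of a generic data quality check that several notebooks could reuse. The file name data_quality.py, the function check_required_fields, and the sample rows are all hypothetical and are not part of EMR Studio. The sketch assumes that, once the Workspace directory has been mounted on the cluster, a notebook can import or run such a file like any other Python module; refer to the EMR Studio documentation for the exact arguments that %mount_workspace_dir accepts.

```python
# data_quality.py: hypothetical shared module kept in the Workspace directory.
# Nothing here is an EMR Studio API; it only illustrates a reusable check.
from typing import Any, Dict, Iterable, List


def check_required_fields(
    rows: Iterable[Dict[str, Any]], required: List[str]
) -> Dict[str, int]:
    """Count, per required field, the rows where it is absent or None."""
    missing = {field: 0 for field in required}
    for row in rows:
        for field in required:
            if row.get(field) is None:
                missing[field] += 1
    return missing


# Example usage with made-up records.
sample_rows = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},  # "amount" is present but empty
    {"amount": 5.5},            # "id" is absent
]
report = check_required_fields(sample_rows, ["id", "amount"])
print(report)
```

Keeping checks like this in one file means each notebook calls the same logic rather than carrying its own copy, which is the scenario the Workspace mount is meant to simplify.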

EMR Studio is available in US East (Ohio), US East (N. Virginia), US West (Oregon), Canada (Central), Europe (Ireland), Europe (Frankfurt), Europe (London), Europe (Paris), Europe (Stockholm), Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), and South America (Sao Paulo) Regions.

To learn more about this feature, see our documentation here. To learn more about using this feature, see our sample notebook here.