Amazon EMR Studio makes it easier for data scientists to build and deploy code

Posted on: Dec 9, 2020

Today we are announcing the public preview of EMR Studio, an integrated development environment (IDE) that makes it easy for data scientists and data engineers to develop, visualize, and debug data engineering and data science applications written in R, Python, Scala, and PySpark. EMR Studio provides fully managed Jupyter Notebooks, as well as tools like the Spark UI and YARN Timeline Service that simplify debugging.

EMR Studio uses AWS Single Sign-On (AWS SSO), so you can log in directly with your corporate credentials without logging in to the AWS console. Data scientists and analysts can install custom kernels and libraries, collaborate with peers using code repositories such as GitHub and Bitbucket, or run parameterized notebooks as part of scheduled workflows using orchestration services like Apache Airflow or Amazon Managed Workflows for Apache Airflow.

EMR Studio kernels and applications run on EMR clusters, so you get the benefit of distributed data processing using the performance-optimized Amazon EMR runtime for Apache Spark. Administrators can set up EMR Studio so that analysts can run their applications on existing EMR clusters or create new clusters using pre-defined AWS CloudFormation templates for EMR. In EMR Studio, you can browse all EMR clusters in a central place and narrow the list by filtering on cluster ID, cluster state, and other parameters. With a single click, you can open the Spark History Server, YARN Timeline Server, or Tez UI to view jobs in their execution context on both active and terminated clusters.
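
To illustrate the kind of filtering described above, here is a minimal Python sketch that narrows a list of cluster summaries by cluster ID and cluster state. It is not EMR Studio's implementation. The sample records are hypothetical and only mimic the shape of the "Clusters" list returned by the EMR ListClusters API (an `Id`, a `Name`, and a `Status` with a `State`); the function name `filter_clusters` is also made up for this example.

```python
from typing import Iterable, List, Optional

# Hypothetical sample data, shaped like the "Clusters" list in an
# EMR ListClusters response. Only the fields used below are included.
SAMPLE_CLUSTERS = [
    {"Id": "j-EXAMPLE1", "Name": "nightly-etl", "Status": {"State": "TERMINATED"}},
    {"Id": "j-EXAMPLE2", "Name": "adhoc-analysis", "Status": {"State": "WAITING"}},
    {"Id": "j-EXAMPLE3", "Name": "ml-training", "Status": {"State": "RUNNING"}},
]


def filter_clusters(
    clusters: Iterable[dict],
    cluster_id: Optional[str] = None,
    states: Optional[Iterable[str]] = None,
) -> List[dict]:
    """Return the clusters that match every filter that was supplied.

    A filter left as None is ignored, so calling this with no filters
    returns all clusters in their original order.
    """
    wanted_states = set(states) if states is not None else None
    matches = []
    for cluster in clusters:
        # Skip clusters whose ID does not match the requested one.
        if cluster_id is not None and cluster["Id"] != cluster_id:
            continue
        # Skip clusters whose state is not in the requested set.
        if wanted_states is not None and cluster["Status"]["State"] not in wanted_states:
            continue
        matches.append(cluster)
    return matches


# Example: keep only clusters that can still accept work.
active = filter_clusters(SAMPLE_CLUSTERS, states={"RUNNING", "WAITING"})
print([cluster["Id"] for cluster in active])  # ['j-EXAMPLE2', 'j-EXAMPLE3']
```

In practice the list would come from the EMR API rather than a hard-coded sample, and the same filters could be combined, for example a cluster ID together with a set of states.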

EMR Studio is available on EMR release version 6.2 and later, in the US East (N. Virginia), US West (Oregon), and EU (Ireland) Regions for public preview.

To get started with the EMR Studio public preview, see the EMR Studio product detail page.