Posted On: Apr 20, 2021
Today we are excited to announce the general availability of EMR Studio, an integrated development environment (IDE) that makes it easy for data scientists and data engineers to develop, visualize, and debug big data and analytics applications written in R, Python, Scala, and PySpark. EMR Studio provides fully managed Jupyter Notebooks, and tools like Spark UI and YARN Timeline Service to simplify debugging. EMR Studio uses AWS Single Sign-On and allows you to log in directly with your corporate credentials without logging into the AWS console.
With EMR Studio, as an administrator, you can either create and configure EMR Studios from the EMR console, or automate the Studio creation by specifying the configurations and dependencies in a CloudFormation template. You can use the AWS SSO console to enable AWS SSO, choose from supported identity providers including Okta, Azure AD, OneLogin, Ping Identity, and Microsoft AD, and use the EMR console to assign users and groups to EMR Studio.
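For administrators who prefer scripting to the console, the following is a minimal sketch of Studio creation using the AWS SDK for Python (boto3) instead of the CloudFormation route mentioned above. It assumes boto3's EMR client operations `create_studio` and `create_studio_session_mapping`. The helper names and the resource values shown in the usage note (VPC, subnets, security groups, role ARNs, S3 location, group name) are hypothetical placeholders, not values from this announcement.

```python
from typing import Dict, List


def build_studio_request(
    name: str,
    vpc_id: str,
    subnet_ids: List[str],
    service_role_arn: str,
    user_role_arn: str,
    workspace_sg_id: str,
    engine_sg_id: str,
    default_s3_location: str,
) -> Dict:
    """Assemble keyword arguments for the EMR CreateStudio call (hypothetical helper)."""
    if not subnet_ids:
        raise ValueError("at least one subnet ID is required")
    if not default_s3_location.startswith("s3://"):
        raise ValueError("default_s3_location must be an s3:// URI")
    return {
        "Name": name,
        "AuthMode": "SSO",  # Studio sign-in through AWS Single Sign-On
        "VpcId": vpc_id,
        "SubnetIds": list(subnet_ids),
        "ServiceRole": service_role_arn,
        "UserRole": user_role_arn,
        "WorkspaceSecurityGroupId": workspace_sg_id,
        "EngineSecurityGroupId": engine_sg_id,
        "DefaultS3Location": default_s3_location,
    }


def create_studio_and_assign_group(request: Dict, group_name: str,
                                   session_policy_arn: str, region_name: str) -> str:
    """Create the Studio, then map one AWS SSO group to it. Requires AWS credentials."""
    import boto3  # imported lazily so the request builder works without boto3 installed

    emr = boto3.client("emr", region_name=region_name)
    studio_id = emr.create_studio(**request)["StudioId"]
    emr.create_studio_session_mapping(
        StudioId=studio_id,
        IdentityName=group_name,
        IdentityType="GROUP",
        SessionPolicyArn=session_policy_arn,
    )
    return studio_id
```

A call might look like `create_studio_and_assign_group(build_studio_request(...), "data-scientists", "arn:aws:iam::111122223333:policy/ExampleSessionPolicy", "us-east-1")`, where every argument is an illustrative placeholder. Keeping the request builder separate from the AWS call lets you validate the configuration before anything is created.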
EMR Studio provides notebook examples, such as PySpark code that queries the Hive metastore and Python code for visualization, to help you quickly start developing your data science applications. You can connect notebooks to GitHub, Bitbucket, GitLab, and AWS CodeCommit repositories regardless of public access points. You can run your applications on existing EMR clusters, or create new clusters from a pre-defined CloudFormation template, passing custom parameters in EMR Studio. You can launch the live Spark UI directly from notebooks to access logs and debug the application.
EMR Studio is generally available on EMR release versions 5.32 and 6.2 and later, in the US East (Ohio, N. Virginia), US West (Oregon), Europe (Ireland, Frankfurt, and London), and Asia Pacific (Mumbai, Seoul, Singapore, Sydney, and Tokyo) Regions.
To get started with EMR Studio, see our Amazon EMR Studio documentation.