Posted On: Sep 21, 2021
Amazon SageMaker announces a new set of capabilities that enable interactive Spark-based data processing from SageMaker Studio Notebooks. Amazon SageMaker Studio is the first fully integrated development environment (IDE) for machine learning (ML). SageMaker Studio provides a single, web-based visual interface where you can perform all ML development steps required to prepare data, as well as build, train, and deploy models. With a single click, data scientists and developers can quickly spin up Studio Notebooks to interactively explore datasets and build ML models.
Starting today, data scientists and data engineers can visually browse, discover, and connect to Spark data processing environments running on Amazon EMR, right from their Studio notebooks with a few clicks. Once connected, they can interactively query, explore, and visualize data, and run Spark jobs using the built-in SparkMagic notebook environments for Python and Scala.
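For readers who want a programmatic picture of the discovery step, the following is a minimal sketch of how active EMR clusters could be listed from Python. It is not how SageMaker Studio implements its visual browser. The helper name `list_active_clusters` and the choice of the `RUNNING` and `WAITING` states are assumptions made for illustration; the only AWS call it relies on is the `list_clusters` operation of the EMR client (for example, one created with `boto3.client("emr")`).

```python
from typing import Any, Dict, List

# Assumption: clusters in these states are the ones worth connecting to.
ACTIVE_STATES = ["RUNNING", "WAITING"]


def list_active_clusters(emr_client: Any) -> List[Dict[str, str]]:
    """Return id, name, and state for each active EMR cluster.

    `emr_client` is expected to behave like a boto3 EMR client, i.e. to
    expose `list_clusters(ClusterStates=..., Marker=...)`.
    """
    clusters: List[Dict[str, str]] = []
    kwargs: Dict[str, Any] = {"ClusterStates": ACTIVE_STATES}
    while True:
        page = emr_client.list_clusters(**kwargs)
        for cluster in page.get("Clusters", []):
            clusters.append(
                {
                    "id": cluster["Id"],
                    "name": cluster["Name"],
                    "state": cluster["Status"]["State"],
                }
            )
        # The response carries a Marker only when more pages remain.
        marker = page.get("Marker")
        if not marker:
            return clusters
        kwargs["Marker"] = marker
```

With boto3 installed and AWS credentials configured, calling `list_active_clusters(boto3.client("emr"))` would return one dictionary per active cluster in the client's Region. Taking the client as a parameter keeps the helper easy to exercise with a stub.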
Analyzing, transforming, and preparing large amounts of data is a foundational step of any data science and ML workflow, and businesses are leveraging Apache Spark for fast data preparation. SageMaker Studio already offers purpose-built and best-in-class tooling such as Experiments, Clarify, and Model Monitor for ML. With the newly launched capability, customers can easily access purpose-built Spark environments from Studio Notebooks. SageMaker Studio can therefore now serve as a unified environment for data science and data engineering workflows, enabling customers to standardize data workflows on Studio notebooks.
These new data analytics capabilities in SageMaker Studio are generally available in all AWS Regions where SageMaker Studio is available, and there are no additional charges to use this capability. For complete information on pricing and regional availability, please refer to the SageMaker Studio pricing page. To learn more, see “Interactive Data Preparation with Studio Notebooks” in the SageMaker Studio Notebooks user guide.