Prepare data for machine learning faster and easier on Amazon SageMaker Data Wrangler with support for more data sources and distributed jobs

Posted on: May 7, 2021

Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare data for machine learning (ML) from weeks to minutes. With SageMaker Data Wrangler, you can simplify data preparation and feature engineering and complete each step of the data preparation workflow, including data selection, cleansing, exploration, and visualization, from a single visual interface. Starting today, you can use new capabilities of Amazon SageMaker Data Wrangler that make it easier and faster to prepare data for ML: cross-account access for Amazon S3, support for up to 1,000 columns of data, distributed jobs, and a new SageMaker Data Wrangler notebook experience.

With the launch of Amazon S3 cross-account access, you can import data from any S3 bucket you have access to and browse the data inside it, regardless of which account owns the bucket. Once you’ve navigated to the S3 bucket, you can interactively browse its contents and import them into Amazon SageMaker Data Wrangler with a single click. Additionally, many ML applications require preparing data sets with hundreds of columns. With support for data sets of up to 1,000 columns, you can now prepare these wide data sets in SageMaker Data Wrangler. With distributed jobs, you can scale out your data processing workloads across multiple instances to process data of almost any size. Today, you can specify an instance count greater than 1 for the ml.m5.4xlarge, ml.m5.12xlarge, and ml.m5.24xlarge instance types. Finally, SageMaker Data Wrangler’s new notebook experience makes Jobs notebooks easier to use. The notebooks have been reorganized for easier configuration and include documentation so you can get started faster.
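As a rough illustration of the distributed jobs constraint described above, the following Python sketch builds the resource settings for a job and rejects an instance count greater than 1 on any other instance type. It assumes the job is submitted as a SageMaker Processing job, so the returned dictionary mirrors the ClusterConfig shape of the CreateProcessingJob API. The helper name, the default volume size, and the rule that other instance types are limited to a single instance are assumptions made for this example, not details from the announcement.

```python
# Hypothetical helper for illustration only; it is not part of any AWS SDK.

# Instance types the announcement lists as supporting an instance count > 1.
MULTI_INSTANCE_TYPES = {"ml.m5.4xlarge", "ml.m5.12xlarge", "ml.m5.24xlarge"}


def build_cluster_config(instance_type, instance_count=1, volume_size_gb=30):
    """Return resource settings shaped like ProcessingResources.

    volume_size_gb=30 is an arbitrary placeholder, not a recommended value.
    """
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    # Assumption: types outside the listed set run on a single instance only.
    if instance_count > 1 and instance_type not in MULTI_INSTANCE_TYPES:
        raise ValueError(
            f"{instance_type} is not listed as supporting distributed jobs; "
            f"choose one of {sorted(MULTI_INSTANCE_TYPES)}"
        )
    return {
        "ClusterConfig": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
            "VolumeSizeInGB": volume_size_gb,
        }
    }


if __name__ == "__main__":
    # Scale a job out to three ml.m5.4xlarge instances.
    print(build_cluster_config("ml.m5.4xlarge", instance_count=3))
```

Validating the combination before submitting the job surfaces a configuration mistake locally instead of after the job has been queued.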

To get started with the new capabilities of Amazon SageMaker Data Wrangler, open Amazon SageMaker Studio and choose File > New > Flow from the menu or “new data flow” in the SageMaker Studio launcher. To learn more, visit the feature page or view the documentation.
