Posted On: Dec 8, 2020
Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare data for machine learning (ML) from weeks to minutes. With Amazon SageMaker Data Wrangler, you can simplify data preparation and feature engineering, and complete each step of the data preparation workflow, including data selection, cleansing, exploration, and visualization, from a single visual interface.
For most ML models, you can spend weeks or months aggregating and preparing data from different sources: converting, transforming, and validating raw data into features that can be used to train models and make predictions. You need to write code to transform data into formats that a model can use efficiently, and then write additional code that runs at scale across a large number of data sources. That time is far better spent on higher-value tasks.
Using Amazon SageMaker Data Wrangler’s data selection tool, you can choose the data you want from various data sources, including Amazon S3, Amazon Athena, Amazon Redshift, AWS Lake Formation, and Amazon SageMaker Feature Store, and import it with a single click. Amazon SageMaker Data Wrangler contains over 300 built-in data transformations so you can quickly normalize, transform, and combine features without having to write any code. With Amazon SageMaker Data Wrangler’s visualization templates, you can quickly preview these transformations and verify that they were completed as you intended by viewing them in Amazon SageMaker Studio, the first fully integrated development environment (IDE) for ML. Once your data is prepared, you can build fully automated ML workflows with Amazon SageMaker Pipelines and save the prepared features for reuse in the Amazon SageMaker Feature Store.
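For context, the kind of hand-written transformation code that built-in transformations are meant to replace might look like the sketch below. It is a minimal illustration in plain Python, not Amazon SageMaker Data Wrangler's implementation or API; the function names, column names, and sample values are all hypothetical.

```python
from typing import Dict, List


def min_max_normalize(values: List[float]) -> List[float]:
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot_encode(values: List[str], prefix: str) -> List[Dict[str, int]]:
    """Turn a categorical column into one 0/1 indicator per category."""
    categories = sorted(set(values))
    return [{f"{prefix}_{c}": int(v == c) for c in categories} for v in values]


# Hypothetical raw columns, used for illustration only.
ages = [20.0, 30.0, 40.0]
plans = ["basic", "pro", "basic"]

normalized_ages = min_max_normalize(ages)
encoded_plans = one_hot_encode(plans, prefix="plan")

# Combine the transformed columns into one feature record per row.
features = [
    {"age_scaled": age, **plan}
    for age, plan in zip(normalized_ages, encoded_plans)
]
```

Even this small example needs edge-case handling, such as the constant-column check, and it would still have to be adapted to run at scale across many data sources.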