Posted On: May 1, 2023

Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare data for machine learning (ML) from weeks to minutes. With SageMaker Data Wrangler, you can simplify data preparation and feature engineering, and complete each step of the data preparation workflow, including data selection, exploration, cleansing, and processing, from a single visual interface.

Starting today, you can use new capabilities of Amazon SageMaker Data Wrangler to prepare image data for labeling, training, or inference. You can preview and import images from Amazon S3 and use a variety of built-in image transforms to clean, standardize, and improve the quality of your image data. These built-in transforms include resize, drop duplicates, rotation, flip, greyscale, enhance contrast, blur, and add noise. Data Wrangler also supports advanced use cases, such as detecting outliers or extracting text from images, using custom code and built-in code snippets. These code snippets include examples of how to use a pre-trained model from Amazon SageMaker JumpStart to perform advanced analysis or transformations by calling a pre-deployed model endpoint. After you create a recipe on the sampled image data in interactive mode, you can create a PySpark job from the visual interface to scale the processing to all the images in your dataset.
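
As a rough illustration of what a few of the transforms named above do to pixel data, here is a minimal pure-Python sketch. It is not Data Wrangler's implementation or API: the function names, the list-of-rows image representation, and the choice of ITU-R BT.601 luma weights for greyscale are all assumptions made for this example.

```python
import hashlib

# Assumption: an image is a list of rows, and each pixel is an (R, G, B)
# tuple of integers in the range 0-255. Real pipelines would typically use
# an imaging library instead of nested lists.

def flip_horizontal(image):
    """Mirror the image left to right."""
    return [list(reversed(row)) for row in image]

def rotate_90_clockwise(image):
    """Rotate the image a quarter turn clockwise."""
    return [list(column) for column in zip(*image[::-1])]

def to_greyscale(image):
    """Collapse each RGB pixel to a single luma value (BT.601 weights)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in image
    ]

def drop_duplicates(images):
    """Keep the first occurrence of each pixel-identical image."""
    seen = set()
    unique = []
    for image in images:
        # Hash a canonical tuple form so lists and tuples of rows compare equal.
        canonical = tuple(tuple(row) for row in image)
        digest = hashlib.sha256(repr(canonical).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(image)
    return unique

if __name__ == "__main__":
    sample = [
        [(255, 0, 0), (0, 255, 0)],
        [(0, 0, 255), (255, 255, 255)],
    ]
    flipped = flip_horizontal(sample)
    print(to_greyscale(sample))
    print(len(drop_duplicates([sample, sample, flipped])))
```

Exact-match hashing like this only catches byte-for-byte identical images; near-duplicates such as resized or recompressed copies would need a perceptual comparison instead.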

Data Wrangler supports image data preparation in all AWS Regions where Data Wrangler is currently available. To learn more, see this blog post and the AWS technical documentation.