Posted On: Oct 21, 2022
Today, we are excited to announce support for scheduling Data Wrangler processing jobs in Amazon SageMaker Data Wrangler. Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare data for machine learning (ML) from weeks to minutes. With SageMaker Data Wrangler, you can simplify data preparation and feature engineering, and complete each step of the data preparation workflow, including data selection, cleansing, exploration, and visualization, from a single visual interface. Previously, scheduling a data processing job required integrating a serverless compute capability with an event bus service, and writing code to schedule the job in a production environment. Integrating these capabilities and writing the code to orchestrate the workflow can be a laborious, time-consuming task for data scientists, data engineers, and ML engineers.
With support for scheduling in Data Wrangler, you can now schedule a Data Wrangler processing job in a few clicks. Jobs can be scheduled to run at specific times and on specific days of the week. Schedules can also be entered as CRON expressions for additional customization and flexibility (for instance, to schedule a job that runs on the first Wednesday of each calendar month). You can attach up to two schedules to a Data Wrangler processing job. Once you have entered a schedule, Data Wrangler displays a preview of the next five job runs so you can confirm it. You can access this scheduling capability as part of the “Create Job” workflow in Data Wrangler.
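To make the "first Wednesday of a calendar month" example concrete, the sketch below computes the next five run times for such a schedule using only the Python standard library, similar in spirit to the preview Data Wrangler shows. It is an illustration, not Data Wrangler's implementation: the function names and the 12:00 run time are hypothetical, and the CRON string in the comment assumes Amazon EventBridge-style six-field syntax, so check the documentation for the exact format Data Wrangler accepts.

```python
from datetime import date, datetime, time, timedelta

# Assumption: in EventBridge-style syntax, a first-Wednesday schedule at 12:00
# would look like  cron(0 12 ? * 4#1 *)  (day-of-week 4 = Wednesday, #1 = first).
WEDNESDAY = 2  # date.weekday() counts Monday as 0


def first_wednesday(year: int, month: int) -> date:
    """Return the date of the first Wednesday in the given month."""
    first = date(year, month, 1)
    offset = (WEDNESDAY - first.weekday()) % 7
    return first + timedelta(days=offset)


def next_runs(after: datetime, count: int = 5, at: time = time(12, 0)) -> list:
    """Return the next `count` first-Wednesday run times strictly after `after`."""
    runs = []
    year, month = after.year, after.month
    while len(runs) < count:
        candidate = datetime.combine(first_wednesday(year, month), at)
        if candidate > after:
            runs.append(candidate)
        # Advance to the following month, rolling over the year if needed.
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return runs


if __name__ == "__main__":
    for run in next_runs(datetime(2022, 10, 21)):
        print(run.isoformat())
```

Comparing a hand-computed list like this against the in-product preview is a quick way to check that a CRON expression means what you intended before you create the job.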
This feature is generally available, at no additional charge, in all AWS Regions that Data Wrangler currently supports. To get started scheduling your data processing jobs with SageMaker Data Wrangler, read the AWS documentation.