Posted On: Mar 30, 2021

When creating datasets in AWS Glue DataBrew from your Amazon S3 data lake, you can now create dynamic datasets. Dynamic datasets let you schedule data preparation on newly arriving Amazon S3 files, or apply transformations to files or folders in S3 that are filtered or selected by conditions.

You can create a dynamic S3 path that chooses files based on a time window or on the time each file was last updated. You can also define custom parameters that replace string, number, or date values in your S3 file path, with filter conditions such as begins with, ends with, contains, does not contain, less than, greater than, and before, among others. Custom parameters can be included as columns in your datasets, and jobs that run on dynamic datasets use the revised schema. With parameterized S3 paths and files, you can schedule existing recipes to run on the selected dynamic datasets.
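To make the idea of a parameterized S3 path concrete, here is a minimal local sketch in Python. It is not the DataBrew API and does not call AWS. The helper names (`select_files`, `path_to_regex`), the `{name}` placeholder syntax, and the condition keys are hypothetical, chosen only for illustration. The sketch matches object keys against a path template, keeps only files modified within a time window, applies one filter condition per parameter, and returns the parameter values next to each key, loosely mirroring the option to add custom parameters as dataset columns.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical illustration only: DataBrew evaluates parameterized paths itself.

@dataclass
class S3Object:
    key: str
    last_modified: datetime  # must be timezone-aware

# Filter conditions named after some of those listed in the announcement.
CONDITIONS: Dict[str, Callable[[str, Any], bool]] = {
    "begins_with": lambda v, arg: v.startswith(arg),
    "ends_with": lambda v, arg: v.endswith(arg),
    "contains": lambda v, arg: arg in v,
    "does_not_contain": lambda v, arg: arg not in v,
    "less_than": lambda v, arg: float(v) < arg,
    "greater_than": lambda v, arg: float(v) > arg,
    # Compares a YYYY-MM-DD path value with a naive datetime.
    "before": lambda v, arg: datetime.strptime(v, "%Y-%m-%d") < arg,
}

def path_to_regex(template: str) -> "re.Pattern[str]":
    """Turn 'sales/{region}/{date}/data.csv' into a regex with named groups."""
    pattern, pos = "", 0
    for m in re.finditer(r"\{(\w+)\}", template):
        pattern += re.escape(template[pos:m.start()])
        pattern += f"(?P<{m.group(1)}>[^/]+)"  # one path segment per parameter
        pos = m.end()
    pattern += re.escape(template[pos:])
    return re.compile(pattern)

def select_files(
    objects: List[S3Object],
    template: str,
    filters: Dict[str, Tuple[str, Any]],
    modified_within: Optional[timedelta] = None,
    now: Optional[datetime] = None,
) -> List[Dict[str, str]]:
    regex = path_to_regex(template)
    now = now or datetime.now(timezone.utc)
    selected = []
    for obj in objects:
        match = regex.fullmatch(obj.key)
        if match is None:
            continue  # key does not fit the path template
        if modified_within is not None and now - obj.last_modified > modified_within:
            continue  # outside the last-modified time window
        params = match.groupdict()
        if all(CONDITIONS[cond](params[name], arg) for name, (cond, arg) in filters.items()):
            # Return parameter values alongside the key, like extra columns.
            selected.append({"key": obj.key, **params})
    return selected

now = datetime(2021, 3, 30, tzinfo=timezone.utc)
objects = [
    S3Object("sales/us-east/2021-03-29/data.csv", now - timedelta(hours=5)),
    S3Object("sales/eu-west/2021-03-29/data.csv", now - timedelta(hours=3)),
    S3Object("sales/us-west/2021-03-01/data.csv", now - timedelta(days=29)),
    S3Object("sales/us-east/2021-03-29/notes.txt", now - timedelta(hours=1)),
]
result = select_files(
    objects,
    "sales/{region}/{date}/data.csv",
    {"region": ("begins_with", "us-")},
    modified_within=timedelta(days=1),
    now=now,
)
print(result)
# [{'key': 'sales/us-east/2021-03-29/data.csv', 'region': 'us-east', 'date': '2021-03-29'}]
```

Only the first object passes every check: the second fails the `begins_with` filter, the third falls outside the one-day window, and the fourth does not fit the path template. In DataBrew itself, these choices are made when you define the dataset rather than in your own code.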

DataBrew is a visual data preparation tool that makes it easy to clean and normalize data using more than 250 pre-built transformations, without writing any code.

To learn more, view the getting started video or use a sample dataset to explore DataBrew. To get started, visit the AWS Management Console or install the DataBrew plugin in your notebook environment, and refer to the DataBrew documentation.