Posted On: Nov 16, 2021

The FindMatches ML transform in AWS Glue now allows you to match newly arrived data against existing matched datasets. FindMatches identifies duplicate or matching records in your dataset, even when the records do not share a common unique identifier and no fields match exactly. This makes it faster and easier to clean and deduplicate datasets.

AWS Glue FindMatches automates the process of identifying partially matching records for use cases such as linking customer records, deduplicating product catalogs, and detecting fraud. With incremental matching, you can match new data against existing matched data without combining the two datasets, which would mix previously matched records with unmatched ones.

This feature is available in the same AWS Regions as AWS Glue.

To learn more, visit our documentation and read the blog post.