Posted On: Nov 17, 2021

The FindMatches ML transform in AWS Glue now includes an option to output match scores, which indicate how closely the records in each group match one another. The FindMatches transform allows you to identify duplicate or matching records in your dataset, even when the records do not have a common unique identifier and no fields match exactly. FindMatches helps automate complex data cleaning and deduplication tasks.

AWS Glue FindMatches automates the process of identifying partially matching records for use cases including linking customer records, deduplicating product catalogs, and fraud detection. Use match scoring to understand your FindMatches models, decide whether they are trained to your satisfaction, and decide which records to merge.
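
As a rough illustration of using scores to decide which records to merge, the sketch below groups already-scored rows and separates high-confidence groups from those that may need manual review. It is plain Python rather than a Glue job, and everything in it is an assumption for illustration: the sample rows, the column names `match_id` and `match_confidence_score`, the helper `split_groups`, and the 0.9 threshold. Check the FindMatches documentation for the actual output schema.

```python
from collections import defaultdict

# Hypothetical rows shaped like scored FindMatches output.
# The column names "match_id" and "match_confidence_score" are assumptions.
rows = [
    {"id": 1, "name": "Jon Smith", "match_id": 10, "match_confidence_score": 0.95},
    {"id": 2, "name": "John Smith", "match_id": 10, "match_confidence_score": 0.95},
    {"id": 3, "name": "J. Smyth", "match_id": 11, "match_confidence_score": 0.55},
    {"id": 4, "name": "Jane Smythe", "match_id": 11, "match_confidence_score": 0.55},
    {"id": 5, "name": "Ann Lee", "match_id": 12, "match_confidence_score": 1.0},
]


def split_groups(rows, threshold=0.9):
    """Split multi-record match groups into auto-merge and review buckets.

    The threshold is an illustrative value, not a recommended setting.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row["match_id"]].append(row)

    auto_merge, needs_review = {}, {}
    for match_id, members in groups.items():
        if len(members) < 2:
            continue  # a record with no matches has nothing to merge
        # Use the lowest score in the group as a conservative group score.
        score = min(m["match_confidence_score"] for m in members)
        target = auto_merge if score >= threshold else needs_review
        target[match_id] = [m["id"] for m in members]
    return auto_merge, needs_review


auto_merge, needs_review = split_groups(rows)
print("merge automatically:", auto_merge)
print("review manually:", needs_review)
```

Raising the threshold sends more groups to manual review; lowering it merges more aggressively. A reasonable value depends on how costly a wrong merge is for your dataset.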

This feature is available in the same AWS Regions as AWS Glue.

To learn more, visit our documentation and read the FindMatches blog post.