Posted On: Feb 10, 2021

The FindMatches ML transform in AWS Glue now reports how much each column in the dataset contributed to determining whether records match. The FindMatches transform enables you to identify duplicate or matching records in your dataset, even when the records do not have a common unique identifier and no fields match exactly. The new column importance metrics make it easier to decide how to improve your FindMatches transforms.

Previously, you needed to use an iterative process and follow best-practice guides on feature engineering to improve your FindMatches ML transforms. With column importance metrics, AWS Glue gives you direct feedback on how heavily it weighs the contents of each column when determining that sets of records match each other. You can use this information to transform your dataset to improve match quality.
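As a rough illustration of how the column importance metrics might be read programmatically, the sketch below ranks columns by importance. It assumes the metrics appear in the `GetMLTransform` response under `EvaluationMetrics.FindMatchesMetrics.ColumnImportances`, as a list of entries with `ColumnName` and `Importance` fields; confirm these names against the AWS Glue API reference before relying on them. The helper `ranked_column_importances` and the sample response are hypothetical, and the sample values are made up.

```python
from typing import Any, Dict, List, Tuple


def ranked_column_importances(transform: Dict[str, Any]) -> List[Tuple[str, float]]:
    """Return (column name, importance) pairs, most important first.

    Assumes the response layout described above; missing keys yield an empty list.
    """
    metrics = transform.get("EvaluationMetrics", {}).get("FindMatchesMetrics", {})
    pairs = [
        (entry["ColumnName"], float(entry["Importance"]))
        for entry in metrics.get("ColumnImportances", [])
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical response fragment for illustration only; the values are invented.
sample_response = {
    "EvaluationMetrics": {
        "TransformType": "FIND_MATCHES",
        "FindMatchesMetrics": {
            "ColumnImportances": [
                {"ColumnName": "first_name", "Importance": 0.12},
                {"ColumnName": "email", "Importance": 0.55},
                {"ColumnName": "city", "Importance": 0.03},
            ]
        },
    }
}

if __name__ == "__main__":
    # With a real transform you would fetch the response instead, for example:
    #   import boto3
    #   response = boto3.client("glue").get_ml_transform(TransformId="<your-transform-id>")
    for name, score in ranked_column_importances(sample_response):
        print(f"{name}: {score:.2f}")
```

A column that ranks near the bottom may be a candidate for cleaning, normalizing, or dropping before retraining, though what counts as "low" depends on your dataset.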

The FindMatches ML transform is available in the same AWS Regions as AWS Glue.

To learn more about FindMatches, visit our documentation.