AWS Glue now provides the FindMatches ML Transform to deduplicate and find matching records in your dataset

Posted on: Aug 9, 2019

You can now use AWS Glue to find matching records across a dataset, including records without identifiers, by using the new FindMatches ML Transform, a custom machine learning transformation. By adding the FindMatches transformation to your Glue ETL jobs, you can find related products, places, suppliers, customers, and more.

You can also use the FindMatches transformation for deduplication, for example to identify customers who have signed up more than once or products that were accidentally added to your product catalog more than once. You teach the FindMatches ML Transform your definition of a “duplicate” through examples, and it uses machine learning to identify other potential duplicates in your dataset.
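
A transform has to exist before it can be taught with examples or added to an ETL job. The sketch below shows one way the request for the Glue `create_ml_transform` API might be assembled with boto3. The database, table, role ARN, and key column names are hypothetical placeholders, the tradeoff value of 0.5 is an arbitrary illustration, and the helper functions are not part of any AWS library. Building the request does not contact AWS; only `create_transform` does, and it assumes boto3 is installed and credentials are configured.

```python
def build_find_matches_request(database, table, role_arn, primary_key):
    """Assemble keyword arguments for boto3's Glue create_ml_transform call."""
    return {
        "Name": f"{table}-find-matches",
        # Data Catalog table that holds the records to match.
        "InputRecordTables": [{"DatabaseName": database, "TableName": table}],
        "Parameters": {
            "TransformType": "FIND_MATCHES",
            "FindMatchesParameters": {
                # Column that uniquely identifies each row in the source table.
                "PrimaryKeyColumnName": primary_key,
                # Value between 0 and 1: lower favors recall, higher favors precision.
                "PrecisionRecallTradeoff": 0.5,
            },
        },
        # IAM role that Glue assumes when it runs the transform.
        "Role": role_arn,
    }


def create_transform(request):
    """Send the request to AWS Glue and return the new transform's ID."""
    import boto3  # imported here so building a request does not require boto3

    glue = boto3.client("glue")
    return glue.create_ml_transform(**request)["TransformId"]


# Hypothetical names, for illustration only.
request = build_find_matches_request(
    database="sales_db",
    table="customers",
    role_arn="arn:aws:iam::123456789012:role/GlueFindMatchesRole",
    primary_key="customer_id",
)
print(request["Name"])
```

Calling `create_transform(request)` would then register the transform, after which it can be taught with labeled examples and referenced from a Glue ETL job.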

AWS Glue ML Transforms are initially available in the US East (Northern Virginia), US East (Ohio), US West (Oregon), EU (Ireland), and Asia Pacific (Tokyo) AWS Regions.

To learn more about this feature, see the AWS Glue documentation.