Posted On: Mar 30, 2022
AWS Glue version 2.0 now supports the AWS Glue FindMatches machine learning transform. AWS Glue FindMatches automates the process of identifying partially matching records for use cases including linking customer records, deduplicating product catalogs, and fraud detection. Using Glue 2.0, ETL jobs that perform fuzzy matching using FindMatches start in under a minute and have 1-minute minimum billing.
Use the FindMatches transform to identify and then merge or deduplicate related records in your datasets. For example, it can recognize that records are matches despite spelling and formatting differences like “John Doe” vs. “Jhn Doe”, “JOHN_DOE@ANYCOMPANY.COM” vs. “email@example.com”, or “555-010-0000” vs. “+1-555-010-0000”.
This feature is available in the same AWS Regions as AWS Glue.