AWS Entity Resolution Documentation

Flexible and customizable data preparation

AWS Entity Resolution reads your data from AWS Glue and uses it as data inputs for match processing. You can specify a maximum of 20 data inputs. Each row of a data input table is processed as a record, with a unique identifier serving as its primary key. AWS Entity Resolution can operate on encrypted datasets. You also need to define a schema mapping so that AWS Entity Resolution knows which input fields to use in your matching workflow. You can bring your own data schema, or blueprint, from an existing AWS Glue data input, or build a custom schema using an interactive user interface or a JSON editor. By default, data inputs are normalized before matching to improve match processing. Normalization removes special characters and extra spaces and converts text to lowercase. You can turn off normalization if your data input has already been normalized. We also provide a GitHub library that you can use to further customize the data normalization process to suit your needs.
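To make the default normalization concrete, here is a minimal Python sketch of the three steps described above: lowercasing, removing special characters, and collapsing extra spaces. It assumes ASCII text and is only an illustration, not the service's actual normalization logic or the GitHub library's API. The `normalize` function and the sample record are hypothetical.

```python
import re


def normalize(value: str) -> str:
    """Lowercase a value, drop special characters, and collapse extra spaces.

    Illustrative only: assumes ASCII text and is not the service's
    actual normalization logic.
    """
    lowered = value.lower()
    # Remove every character that is not an ASCII letter, digit, or whitespace.
    without_special = re.sub(r"[^a-z0-9\s]", "", lowered)
    # Collapse runs of whitespace into one space and trim both ends.
    return re.sub(r"\s+", " ", without_special).strip()


record = {"name": "  Jane   O'Neil ", "address": "123 Main St."}
normalized = {field: normalize(value) for field, value in record.items()}
print(normalized)  # {'name': 'jane oneil', 'address': '123 main st'}
```

Because normalization makes superficially different values equal, turning it off only makes sense when the inputs have already been cleaned to the same form.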

Configurable entity matching workflows

An entity matching workflow is a sequence of steps you set up to tell AWS Entity Resolution how to match your data inputs and where to write the consolidated data output. You can set up one or more matching workflows to compare different data inputs using various matching techniques, such as rule-based matching and ML matching. You can also view the job status and metrics of existing matching workflows, such as resource number, number of records processed, and number of matches found.

Data protection and regionalization by design

AWS Entity Resolution offers a default encryption capability that provides an encryption key for every data input into the service. Both AWS Entity Resolution and its data encryption capability support regionalization: they operate in the same AWS Region from which you are using data for matching, so your data is processed in that Region. AWS Entity Resolution also gives you the flexibility to bring data that was previously encrypted and hashed on the server side and run rule-based matching workflows on it. Finally, you can encrypt and hash the data output in Amazon Simple Storage Service (Amazon S3) before using your resolved data in other applications.
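Rule-based matching can still work on hashed data because identical inputs always produce identical digests. The sketch below illustrates that property with a keyed hash (HMAC-SHA256) over a trimmed, lowercased value. The choice of HMAC-SHA256, the `hash_field` helper, and the `SECRET_KEY` value are all assumptions for illustration; they are not the hashing scheme that AWS Entity Resolution uses.

```python
import hashlib
import hmac

# Hypothetical shared secret; a real deployment would load it from a key store.
SECRET_KEY = b"example-shared-secret"


def hash_field(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a hex HMAC-SHA256 digest of a trimmed, lowercased field value."""
    canonical = value.strip().lower()
    return hmac.new(key, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


source_a = hash_field("jane.doe@example.com")
source_b = hash_field("  Jane.Doe@Example.com ")
# Identical canonical inputs yield identical digests, so an exact-match rule still links them.
print(source_a == source_b)  # True
```

The design point is that every source must apply the same normalization and the same key before hashing; otherwise equal values produce different digests and no longer match.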

Ready-to-use rule-based matching

This matching technique includes a set of ready-to-use rules, available in the AWS Management Console or command line interface (CLI), that find matches based on your input fields. You can customize the rules (for example, by adding or removing input fields for each rule), delete rules, rearrange the priority of rules, and create new rules. You can also reset the rules to their original configurations. The data output in your S3 bucket contains match groups generated by rule-based matching. Each match group is associated with the number of the rule used to generate it, which helps you understand the fidelity of the match. For example, the rule number indicates the precision of each match group: the first rule is more precise than the second rule, and so on.
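The relationship between rule priority and match precision can be sketched as "the first rule, in priority order, that links two records wins." The Python below is a simplified illustration that assumes exact equality on every field in a rule; the rule set, field names, and `matching_rule` function are hypothetical and do not reproduce the service's built-in rules.

```python
from typing import Optional

# Hypothetical rule set, listed in priority order (most precise first).
RULES: list[list[str]] = [
    ["name", "email", "phone"],  # Rule 1
    ["name", "email"],           # Rule 2
    ["email"],                   # Rule 3
]


def matching_rule(a: dict, b: dict, rules: list[list[str]] = RULES) -> Optional[int]:
    """Return the 1-based number of the first rule whose fields all match, else None.

    A field matches only if it is present in both records and the values are equal.
    """
    for number, fields in enumerate(rules, start=1):
        if all(a.get(f) is not None and a.get(f) == b.get(f) for f in fields):
            return number
    return None


r1 = {"name": "jane doe", "email": "jane@example.com", "phone": "5551234567"}
r2 = {"name": "jane doe", "email": "jane@example.com"}
r3 = {"name": "j doe", "email": "jane@example.com"}
print(matching_rule(r1, r1))  # 1
print(matching_rule(r1, r2))  # 2
print(matching_rule(r1, r3))  # 3
```

A pair linked by rule 1 agrees on more fields than a pair linked only by rule 3, which is why a lower rule number signals a more precise match.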

Preconfigured ML matching

This matching technique includes a preconfigured ML model that finds matches across all of your data inputs, especially consumer-based records. The model uses all input fields associated with the name, email address, phone number, address, and date of birth data types. It generates match groups of related records, and each group has a confidence score that reflects the quality of the match relative to other match groups. The model accounts for missing input fields, analyzes each record as a whole to represent an entity, and cannot be customized. The data output in your S3 bucket contains match groups generated by ML-based matching, each associated with a confidence score between 0.0 and 1.0 that indicates the precision of the match.
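One common way to consume the confidence score is to keep only match groups above a threshold you choose. The sketch below assumes that each output row carries a record ID, a match ID, and its group's confidence score; the `MatchedRecord` field names and the 0.8 threshold are hypothetical and are not the actual output column names or a recommended value.

```python
from dataclasses import dataclass


@dataclass
class MatchedRecord:
    record_id: str
    match_id: str
    confidence: float  # assumed to be the group's score, between 0.0 and 1.0


def groups_above(rows: list[MatchedRecord], threshold: float = 0.8) -> dict[str, list[str]]:
    """Collect record IDs by match ID for groups whose confidence meets the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    groups: dict[str, list[str]] = {}
    for row in rows:
        if row.confidence >= threshold:
            groups.setdefault(row.match_id, []).append(row.record_id)
    return groups


rows = [
    MatchedRecord("r1", "m1", 0.93),
    MatchedRecord("r2", "m1", 0.93),
    MatchedRecord("r3", "m2", 0.41),
]
print(groups_above(rows))  # {'m1': ['r1', 'r2']}
```

Raising the threshold trades recall for precision: fewer groups survive, but those that do are the ones the model scored as higher quality.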

Manual bulk processing and automatic incremental processing

Data processing converts your data inputs into a consolidated data output table in which similar records share a common match ID, generated according to your entity matching workflow configuration. Using the API, the AWS Management Console, or the CLI, you can run manual bulk processing on demand, based on your existing extract, transform, and load (ETL) data pipeline. Bulk processing reprocesses the data to find new matches and update existing matches. For rule-based matching scenarios, you can also initiate incremental processing: as soon as new data is available in your S3 bucket, the service reads the new records and compares them against existing records.
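The idea behind incremental processing, comparing only new records against what has already been matched instead of reprocessing everything, can be sketched with an in-memory index. This toy version assumes a single exact-match field and generates hypothetical `match-N` IDs; the `IncrementalMatcher` class is illustrative and not how the service stores or assigns match IDs.

```python
import itertools
from typing import Iterable


class IncrementalMatcher:
    """Toy incremental matcher that links records on one exact-match field."""

    def __init__(self, key_field: str) -> None:
        self.key_field = key_field
        self.index: dict[str, str] = {}  # key value -> match ID
        self._next_id = itertools.count(1)

    def process(self, new_records: Iterable[dict]) -> list[tuple[str, str]]:
        """Compare only the incoming records against existing ones.

        Returns (record ID, match ID) pairs. A record whose key is already known
        reuses that match ID; otherwise it starts a new match group.
        """
        results = []
        for record in new_records:
            key = record[self.key_field]
            if key not in self.index:
                self.index[key] = f"match-{next(self._next_id)}"
            results.append((record["id"], self.index[key]))
        return results


matcher = IncrementalMatcher("email")
# Initial batch: every record starts its own group.
print(matcher.process([{"id": "r1", "email": "a@example.com"},
                       {"id": "r2", "email": "b@example.com"}]))
# New batch: r3 joins the existing group for a@example.com.
print(matcher.process([{"id": "r3", "email": "a@example.com"}]))
```

The cost of each incremental run scales with the number of new records rather than the size of the full dataset, which is the main reason to prefer it over a bulk rerun when only a few records arrive.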

Lookup

The AWS Entity Resolution Get Match ID API lets you look up and retrieve an existing entity match ID. You can call AWS Entity Resolution with personally identifiable information (PII) attributes acquired through multiple sources and channels. AWS Entity Resolution hashes those attributes and retrieves the corresponding entity match ID to link and match the customer. For example, suppose you receive a web sign-up with an associated name, email, and address. You can use the AWS Entity Resolution API to find out whether this customer or entity already exists in the matched results stored in your S3 bucket and, if so, retrieve the entity match ID associated with it. Once you have the entity match ID, you can find the transactional information associated with it in your source applications, such as your customer relationship management (CRM) or customer data platform (CDP) systems.
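The lookup flow, hashing incoming attributes and using the hash to find a previously assigned match ID, can be sketched locally as follows. This is not a client for the Get Match ID API. The `lookup_key` and `find_existing_match_id` helpers, the SHA-256 scheme, and the `MATCH_INDEX` contents are all hypothetical stand-ins for matched results that the service would hold.

```python
import hashlib
from typing import Optional


def lookup_key(*attributes: str) -> str:
    """Normalize and hash attributes into a single lookup key (illustrative scheme)."""
    canonical = "|".join(a.strip().lower() for a in attributes)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical index built from previously matched results.
MATCH_INDEX: dict[str, str] = {
    lookup_key("Jane Doe", "jane@example.com", "123 main st"): "match-001",
}


def find_existing_match_id(name: str, email: str, address: str) -> Optional[str]:
    """Return the stored match ID for these attributes, or None if no entity matches."""
    return MATCH_INDEX.get(lookup_key(name, email, address))


# A web sign-up with differently formatted PII resolves to the same entity.
print(find_existing_match_id(" JANE DOE ", "Jane@Example.com", "123 Main St"))  # match-001
```

A `None` result corresponds to a customer who does not yet exist in the matched results; a returned ID is the key you would use to find related transactions in a CRM or CDP system.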

Additional Information

For additional information about service controls, security features and functionalities, including, as applicable, information about storing, retrieving, modifying, restricting, and deleting data, please see https://docs.aws.amazon.com/index.html. This additional information does not form part of the Documentation for purposes of the AWS Customer Agreement available at http://aws.amazon.com/agreement, or other agreement between you and AWS governing your use of AWS’s services.