Amazon SageMaker

Amazon SageMaker is a fully managed platform that enables developers and data scientists to build, train, and deploy machine learning models quickly and easily at any scale. It removes the barriers and complexity that typically slow down developers who want to use machine learning. The service includes modules that can be used together or independently to build, train, and deploy your machine learning models.

Identity Resolution with AI and ML

By:

harpin AI

Latest Version:

1.1.6

This product performs identity resolution on customer data from various sources to create accurate and complete customer profiles.

Product Overview

This product lets users perform identity resolution on their own customer data from various data sources (e.g., bookings, transactions, and loyalty programs). The algorithm links records across those data sources to create an accurate and complete view of each customer profile, without moving any customer data outside the user's AWS account.

Highlights

This product lets users perform identity resolution on their customer data inside their own AWS account.
It provides field-level mapping, normalization, standardization, and repair out of the box, and it uses recent advances in AI.

Pricing Information

Use this tool to estimate the software and infrastructure costs based on your configuration choices. Your actual usage and costs might differ from this estimate; they will be reflected on your monthly AWS billing reports.

Estimating your costs

Choose your region and launch option to see the pricing details. Then, modify the estimated price by choosing different instance types.

Software Pricing

Algorithm Training: $0.00/hr, running on ml.m5.2xlarge

Model Realtime Inference: $0.00/hr, running on ml.m5.2xlarge

Model Batch Transform: $0.00/hr, running on ml.m5.2xlarge

Infrastructure Pricing
With Amazon SageMaker, you pay only for what you use. Training and inference are billed by the second, with no minimum fees and no upfront commitments. Pricing within Amazon SageMaker is broken down by on-demand ML instances, ML storage, and fees for data processing in notebooks and inference instances.

SageMaker Algorithm Training: $0.461/host/hr, running on ml.m5.2xlarge

SageMaker Realtime Inference: $0.461/host/hr, running on ml.m5.2xlarge

SageMaker Batch Transform: $0.461/host/hr, running on ml.m5.2xlarge
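
The listed rates can be combined into a rough estimate. The sketch below is only an illustration of the arithmetic, using the ml.m5.2xlarge rates shown above; it assumes the software rate applies per host (immaterial here, since it is $0.00), and it ignores ML storage, data processing fees, and taxes. The function name `estimate_cost` is hypothetical.

```python
# Rates copied from the listing for ml.m5.2xlarge, in USD.
SOFTWARE_RATE_PER_HOUR = 0.00               # algorithm/model software price
INFRASTRUCTURE_RATE_PER_HOST_HOUR = 0.461   # SageMaker instance price


def estimate_cost(hours: float, hosts: int = 1) -> float:
    """Estimate the cost in USD of running `hosts` instances for `hours`.

    Billing is per second, so `hours` may be fractional.
    Storage, data processing fees, and taxes are not included.
    """
    if hours < 0:
        raise ValueError("hours must be non-negative")
    if hosts < 1:
        raise ValueError("hosts must be at least 1")
    hourly_rate = SOFTWARE_RATE_PER_HOUR + INFRASTRUCTURE_RATE_PER_HOST_HOUR
    return round(hourly_rate * hosts * hours, 4)


if __name__ == "__main__":
    # A two-hour training job on a single host.
    print(estimate_cost(hours=2, hosts=1))
```

Rates for other instance types and regions will differ, so the constants should be replaced with the values shown for your own configuration.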

Algorithm Training

For algorithm training in Amazon SageMaker, the software is priced based on hourly pricing that can vary by instance type. Additional infrastructure cost, taxes or fees may apply.

	InstanceType	Algorithm/hr
	ml.m5.4xlarge	$0.00
	ml.m5.2xlarge (vendor recommended)	$0.00
	ml.m5.xlarge	$0.00

Model Realtime Inference

For model deployment as a real-time endpoint in Amazon SageMaker, the software is priced based on hourly pricing that can vary by instance type. Additional infrastructure cost, taxes or fees may apply.

	InstanceType	Realtime Inference/hr
	ml.m5.4xlarge	$0.00
	ml.m5.2xlarge (vendor recommended)	$0.00
	ml.m5.xlarge	$0.00

Model Batch Transform

For model deployment as a batch transform job in Amazon SageMaker, the software is priced based on hourly pricing that can vary by instance type. Additional infrastructure cost, taxes or fees may apply.

	InstanceType	Batch Transform/hr
	ml.m5.4xlarge	$0.00
	ml.m5.2xlarge (vendor recommended)	$0.00
	ml.m5.xlarge	$0.00

Usage Information

Training

For this product, training is a clustering process: it groups the records in the input datasets into a set of disjoint customer profiles. Each input dataset should be provided as a folder (i.e., an S3 folder) containing one or more CSV, Avro, or Parquet files. Note that only one file type (i.e., file extension) is allowed in the folder for a single data source.
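
The one-file-type-per-folder rule can be checked before a job is launched. The following is a minimal sketch, assuming the file type is identified by the file extension (as the note above suggests) and that you already have the list of object keys in the folder; the function name `folder_file_type` is hypothetical and not part of the product.

```python
from pathlib import PurePosixPath

# File types the listing says are accepted for clustering input.
ALLOWED_EXTENSIONS = {".csv", ".avro", ".parquet"}


def folder_file_type(keys):
    """Return the single file extension shared by all files in a folder.

    `keys` is a list of object keys (paths) inside one data-source folder.
    Raises ValueError if the folder is empty, mixes file types, or uses an
    extension that is not in ALLOWED_EXTENSIONS.
    """
    # Skip "directory" placeholder keys that end with a slash.
    extensions = {
        PurePosixPath(key).suffix.lower() for key in keys if not key.endswith("/")
    }
    if not extensions:
        raise ValueError("folder contains no files")
    if len(extensions) > 1:
        raise ValueError(f"mixed file types in folder: {sorted(extensions)}")
    (extension,) = extensions
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {extension!r}")
    return extension


if __name__ == "__main__":
    print(folder_file_type(["bookings/part-0.csv", "bookings/part-1.csv"]))
```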

Channel specification

clustering

The clustering input file

Input modes: File

Content types: csv, avro, parquet

Compression types: None

clustering2

The 2nd clustering input file

Input modes: File

Content types: csv, avro, parquet

Compression types: None

clustering3

The 3rd clustering input file

Input modes: File

Content types: csv, avro, parquet

Compression types: None

channel_config

The data source configuration file for all channels

Input modes: File

Content types: yaml

Compression types: None
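
As an illustration, the channels above could be expressed as the `InputDataConfig` list accepted by the SageMaker `CreateTrainingJob` API. This is a sketch under assumptions: the S3 URIs are placeholders, the helper `build_input_data_config` is hypothetical, and the listing does not say which channels are mandatory, so the helper simply accepts whichever clustering channels you pass. It only builds the request fragment and makes no AWS calls.

```python
# Channel names and content types taken from the channel specification above.
CLUSTERING_CHANNELS = ("clustering", "clustering2", "clustering3")
CLUSTERING_CONTENT_TYPES = {"csv", "avro", "parquet"}


def _channel(name, s3_uri, content_type):
    """Build one channel entry in the CreateTrainingJob InputDataConfig shape."""
    return {
        "ChannelName": name,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
        "ContentType": content_type,
        "CompressionType": "None",   # the listing allows no compression
        "InputMode": "File",         # the only input mode listed
    }


def build_input_data_config(data_channels, content_type, config_uri):
    """Return an InputDataConfig list for the given data folders.

    `data_channels` maps a clustering channel name to an S3 folder URI.
    `config_uri` is the S3 location of the YAML data source configuration.
    """
    if content_type not in CLUSTERING_CONTENT_TYPES:
        raise ValueError(f"unsupported content type: {content_type!r}")
    unknown = set(data_channels) - set(CLUSTERING_CHANNELS)
    if unknown:
        raise ValueError(f"unknown channel names: {sorted(unknown)}")
    channels = [
        _channel(name, data_channels[name], content_type)
        for name in CLUSTERING_CHANNELS
        if name in data_channels
    ]
    channels.append(_channel("channel_config", config_uri, "yaml"))
    return channels


if __name__ == "__main__":
    # Placeholder bucket and prefixes; replace with your own locations.
    config = build_input_data_config(
        {"clustering": "s3://example-bucket/bookings/"},
        "csv",
        "s3://example-bucket/config/",
    )
    print([channel["ChannelName"] for channel in config])
```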

Model input and output details

Input

Summary

CSV, Avro, or Parquet input is accepted for the clustering process. If CSV input files are used, each file should be comma-delimited (,) and contain a header line at the top. Each row of a CSV file represents a single record, and each column represents a field. The following fields are recommended: sourceRecordId, firstName, middleName, lastName, dateOfBirth, emailAddress, mobilePhone, homePhone, workPhone, postalCode, streetAddress, city, governingDistrict, ipAddress, accountId.

Limitations for input type

CSV, avro, or parquet

Input MIME type

text/csv

Sample input data

record_id,given_name,sur_name,dob,email,phone,zip,street_address
101,John,Smith,19901010,john@gmail.com,5051234567,92128,123 main street
202,Joe,Matthew,20001010,joe@gmail.com,8581234567,92101,456 ace street
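
The sample headers differ from the recommended field names. The product advertises field-level mapping, but the listing does not document the mapping format, so the sketch below shows a separate, optional pre-processing step that renames CSV headers yourself. The mapping table is an assumption (in particular, treating `phone` as `mobilePhone`), and `rename_headers` is a hypothetical helper.

```python
import csv
import io

# Assumed mapping from the sample headers to the recommended field names.
HEADER_MAP = {
    "record_id": "sourceRecordId",
    "given_name": "firstName",
    "sur_name": "lastName",
    "dob": "dateOfBirth",
    "email": "emailAddress",
    "phone": "mobilePhone",      # assumption: the phone type is not stated
    "zip": "postalCode",
    "street_address": "streetAddress",
}


def rename_headers(csv_text: str) -> str:
    """Return the CSV text with its header row renamed via HEADER_MAP.

    Headers that are not in HEADER_MAP are kept unchanged, and data rows
    are copied through as they are.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    header = [HEADER_MAP.get(name, name) for name in rows[0]]
    output = io.StringIO()
    writer = csv.writer(output, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows[1:])
    return output.getvalue()


if __name__ == "__main__":
    sample = (
        "record_id,given_name,sur_name,dob,email,phone,zip,street_address\n"
        "101,John,Smith,19901010,john@gmail.com,5051234567,92128,123 main street\n"
    )
    print(rename_headers(sample))
```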

Input data description

	Field names	Description	Required	Data type	Default value	Limitations
	firstName, middleName, lastName	Given name; middle name; surname	No	FreeText	BLANK	See note
	dateOfBirth	Date of birth	No	FreeText	BLANK	See note
	emailAddress	Email address	No	FreeText	BLANK	See note
	mobilePhone, homePhone, workPhone	Mobile phone; home phone; work phone	No	FreeText	BLANK	See note
	postalCode, streetAddress, city	Postal code; street address; city	No	FreeText	BLANK	See note
	ipAddress	IP address	No	FreeText	BLANK	See note
	accountId	Customer account identifier	No	FreeText	BLANK	See note
	sourceRecordId	Uniquely identifies each record in the data source	Yes	FreeText	n/a	Maximum 64 characters

Note: None of the optional fields is individually required, but each record must contain a minimum amount of information so that it can be uniquely identified.
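
The documented constraints on sourceRecordId (required, unique within a data source, at most 64 characters) can be checked before upload. The following is a minimal sketch, assuming records have already been loaded as dictionaries keyed by field name; `validate_records` is a hypothetical helper, and it does not attempt to check the unspecified "minimum information" requirement for the optional fields.

```python
MAX_SOURCE_RECORD_ID_LENGTH = 64  # limit stated in the field description


def validate_records(rows):
    """Check sourceRecordId on each record of one data source.

    `rows` is an iterable of dicts keyed by field name.
    Returns a list of (row_index, message) tuples; empty means no problems.
    """
    problems = []
    seen_ids = set()
    for index, row in enumerate(rows):
        record_id = (row.get("sourceRecordId") or "").strip()
        if not record_id:
            problems.append((index, "missing sourceRecordId"))
            continue
        if len(record_id) > MAX_SOURCE_RECORD_ID_LENGTH:
            problems.append((index, "sourceRecordId longer than 64 characters"))
        if record_id in seen_ids:
            problems.append((index, "duplicate sourceRecordId"))
        seen_ids.add(record_id)
    return problems


if __name__ == "__main__":
    records = [
        {"sourceRecordId": "101", "firstName": "John"},
        {"sourceRecordId": "", "firstName": "Joe"},
        {"sourceRecordId": "101", "firstName": "Jon"},
    ]
    for index, message in validate_records(records):
        print(index, message)
```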

Output

Summary

The training (i.e., clustering) process produces an identity graph, stored in a folder as one or more files of exactly the same type as the input files. If the input files are CSVs, the output will contain CSV files too. All fields in the input files are retained in the output files, along with one additional field called PIN, which is the unique customer profile identifier assigned to each record.

Output MIME type

text/csv

Output data description

Field names

sourceRecordId, firstName, middleName, lastName, dateOfBirth, emailAddress, mobilePhone, homePhone, workPhone, postalCode, streetAddress, city, ipAddress, accountId

Description

The output data (the identity graph) contains the union of fields from all the input data sources, plus an additional field, PIN.

Always returned

Yes

Data type

FreeText

Field name

PIN

Description

The unique customer profile identifier assigned to each record. Customer records with the same (non-default) PIN are considered to refer to the same customer profile.

Data type

Integer

Default value

-1, meaning that the input record does not contain enough information to determine which customer profile it belongs to.
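
To show how the PIN field might be consumed, the sketch below groups output records into customer profiles and sets aside records with the default PIN of -1. It assumes the output has been loaded as dictionaries and that the column is named `PIN` (the listing also spells it "pin"); `group_by_pin` is a hypothetical helper.

```python
from collections import defaultdict

DEFAULT_PIN = -1  # assigned when a record lacks enough information to resolve


def group_by_pin(rows):
    """Group record IDs by their assigned PIN.

    `rows` is an iterable of dicts with "PIN" and "sourceRecordId" keys.
    Returns (profiles, unresolved): `profiles` maps each non-default PIN to
    the list of record IDs in that profile, and `unresolved` lists the
    record IDs that received the default PIN.
    """
    profiles = defaultdict(list)
    unresolved = []
    for row in rows:
        pin = int(row["PIN"])  # CSV readers yield strings, so convert
        if pin == DEFAULT_PIN:
            unresolved.append(row["sourceRecordId"])
        else:
            profiles[pin].append(row["sourceRecordId"])
    return dict(profiles), unresolved


if __name__ == "__main__":
    output_rows = [
        {"sourceRecordId": "101", "PIN": "7"},
        {"sourceRecordId": "305", "PIN": "7"},
        {"sourceRecordId": "202", "PIN": "-1"},
    ]
    profiles, unresolved = group_by_pin(output_rows)
    print(profiles, unresolved)
```

Records with the default PIN are kept separate on purpose: sharing -1 does not mean they belong to the same customer.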

Additional Resources

Harpin AI revolutionizes customer data quality through AI/ML.

End User License Agreement

By subscribing to this product, you agree to the terms and conditions outlined in the product End User License Agreement (EULA).

Support Information

AWS Infrastructure

AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.

Refund Policy

This product is offered for free. If you have any questions, please contact us.
