
Guidance for E-Commerce Products Similarity Search on AWS

Overview

This Guidance shows how to create a product catalog with a similarity search capability by integrating AWS services and artificial intelligence (AI) with the pgvector extension. pgvector is an open-source extension for PostgreSQL that lets you store vector embeddings and search them for the entries most similar to a given vector, known as its nearest neighbors. Because embeddings capture semantic meaning, nearest neighbor search can power a variety of intelligent applications and data analysis within your PostgreSQL database. By integrating pgvector with AWS services, as shown here, you can conduct both image and text-to-image similarity searches to provide a more personalized, relevant, and efficient shopping experience for your consumers.
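
To make the idea of a nearest neighbor search concrete, here is a minimal pure-Python sketch. It ranks a tiny in-memory catalog by cosine distance, which is the quantity pgvector's `<=>` operator computes. The product names and three-dimensional vectors are invented for illustration; real embedding models emit hundreds of dimensions, and pgvector performs this ranking inside the database rather than in application code.

```python
import math

def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest_neighbors(query, catalog, k=2):
    """Return the k catalog keys whose embeddings are closest to the query."""
    ranked = sorted(catalog, key=lambda name: cosine_distance(query, catalog[name]))
    return ranked[:k]

# Hypothetical embeddings, chosen so that similar products point the same way.
catalog = {
    "red sneaker": [0.9, 0.1, 0.0],
    "crimson running shoe": [0.8, 0.2, 0.1],
    "blue kettle": [0.0, 0.1, 0.9],
}

print(nearest_neighbors([1.0, 0.0, 0.0], catalog, k=2))
```

The two shoe entries rank ahead of the kettle because their vectors point in nearly the same direction as the query.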

Important: This Guidance requires the use of AWS Cloud9 which is no longer available to new customers. Existing customers of AWS Cloud9 can continue using and deploying this Guidance as normal.

How it works

This architecture diagram shows how to build a product catalog with a similarity search capability. It uses artificial intelligence (AI), Amazon SageMaker, Amazon RDS for PostgreSQL, and the pgvector extension.
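
As a rough sketch of the database side of this architecture, the snippet below builds the SQL that a pgvector-backed catalog might use. The table name `products`, its columns, and the helper names are assumptions made for illustration and are not taken from the Guidance's sample code. The `%s` placeholders assume a psycopg-style driver, and the embedding dimension depends on whichever model you deploy.

```python
def to_vector_literal(embedding):
    """Format a sequence of numbers as a pgvector text literal such as '[0.5,1.0]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"

def schema_statements(dimensions):
    """SQL to enable pgvector and create a hypothetical products table."""
    return [
        "CREATE EXTENSION IF NOT EXISTS vector;",
        "CREATE TABLE IF NOT EXISTS products ("
        "id bigserial PRIMARY KEY, "
        "description text, "
        f"embedding vector({int(dimensions)}));",
    ]

# <=> is pgvector's cosine distance operator; smaller means more similar.
SEARCH_SQL = (
    "SELECT id, description FROM products "
    "ORDER BY embedding <=> %s::vector LIMIT %s;"
)

def search_params(query_embedding, limit=5):
    """Parameters to pass alongside SEARCH_SQL to a psycopg-style cursor.execute."""
    return (to_vector_literal(query_embedding), limit)

for statement in schema_statements(3):
    print(statement)
print(SEARCH_SQL, search_params([0.5, 1, -2.25], limit=3))
```

Passing the query vector as a parameter, rather than formatting it into the SQL string, keeps the statement safe to reuse across requests.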

Get Started

Sample code

Use sample code to deploy this Guidance in your AWS account

Well-Architected Pillars

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

SageMaker simplifies machine learning model lifecycle management, allowing you to quickly adapt to changing data and user demands. RDS for PostgreSQL with the pgvector extension offers robust data storage and efficient nearest neighbor search capabilities, so you can deliver accurate and timely search results to your consumers. Together, these services streamline the deployment, monitoring, and maintenance of your search experience.

Read the Operational Excellence whitepaper 

RDS for PostgreSQL safeguards your data with industry-standard encryption protocols, while SageMaker offers built-in security controls to manage model training and deployment processes securely.

We recommend you use AWS Identity and Access Management (IAM) to control access to your AWS resources, and use AWS Secrets Manager to protect sensitive credentials.
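
One way to follow this recommendation is to read the database credentials from Secrets Manager at runtime instead of hard-coding them. The sketch below assumes the secret is stored as JSON with `host`, `port`, `dbname`, `username`, and `password` keys, which is the shape commonly used for RDS database secrets; verify the keys against your own secret. The function names are hypothetical, and `boto3` is imported lazily so that the parsing logic can be exercised without AWS access.

```python
import json

def parse_db_secret(secret_string):
    """Turn a JSON secret string into PostgreSQL connection parameters."""
    secret = json.loads(secret_string)
    return {
        "host": secret["host"],
        "port": int(secret.get("port", 5432)),
        "dbname": secret.get("dbname", "postgres"),
        "user": secret["username"],
        "password": secret["password"],
    }

def fetch_db_secret(secret_id, client=None):
    """Fetch and parse a secret; pass a client to avoid creating one here."""
    if client is None:
        import boto3  # lazy import: only needed when calling AWS for real
        client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_id)
    return parse_db_secret(response["SecretString"])

# Offline demonstration with a made-up secret value.
sample = '{"host": "db.example.internal", "port": 5432, "username": "app", "password": "not-a-real-password"}'
print(parse_db_secret(sample)["host"])
```

The IAM role that runs this code needs permission to call `secretsmanager:GetSecretValue` on the specific secret, which keeps access scoped to the resources that need it.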

Read the Security whitepaper 

RDS for PostgreSQL provides high availability and durability through automatic backups, database snapshots, and Multi-AZ (multiple Availability Zone) deployments for enhanced fault tolerance. SageMaker also lets you run multiple instances across AZs, giving your machine learning operations high availability and quick recovery from failures.

Read the Reliability whitepaper 

SageMaker supports near real-time inference and low-latency responses to user queries. RDS for PostgreSQL with the pgvector extension enables efficient management and querying of vector embeddings, significantly speeding up the similarity searches needed to match user queries with your product catalog.
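
The inference step can be sketched as a single call to a SageMaker real-time endpoint that returns an embedding for the user's query. `invoke_endpoint` is the SageMaker Runtime API for this, but the request and response shapes used here (`{"inputs": ...}` in, `{"embedding": [...]}` out) are assumptions, because they depend on the model container you deploy. The stub client exists only so the example runs offline; in practice you would pass `boto3.client("sagemaker-runtime")`.

```python
import io
import json

def embed_text(runtime_client, endpoint_name, text):
    """Ask a SageMaker endpoint for the embedding of a piece of text."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({"inputs": text}).encode("utf-8"),
    )
    # The response body is a stream of bytes; the JSON layout is model-specific.
    payload = json.loads(response["Body"].read())
    return [float(x) for x in payload["embedding"]]

class StubRuntimeClient:
    """Offline stand-in for the real client; returns a toy 2-d vector."""
    def invoke_endpoint(self, EndpointName, ContentType, Body):
        text = json.loads(Body)["inputs"]
        vector = [float(len(text)), 1.0]
        return {"Body": io.BytesIO(json.dumps({"embedding": vector}).encode("utf-8"))}

print(embed_text(StubRuntimeClient(), "hypothetical-endpoint", "shoe"))
```

The returned list can then be passed as the query vector to a pgvector similarity search.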

We recommend you continuously monitor and optimize your system's performance by using AWS services like Amazon CloudWatch and AWS Auto Scaling so that the components in this Guidance remain responsive and cost-effective.
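
As one possible way to feed CloudWatch, the sketch below times an operation and shapes the result as a custom metric datum. The metric name `SearchLatency`, the `Operation` dimension, and the helper names are hypothetical. The `publish` callback is left to the caller so the example runs without AWS access.

```python
import time

def latency_metric(operation, milliseconds):
    """Build one CloudWatch metric datum describing how long an operation took."""
    return {
        "MetricName": "SearchLatency",
        "Dimensions": [{"Name": "Operation", "Value": operation}],
        "Value": float(milliseconds),
        "Unit": "Milliseconds",
    }

def timed(operation, func, *args, publish=None, **kwargs):
    """Run func, measure its wall-clock time, and hand the metric to publish."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    if publish is not None:
        publish(latency_metric(operation, elapsed_ms))
    return result

# Offline demonstration: collect metrics in a list instead of sending them.
collected = []
total = timed("similarity_query", sum, [1, 2, 3], publish=collected.append)
print(total, collected[0]["MetricName"], collected[0]["Unit"])
```

To send the datum for real, `publish` could wrap `boto3.client("cloudwatch").put_metric_data(Namespace=..., MetricData=[metric])` with a namespace of your choosing. You can then set alarms or scaling policies on the resulting metric.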

Read the Performance Efficiency whitepaper 

SageMaker helps reduce costs by providing a managed service with pay-as-you-go pricing and instance types optimized for specific workloads. RDS for PostgreSQL offers cost efficiency through Reserved Instances and scaling options that adjust resources to your database workload, minimizing unnecessary expenses. You can also use cost monitoring and optimization tools, such as AWS Budgets and AWS Cost Explorer, to continuously identify and address potential cost inefficiencies.

Read the Cost Optimization whitepaper 

SageMaker and RDS for PostgreSQL are managed AWS services that handle workloads efficiently, which reduces the computational resources your workloads require and, with them, the environmental impact. Deploying this Guidance in the AWS Cloud also removes the need to procure physical hardware, further enhancing the overall sustainability of your system. Additionally, you can use AWS services like AWS CloudTrail and AWS Config to monitor and enforce sustainable practices, such as resource utilization and energy efficiency.

Read the Sustainability whitepaper 
