This Guidance demonstrates how to use pgvector and Amazon Aurora PostgreSQL for sentiment analysis, a powerful natural language processing (NLP) task. The Guidance shows how to integrate Amazon Aurora PostgreSQL-Compatible Edition with the Amazon Comprehend Sentiment Analysis API, enabling sentiment analysis inferences through SQL commands. By using Amazon Aurora PostgreSQL with the pgvector extension as your vector store, you can accelerate vector similarity search for Retrieval Augmented Generation (RAG), delivering queries up to 20 times faster with pgvector's Hierarchical Navigable Small World (HNSW) indexing.
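
As a minimal illustration of the SQL-based inference pattern, the following sketch assumes a hypothetical product_reviews table; the aws_ml extension and the aws_comprehend.detect_sentiment function are part of Aurora machine learning, and calling them requires the IAM role association with the Aurora cluster described in the Security pillar below.

    -- Enable the Aurora machine learning integration, which installs the aws_comprehend functions.
    CREATE EXTENSION IF NOT EXISTS aws_ml CASCADE;

    -- Hypothetical table holding the text to analyze.
    CREATE TABLE IF NOT EXISTS product_reviews (
        review_id   bigserial PRIMARY KEY,
        review_text text NOT NULL
    );

    -- Run sentiment inference directly from SQL: each row's text is sent to the
    -- Amazon Comprehend DetectSentiment API, and the sentiment label and
    -- confidence score are returned as columns.
    SELECT r.review_id,
           s.sentiment,
           s.confidence
    FROM product_reviews r,
         aws_comprehend.detect_sentiment(r.review_text, 'en') s;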

Please note: See the Disclaimer section at the end of this page.

Architecture Diagram

[Architecture diagram description]

Download the architecture diagram PDF 

Well-Architected Pillars

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

  • The provided CloudFormation script automates the deployment of key resources, including an Aurora PostgreSQL cluster, a SageMaker notebook instance, an AWS Cloud9 instance, a virtual private cloud (VPC), subnets, security groups, and AWS Identity and Access Management (IAM) roles. This automated deployment streamlines operations, reduces manual effort, and mitigates configuration errors, promoting operational excellence.

    Read the Operational Excellence whitepaper 
  • An IAM role integrates Aurora with Amazon Comprehend, granting the minimum required permissions. This role is associated with the Aurora cluster and does not have credentials such as passwords or access keys, enhancing security. Database user credentials are securely stored in AWS Secrets Manager, preventing unauthorized access and potential security breaches.

    IAM roles and policies provide controlled access to Amazon Comprehend's sentiment analysis API from Aurora, limiting permissions to only what's necessary. This least-privilege approach to access management strengthens the Guidance’s security posture.

    Read the Security whitepaper 
  • Aurora with pgvector enables storing and searching machine learning (ML)-generated embeddings while leveraging PostgreSQL features like indexing and querying. Aurora provides high availability and reliability by maintaining six copies of data across three Availability Zones, with read replicas and global database replication options.

    Using Aurora with pgvector as the vector store offers vector capabilities combined with data reliability and durability, eliminating the need to move data to a separate vector store. Aurora's resiliency features and pgvector's capabilities allow you to use an existing relational database as a vector store that integrates seamlessly with artificial intelligence (AI) and ML services like Amazon Comprehend and SageMaker; a minimal SQL sketch of this embedding storage and search pattern follows this list.

    Read the Reliability whitepaper 
  • Aurora PostgreSQL with pgvector offers optimized storage, compute resources, and vector indexing capabilities within the relational database, helping ensure efficient workload performance. Aurora Optimized Reads can boost pgvector vector search performance by up to nine times for workloads that exceed available instance memory. Aurora with pgvector provides not only vector search, indexing, and sentiment analysis capabilities but also features for optimal query performance, combining the benefits of a relational database with vector capabilities; see the HNSW indexing sketch after this list.

    Read the Performance Efficiency whitepaper 
  • SageMaker offers Savings Plans, reducing costs by up to 64 percent, in addition to flexible on-demand pricing for Studio notebooks, notebook instances, and inference. Using the AWS Cloud9 IDE instead of dedicated Amazon Elastic Compute Cloud (Amazon EC2) instances further decreases costs. Additionally, the Amazon Comprehend API's pay-per-use model optimizes expenses. These services provide cost-effective options through on-demand pricing and Savings Plans to help you align with your budget.

    Read the Cost Optimization whitepaper 
  • Aurora clusters on AWS Graviton instances consume up to 60 percent less energy than comparable EC2 instances while delivering the same performance and better price performance. This Guidance uses temporary resources, such as AWS Cloud9 and SageMaker notebook instances, to reduce its carbon footprint. The AWS Cloud9 IDE, a temporary resource, is used to integrate Aurora with Amazon Comprehend and generate inferences through SQL statements, further minimizing environmental impact.

    Read the Sustainability whitepaper 
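
The Reliability item above describes storing and searching ML-generated embeddings directly in Aurora. The following is a minimal sketch of that pattern, assuming a hypothetical document_chunks table and a 3-dimensional vector for readability (embedding models, such as those hosted on SageMaker, produce hundreds or thousands of dimensions):

    -- Enable pgvector so embeddings live alongside relational data.
    CREATE EXTENSION IF NOT EXISTS vector;

    CREATE TABLE IF NOT EXISTS document_chunks (
        chunk_id  bigserial PRIMARY KEY,
        content   text NOT NULL,
        embedding vector(3)  -- use your embedding model's dimension in practice
    );

    INSERT INTO document_chunks (content, embedding)
    VALUES ('sample chunk', '[0.11, -0.02, 0.97]');

    -- Retrieve the chunks closest to a query embedding using cosine distance,
    -- for example as the retrieval step of a RAG workflow.
    SELECT chunk_id, content
    FROM document_chunks
    ORDER BY embedding <=> '[0.10, 0.01, 0.95]'
    LIMIT 5;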
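
For the Performance Efficiency item above, this sketch builds on the document_chunks table from the previous example and shows the HNSW indexing the Guidance refers to; the index name is illustrative, while m, ef_construction, and hnsw.ef_search are pgvector's documented parameters (the values shown are its defaults):

    -- Build an HNSW index for fast approximate nearest-neighbor search
    -- over the cosine distance operator used in the query above.
    CREATE INDEX IF NOT EXISTS document_chunks_embedding_hnsw
    ON document_chunks
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

    -- Higher ef_search improves recall at the cost of query latency.
    SET hnsw.ef_search = 40;
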
Blog

Leverage pgvector and Amazon Aurora PostgreSQL for Natural Language Processing, Chatbots and Sentiment Analysis

This blog post demonstrates how to build an interactive question-answering chatbot app using LangChain and Streamlit, leveraging pgvector and its native integration with Aurora machine learning for sentiment analysis.

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.

References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.
