Guidance for Sentiment Analysis on AWS
Overview
This Guidance demonstrates how to use pgvector and Amazon Aurora PostgreSQL for sentiment analysis, a powerful natural language processing (NLP) task. The Guidance shows how to integrate Amazon Aurora PostgreSQL-Compatible Edition with the Amazon Comprehend Sentiment Analysis API, enabling sentiment analysis inferences through SQL commands. By using Amazon Aurora PostgreSQL with the pgvector extension as your vector store, you can accelerate vector similarity search for Retrieval Augmented Generation (RAG), delivering queries up to 20 times faster with pgvector's Hierarchical Navigable Small World (HNSW) indexing.
Important: This Guidance requires the use of AWS Cloud9 which is no longer available to new customers. Existing customers of AWS Cloud9 can continue using and deploying this Guidance as normal.
How it works
This architecture diagram shows how to generate sentiment analysis using Amazon Aurora PostgreSQL-Compatible Edition with pgvector enabled as the vector store. It details the process of integrating Amazon Aurora with an Amazon Comprehend Sentiment Analysis API and generating sentiment analysis inferences using SQL commands.
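As a rough illustration of "sentiment analysis inferences using SQL commands", the sketch below builds a query around `aws_comprehend.detect_sentiment`, the function that the Aurora machine learning extension (`aws_ml`) exposes for Amazon Comprehend. The table and column names (`reviews`, `review_id`, `review_text`) and the helper names are hypothetical and are not taken from this Guidance; the query only runs against an Aurora PostgreSQL cluster that has an IAM role for Amazon Comprehend attached.

```python
import re

# Hypothetical helper names; only aws_ml and aws_comprehend.detect_sentiment
# come from Aurora machine learning itself.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_LANG = re.compile(r"^[a-z]{2}(-[A-Z]{2})?$")

# One-time setup statement: installs the extension that provides aws_comprehend.
ENABLE_AWS_ML = "CREATE EXTENSION IF NOT EXISTS aws_ml CASCADE;"


def sentiment_query(table, id_col, text_col, language="en"):
    """Build a SELECT that returns a sentiment label and confidence per row."""
    for name in (table, id_col, text_col):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if not _LANG.match(language):
        raise ValueError(f"unexpected language code: {language!r}")
    return (
        f"SELECT r.{id_col}, s.sentiment, s.confidence\n"
        f"FROM {table} AS r,\n"
        f"     aws_comprehend.detect_sentiment(r.{text_col}, '{language}') AS s;"
    )


def fetch_rows(conn, sql):
    """Run a statement on any DB-API connection and return all rows."""
    cur = conn.cursor()
    try:
        cur.execute(sql)
        return cur.fetchall()
    finally:
        cur.close()


if __name__ == "__main__":
    print(ENABLE_AWS_ML)
    print(sentiment_query("reviews", "review_id", "review_text"))
```

With a PostgreSQL driver such as psycopg, the generated statement could be passed to `fetch_rows` together with an open connection to the cluster; each returned row would carry the review identifier, a sentiment label, and a confidence score.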
Well-Architected Pillars
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Operational Excellence
The provided AWS CloudFormation template automates the deployment of key resources, including an Aurora PostgreSQL cluster, a SageMaker notebook instance, an AWS Cloud9 instance, a virtual private cloud (VPC), subnets, security groups, and AWS Identity and Access Management (IAM) roles. This automated deployment streamlines operations, reduces manual effort, and mitigates configuration errors, promoting operational excellence.
Security
An IAM role integrates Aurora with Amazon Comprehend, granting the minimum required permissions. This role is associated with the Aurora cluster and does not have credentials such as passwords or access keys, enhancing security. Database user credentials are securely stored in AWS Secrets Manager, preventing unauthorized access and potential security breaches.
IAM roles and policies provide controlled access to Amazon Comprehend's sentiment analysis API from Aurora, limiting permissions to only what's necessary. This least-privilege approach to access management strengthens the Guidance's security posture.
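A least-privilege role of this kind could look roughly like the two policy documents below. This is a sketch under stated assumptions, not the policy shipped with this Guidance: the two Amazon Comprehend actions are the ones Aurora machine learning needs for sentiment detection, while the `Sid` value, the `"*"` resource, and the variable names are illustrative choices. The CloudFormation template for this Guidance may define its role differently.

```python
import json

# Permissions policy: only the Amazon Comprehend actions needed for sentiment
# analysis from Aurora. The Sid and the "*" resource are assumptions made here.
COMPREHEND_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowAuroraToInvokeComprehend",
            "Effect": "Allow",
            "Action": [
                "comprehend:DetectSentiment",
                "comprehend:BatchDetectSentiment",
            ],
            "Resource": "*",
        }
    ],
}

# Trust policy: lets the RDS service assume the role on behalf of the cluster,
# so no password or access key is ever attached to the role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "rds.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

if __name__ == "__main__":
    print(json.dumps(COMPREHEND_POLICY, indent=2))
    print(json.dumps(TRUST_POLICY, indent=2))
```

Keeping the action list this short is the point of the sketch: the role can call sentiment detection and nothing else in Amazon Comprehend.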
Reliability
Aurora with pgvector enables storing and searching machine learning (ML)-generated embeddings while leveraging PostgreSQL features like indexing and querying. Aurora provides high availability and reliability by maintaining six copies of data across three Availability Zones, with read replicas and global database replication options.
Using Aurora with pgvector as the vector store combines vector capabilities with data reliability and durability, eliminating the need to move data to a separate vector store. Aurora's resiliency features and pgvector's capabilities allow you to use an existing relational database as a vector store, seamlessly integrating with artificial intelligence (AI) and ML services like Amazon Comprehend and SageMaker.
Performance Efficiency
Aurora PostgreSQL with pgvector offers optimized storage, compute resources, and vector indexing capabilities within the relational database, helping ensure efficient workload performance. Aurora Optimized Reads can boost vector search performance with pgvector by up to nine times for workloads that exceed regular instance memory. Beyond vector search, indexing, and sentiment analysis, Aurora with pgvector also provides features for optimal query performance, combining the benefits of a relational database with vector capabilities.
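To make the vector indexing concrete, the sketch below generates the pgvector statements for a table with an embedding column, an HNSW index, and a nearest-neighbor query. The table name, column names, and embedding dimension (384) are hypothetical; `m = 16` and `ef_construction = 64` are pgvector's documented defaults, written out only to show where tuning would happen. The cosine operator class is one possible choice, not necessarily the one used in this Guidance.

```python
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def vector_store_statements(table="reviews", dim=384, m=16, ef_construction=64):
    """Return DDL for a pgvector table with an HNSW index (cosine distance)."""
    if not _IDENT.match(table):
        raise ValueError(f"unsafe identifier: {table!r}")
    dim, m, ef_construction = int(dim), int(m), int(ef_construction)
    return [
        "CREATE EXTENSION IF NOT EXISTS vector;",
        f"CREATE TABLE IF NOT EXISTS {table} ("
        f"id bigserial PRIMARY KEY, content text, embedding vector({dim}));",
        f"CREATE INDEX IF NOT EXISTS {table}_embedding_hnsw ON {table} "
        f"USING hnsw (embedding vector_cosine_ops) "
        f"WITH (m = {m}, ef_construction = {ef_construction});",
    ]


def nearest_query(table="reviews"):
    """Return a parameterized top-k query; <=> is pgvector's cosine distance."""
    if not _IDENT.match(table):
        raise ValueError(f"unsafe identifier: {table!r}")
    return (
        f"SELECT id, content FROM {table} "
        f"ORDER BY embedding <=> %s::vector LIMIT %s;"
    )


def to_vector_literal(values):
    """Format a sequence of numbers in pgvector's text form, e.g. [0.1,2.0]."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


if __name__ == "__main__":
    for statement in vector_store_statements():
        print(statement)
    print(nearest_query())
    print(to_vector_literal([0.1, 2, -3.5]))
```

With a driver that uses `%s` placeholders, the query from `nearest_query` would be executed with `(to_vector_literal(embedding), k)` as parameters. Because the `ORDER BY` uses the same distance operator as the index's operator class, the planner can answer it from the HNSW index.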
Cost Optimization
SageMaker offers Savings Plans, which can reduce costs by up to 64 percent, in addition to flexible on-demand pricing for Studio notebooks, notebook instances, and inference. Using the AWS Cloud9 IDE instead of dedicated Amazon Elastic Compute Cloud (Amazon EC2) instances further decreases costs. Additionally, the pay-per-use model of the Amazon Comprehend API means you pay only for the requests you make. Together, these on-demand and Savings Plans options help you align spending with your budget.
Sustainability
Aurora clusters on AWS Graviton instances consume up to 60 percent less energy than comparable EC2 instances while delivering the same performance and better price performance. This Guidance uses temporary resources like AWS Cloud9 and SageMaker notebooks to reduce its carbon footprint. AWS Cloud9, a temporary IDE, is used to integrate Aurora with Amazon Comprehend and to generate inferences through SQL statements, further minimizing the environmental impact.