Guidance for Social Media Insights on AWS
Overview
How it works
This Guidance helps you gain insight into what your customers are saying about your products and services on social media websites such as X, Facebook, and Instagram. Instead of sifting through posts manually, you can build a near real-time alert system that consumes data from social media and extracts insights, such as topics, entities, sentiment, and location, using a large language model (LLM) in Amazon Bedrock.
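As a hedged illustration of the extraction step, the Python sketch below asks a Bedrock-hosted model to pull topics, entities, sentiment, and location out of a single post. The model ID, region, prompt, and function name are assumptions for this example, not the exact choices made by this Guidance.

import json

import boto3

# Illustrative only: model, region, and prompt are assumptions.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def extract_insights(post_text: str) -> dict:
    """Ask the LLM for topics, entities, sentiment, and location as JSON."""
    prompt = (
        "Extract the topics, named entities, overall sentiment "
        "(positive/negative/neutral), and any location mentioned in this "
        f"social media post. Reply with JSON only.\n\nPost: {post_text}"
    )
    response = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    # Assumes the model complied with the JSON-only instruction.
    return json.loads(response["output"]["message"]["content"][0]["text"])

print(extract_insights("Loving the new espresso machine from @AcmeCoffee in Seattle!"))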
Deploy with confidence
Ready to deploy? Review the sample code on GitHub for detailed deployment instructions, then deploy as-is or customize it to fit your needs.
Well-Architected Pillars
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Operational Excellence
Amazon CloudWatch keeps logs of the operations performed in the text processing workflow, allowing for efficient monitoring of the application's status. AWS CloudFormation makes the deployment reproducible and can roll it back to a stable state if a deployment fails. Additionally, Amazon Bedrock is a managed service that provides access to LLMs through a simple interface. This combination of monitoring, reproducible deployments, and managed LLM access offers powerful natural language processing capabilities without requiring you to manage the underlying infrastructure.
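As an illustrative sketch (not this Guidance's actual handler), a Lambda function in the text-processing workflow might emit structured logs like the following, which CloudWatch Logs captures automatically:

import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def handler(event, context):
    # Structured JSON log lines are easy to filter and alarm on in CloudWatch.
    logger.info(json.dumps({"stage": "ingest", "records": len(event.get("Records", []))}))
    try:
        # ... process the batch of social media posts here ...
        logger.info(json.dumps({"stage": "complete", "status": "ok"}))
    except Exception:
        logger.exception("processing failed")  # stack trace lands in CloudWatch Logs
        raise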
Security
The data stored in Amazon S3 is encrypted at rest using AWS Key Management Service (AWS KMS) keys, and AWS Identity and Access Management (IAM) controls access to the data. Specifically, AWS KMS assists in the creation and management of the encryption keys used to securely encrypt the data stored in Amazon S3, while IAM provides granular, role-based permissions for least-privilege access to that data.
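A minimal sketch of these two controls, assuming hypothetical bucket, key, and prefix names:

import boto3

s3 = boto3.client("s3")

# Default encryption: every object written to the bucket is encrypted
# with the customer managed KMS key (names here are placeholders).
s3.put_bucket_encryption(
    Bucket="social-media-insights-data",
    ServerSideEncryptionConfiguration={
        "Rules": [{
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",
                "KMSMasterKeyID": "alias/social-insights-key",
            },
            "BucketKeyEnabled": True,  # fewer KMS calls, lower cost
        }]
    },
)

# Least-privilege IAM policy for a role that only reads processed data.
read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::social-media-insights-data/processed/*",
    }],
}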
Reliability
The data is stored in Amazon S3, an object storage service that offers 99.999999999% (11 nines) durability. The LLMs are invoked using Amazon Bedrock through a simple, efficient API that scales automatically with demand. Amazon Athena, Amazon QuickSight, and AWS Glue are used to query and visualize the data at scale without the need to provision infrastructure.
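For example, the sketch below runs an Athena query over the processed posts; the database, table, and result location are assumed names, not the ones this Guidance creates:

import boto3

athena = boto3.client("athena")

# Count posts by sentiment without provisioning any query infrastructure.
query = athena.start_query_execution(
    QueryString=(
        "SELECT sentiment, COUNT(*) AS posts "
        "FROM social_insights.processed_posts "
        "GROUP BY sentiment"
    ),
    QueryExecutionContext={"Database": "social_insights"},
    ResultConfiguration={"OutputLocation": "s3://social-media-insights-data/athena-results/"},
)
print("Query started:", query["QueryExecutionId"])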
Performance Efficiency
This Guidance uses serverless and managed AWS services so that your workloads achieve high performance efficiency: resources scale automatically to meet demand, providing a seamless experience for accessing insights from your social media platforms. For example, Lambda, a serverless compute service, automatically scales up and down based on demand, ensuring the compute capacity is optimized for the workload. With Amazon Bedrock, you can invoke LLMs from an extensive catalog without the need to provision and manage the underlying servers.
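As a small sketch of that catalog, you can list the available text models with a single API call (the region is an assumption):

import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# No servers to provision: browsing the model catalog is just an API call.
for model in bedrock.list_foundation_models(byOutputModality="TEXT")["modelSummaries"]:
    print(model["modelId"], "-", model["providerName"])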
Cost Optimization
Lambda is used in this architecture to process events and initiate the batch transformation analysis, removing the need for a continuously running server. Moreover, AWS Glue jobs perform extract, transform, load (ETL) on batches of user data rather than individual records. By aggregating the data and processing it in larger chunks, the overall compute and storage requirements are reduced, leading to lower costs compared to handling each record individually. Lastly, Amazon Bedrock allows you to use the LLM that best fits your budget requirements, so you do not incur unnecessary expenses associated with more powerful, but potentially over-provisioned, models.
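A hedged sketch of the batch-oriented pattern, using a hypothetical Glue job name:

import boto3

glue = boto3.client("glue")

# One job run processes an entire batch of posts, so you pay only for the
# run itself rather than for an always-on server handling record by record.
run = glue.start_job_run(JobName="social-insights-etl")
print("Glue job run started:", run["JobRunId"])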
Sustainability
Lambda, AWS Glue, Athena, and QuickSight are all serverless services that operate on-demand, adjusting their resource use to match the current workload. This helps ensure that resources are used efficiently, as the services scale up and down automatically to meet demand. By using these serverless offerings, this architecture consumes only the resources it needs, avoiding over-provisioning or under-utilization of compute, storage, and other infrastructure components.