AWS for Industries

Amazon Titan Embeddings for enhanced content recommendations to power 1:1 personalization

Customers are more connected than ever before and expect brands to interact with them in highly personalized ways, irrespective of device or channel of engagement. Artificial intelligence (AI)-powered recommendation engines have proven to be an effective way to deliver personalized experiences to consumers at scale. The machine learning (ML) model behind a recommendation engine learns from past consumer interactions and recommends items of interest to present to the consumer at each interaction point. A key task of modern recommender systems is analyzing the metadata stored in vast product or content catalogs, which typically consists of textual features such as title, description, reviews, brand, and tags. Encoding these textual features as numerical vectors, also known as embeddings, in a latent space lets recommender systems scale efficiently while capturing the semantic meaning of the underlying text. Choosing the right embedding tool can simplify complex data pipelines and feature engineering. In this article, we outline the key benefits of Amazon Titan Embeddings, provided by Amazon Bedrock, and how it fits into Blueshift's recommender systems.

Amazon Titan Embeddings: Powering next generation recommendations

Embeddings have proven to be a versatile alternative to manual feature engineering and can generalize to multiple media formats. Amazon Titan Embeddings, part of Amazon Bedrock, transforms textual attributes into a numerical representation in an n-dimensional space.

In the vast landscape of product recommendation methodologies – from collaborative and content-based recommender systems to demographic, utility, knowledge-based, and hybrid systems – the integration of large language model (LLM) embeddings offers a novel opportunity to incorporate deeper semantic knowledge of the textual content and similarity in the latent language space.

How Amazon Titan enhances content-based recommender systems

Content-based recommender systems are a specialized subset of recommendation methodologies that curate suggestions for users by deeply analyzing a product’s inherent attributes. These systems emphasize understanding the content of the products and linking it to users’ preferences. As an example, consider a news platform: if articles are the products, then the system would identify articles based on their titles, topics, categories, keywords, location, or authorship. If a user consistently reads articles on “sustainable energy,” the system is designed to offer further articles diving into that same topic or closely related subjects.

Each piece of content or product in a catalog usually contains multiple textual attributes, such as title, brand, description, category, and tags. These textual attributes can be transformed into a compact numerical representation in an n-dimensional space using Amazon Titan Embeddings. This vector representation encapsulates the essence of the product in the underlying semantic space; similar products are those whose vectors lie close together in this n-dimensional space.
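As an illustrative sketch (the field names and the separator are hypothetical choices, not a fixed catalog schema), flattening a product's textual attributes into one string for embedding and measuring proximity between two embedding vectors might look like this:

```python
import numpy as np

def product_to_text(product: dict) -> str:
    """Flatten a product's textual attributes into one string for embedding.
    The field list and " | " separator are illustrative, not a fixed schema."""
    fields = ["title", "brand", "category", "description", "tags"]
    parts = [str(product[f]) for f in fields if product.get(f)]
    return " | ".join(parts)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity of two embedding vectors; 1.0 means identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

In practice, the concatenated string is sent to the embedding model, and cosine similarity (or Euclidean distance) between the returned vectors serves as the proximity measure.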

[Fig 1] Three-dimensional visualization of product embeddings showing how products of similar category are clustered together in the latent space

Diving Deep: The tech stack for using LLM-based embeddings

In this section, we review the main components of the architecture and dive deep into each of them.

Harnessing the Amazon Bedrock Titan embedding API

Amazon Bedrock's Titan embedding API is a simple yet powerful abstraction for obtaining fixed-length embeddings for any input text. The Amazon Bedrock API supports text retrieval, semantic similarity, and clustering with this model. The maximum input length is 8,000 tokens, and the output vector length is 1,536.
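As a hedged sketch (the model ID and response shape reflect the Titan text embeddings model at the time of writing; verify against the current Amazon Bedrock documentation, and note that an AWS account with `bedrock:InvokeModel` permission is required), an embedding request through the AWS SDK for Python might look like:

```python
import json

def build_body(text: str) -> str:
    """Serialize the request body expected by the Titan embedding model."""
    return json.dumps({"inputText": text})

def titan_embed(text: str, region: str = "us-east-1") -> list:
    """Request a fixed-length Titan text embedding from Amazon Bedrock."""
    import boto3  # imported lazily so build_body stays usable without the SDK
    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        contentType="application/json",
        accept="application/json",
        body=build_body(text),
    )
    payload = json.loads(response["body"].read())
    return payload["embedding"]  # a 1,536-element vector of floats
```

The returned vector can then be stored directly alongside the product's metadata in the vector store.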

Deeplake: The backbone of our vector storage

Our choice for vector storage is Deeplake. A well-known open-source vector store, Deeplake provides seamless integration with Amazon Simple Storage Service (Amazon S3). Deeplake also offers the capability to store and filter on metadata alongside embeddings. This ensures that our product embeddings are enriched with contextual metadata, allowing for focused retrieval depending on the context. The entire Deeplake vector store is hosted securely in private Amazon S3 buckets with encryption keys to help ensure security, compliance, and reliability.
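Deeplake's own API differs, but the pattern it enables can be sketched in plain Python: stored records carry both an embedding and a metadata dictionary, and a metadata filter scopes retrieval before any similarity search runs. The record shape below is a hypothetical illustration, not Deeplake's storage format.

```python
def filter_by_metadata(records: list, **criteria) -> list:
    """Return only the stored records whose metadata matches every criterion.
    A vector store such as Deeplake applies this kind of filter before the
    similarity search so retrieval stays scoped to the client's context."""
    return [
        r for r in records
        if all(r["metadata"].get(k) == v for k, v in criteria.items())
    ]
```

For example, filtering on a `client` key restricts the nearest-neighbor search to a single client's catalog.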

Continuous embedding updates

In the Blueshift platform, product catalogs are ever evolving. To ensure our recommendations remain relevant and up to date, embeddings are continuously updated or added as soon as new products enter the client’s catalog or existing products undergo changes.
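One common way to detect which products need re-embedding is to fingerprint their textual attributes; this sketch (an illustrative mechanism, not necessarily Blueshift's implementation) re-embeds a product only when its fingerprint changes:

```python
import hashlib

def content_fingerprint(product: dict) -> str:
    """Stable hash of a product's attributes; a changed hash signals that
    the stored embedding is stale and must be recomputed."""
    text = "|".join(str(product.get(k, "")) for k in sorted(product))
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def stale_products(catalog: list, stored_hashes: dict) -> list:
    """Products that are new, or whose attributes changed since the last run.
    `stored_hashes` maps product id -> fingerprint saved at embedding time."""
    return [
        p for p in catalog
        if stored_hashes.get(p["id"]) != content_fingerprint(p)
    ]
```

Only the products returned by `stale_products` are sent back to the embedding API, keeping continuous updates cheap even for large catalogs.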

[Fig 2] Continuous retrieval of product embeddings through the Amazon Bedrock embedding API and storing in a Deeplake vector store

Computing k-nearest neighbors for embeddings

To derive context-relevant recommendations, it's essential to find products that are proximally similar. We first apply Deeplake's built-in metadata filtering based on the client context. After this initial filtering, we use FAISS, a library specifically designed for efficient similarity search and clustering of dense vectors, to identify the nearest neighbors among the embeddings.
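The exact search that a FAISS `IndexFlatL2` performs can be expressed in plain NumPy for illustration; this brute-force sketch returns the same neighbor indices, while FAISS computes them far more efficiently at scale:

```python
import numpy as np

def knn(query: np.ndarray, embeddings: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k embeddings closest to `query` by Euclidean distance.
    Equivalent to an exact (flat) nearest-neighbor search, expressed in
    NumPy for clarity rather than speed."""
    distances = np.linalg.norm(embeddings - query, axis=1)
    return np.argsort(distances)[:k]
```

With FAISS, the same result comes from building an index over the filtered embeddings and calling its search method with the query vector and k.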

[Fig 3] Content-based recommendations by finding k-nearest neighbors of a product embedding using FAISS

Delivering tailored recommendations

After recommendation candidates are identified by various algorithms, including those based on Amazon Titan Embeddings, they are pooled together. These diverse candidates are then streamlined by our re-ranking algorithm, an integral component of our learning-to-rank system. This ensures that the final set of recommendations resonates with the user's interests and recent actions. These tailored suggestions are then delivered across channels such as email and push notifications, or directly integrated into client websites. This multi-layered approach ensures a broader selection while maintaining high levels of personalization, leading to improved user engagement.
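The pooling and re-ranking step can be sketched as follows; here `score_fn` is a hypothetical stand-in for the learning-to-rank model's user-specific scoring, not its actual interface:

```python
def rerank(candidate_pools: dict, score_fn, top_n: int = 10) -> list:
    """Pool candidates from several algorithms, deduplicate them, and order
    the union by a user-specific score, keeping the top_n items.
    `score_fn` stands in for the learning-to-rank model's scoring call."""
    pooled = {item for pool in candidate_pools.values() for item in pool}
    return sorted(pooled, key=score_fn, reverse=True)[:top_n]
```

Deduplicating before scoring matters because the same item often surfaces from several algorithms, and it should be ranked once rather than crowd out other candidates.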

Here are some metrics from our internal quality testing. These metrics were computed on an open-source product catalog of about 10,000 books. Embeddings for these books were fetched from Amazon Bedrock and OpenAI. Using these embeddings, the top five semantic nearest neighbors were computed for each book. Average ROUGE (Recall-Oriented Understudy for Gisting Evaluation, a set of metrics for evaluating automatic summarization and machine translation in natural language processing) scores were then computed over the top five nearest neighbors for all products.
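For intuition, a simplified ROUGE-1 recall (the unigram variant; full ROUGE implementations also clip by word counts and report precision and F-measure) can be computed like this:

```python
def rouge1_recall(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 recall: the fraction of the reference's unigrams
    that also appear in the candidate. Full ROUGE implementations clip by
    count and report precision/F1 as well; this is an illustrative sketch."""
    ref_words = reference.lower().split()
    cand_words = set(candidate.lower().split())
    if not ref_words:
        return 0.0
    return sum(1 for w in ref_words if w in cand_words) / len(ref_words)
```

Higher average scores between a book's text and its retrieved neighbors indicate that the nearest neighbors share more surface vocabulary with the source item.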

[Fig 4] Recommendation candidates from different algorithms are re-ranked by the re-ranking algorithm based on user profile and activity and can be served using email, SMS, web, and so on.

Security, compliance, and data governance

One of the key benefits of Amazon Bedrock is its built-in data governance and security practices and controls. Your data is never used to train the underlying foundation models or shared with third parties. By using Amazon Virtual Private Cloud (Amazon VPC) and AWS Key Management Service (AWS KMS), all data remains secure in transit and at rest. With the ability to audit and trace all training, validation, and inference steps, Amazon Bedrock provides ready-to-use, enterprise-grade APIs and tools.

Use cases: Amazon Titan Embeddings in action

Amazon Titan Embeddings have the potential to improve content-based recommendations across several use cases, and many of Blueshift’s clients stand to benefit from this foundational generative AI service.

News media platforms

In the bustling world of news media, where content is produced at an unprecedented rate, Amazon Titan Embeddings can be a game-changer. By analyzing the context and content of articles, media platforms can seamlessly present readers with articles that align with their current reading preferences, enhancing user engagement and positioning the platform as a primary source for tailored information.

E-commerce

E-commerce platforms that have large catalogs can overwhelm users with endless digital aisles. By using Amazon Titan Embeddings and analyzing product attributes such as names, brands, and descriptions, platforms can build multiple flavors of content recommendations. This enables platforms to provide users with recommendations of similar items, streamlining the shopping experience and ensuring users find the most relevant products as they browse.

Online learning platforms

The world of online education is vast, and learners can feel lost amidst the plethora of courses. Amazon Titan Embeddings can transform this experience. By analyzing courses that users have shown interest in, online platforms can offer recommendations for similar classes, ensuring learners consistently discover content that resonates with their preferences and educational goals.

These potential use cases highlight the versatility and promise of Amazon Titan Embeddings across diverse domains, from news media and e-commerce to online education, underscoring the transformative potential of advanced recommendation systems.

Conclusion

In the rapidly evolving landscape of content recommendations, the significance of LLM embeddings cannot be overstated. These advanced embeddings have proven superior to older models such as BERT and ELECTRA, offering more nuanced and contextually relevant recommendations without complex feature engineering. Blueshift is leading the industry in AI marketing and is committed to bringing the best and latest AI advances to the platform. The partnership between Blueshift and Amazon Bedrock is representative of this commitment. As generative AI accelerates at an unprecedented pace, new and more powerful LLMs continue to emerge. With the robust infrastructure and collaboration offered by Amazon Bedrock, Blueshift is poised to seamlessly integrate these cutting-edge models, ensuring that our recommendation solutions remain state-of-the-art.

Introducing Blueshift: The future of AI marketing
Blueshift is an industry leader in marketing automation and customer data platforms used by leading digital brands to power 1:1 personalization and customer engagement across all communication channels and devices. Blueshift offers out-of-the-box AI models and marketer-friendly studios with built-in recipes to automate cross-channel journeys with hyper-personalized content and product recommendations.

Praachee Gokhale

Praachee Gokhale is a Solutions Architect at Amazon Web Services (AWS). She has an M.S. in Electrical Engineering from San Jose State University. She is based in the San Francisco Bay Area and is passionate about containers and generative AI.

Anmol Singh Suag

Anmol Singh Suag is the Lead Data Scientist at Blueshift. He has built several industry-grade recommender systems and has deep expertise in collaborative filtering, content-based filtering, and NLP/LLM-based recommendations. His academic background includes an M.S. in Computer Science from the University of Massachusetts Amherst and a Bachelor of Engineering in Computer Science from BITS Pilani.

Manyam Mallela

Manyam Mallela is the co-founder and Chief AI Officer at Blueshift. He is a computer science graduate of UT Austin and IIT Bombay, with a research focus on AI and ML systems. His research has been published in leading AI venues such as ICML, SIGKDD, and JMLR, and has been cited thousands of times in other scholarly publications.