Reimagining Vector Databases for the Generative AI Era with Pinecone Serverless on AWS

By Shashi Raina, Sr. Solutions Architect – AWS Startups
By Anubhav Sharma, Principal Solutions Architect – AWS SaaS Factory
By Bear Douglas, Director of Developer Relations – Pinecone


Artificial intelligence (AI) is transforming how we build applications today. With large language models (LLMs) like Claude from Anthropic and Titan from Amazon, generative AI is becoming more powerful and accessible.

However, there are still limitations around an AI system’s access to knowledge and ability to reason. Pinecone serverless is a novel vector database architecture optimized for AI workloads. Built on Amazon Web Services (AWS), it enables applications to efficiently store, update, and query vector embeddings at scale.

In this post, we will look at the benefits Pinecone serverless provides within an AI use case. We’ll also look at the underlying AWS services that allow Pinecone serverless to provide a fully managed, scalable, and cost-effective vector database.

Pinecone is an AWS Partner and AWS Marketplace Seller that offers a vector database for AI and machine learning (ML), making real-time deployment simple, fast, and reliable.

Pinecone actively engaged with the AWS SaaS Factory Program, collaborating closely to accelerate the development of their vector database SaaS solution on AWS.

“Our partnership with AWS has been incredibly important to Pinecone in a number of ways, ensuring our solution can scale to meet the demand of our largest customers, and at the same time collaborating on the go-to-market side as we jointly go after large opportunities.

“Building Pinecone serverless was a huge undertaking, and the AWS SaaS Factory team was able to pull in experts from across AWS, with the technical knowledge and understanding of AWS infrastructure to help us design our system in the right way. SaaS Factory was able to look at our architecture and provide us with advice, ensuring that we’re taking full advantage of all the AWS resources that are out there.” – Elan Dekel, VP, Product at Pinecone

Limitations of Traditional Vector Databases

Working backwards from customer use cases, Pinecone has identified three critical demands for vector databases:

Freshness: Ability to keep indexes updated in near real-time as new data comes in.
Elasticity: Easy horizontal scaling of the vector database to meet the load.
Cost efficiency: Optimized architecture to keep costs low, especially for infrequent queries.

Traditional vector database architectures often struggle to meet these customer needs. Many use a scatter-gather architecture in which data is sharded across nodes. This works well for high-throughput querying, but keeping the full index in memory on all shards is expensive for intermittent queries on large datasets. Updating indexes dynamically is also challenging in these architectures.
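
To make the scatter-gather pattern concrete, here is a minimal sketch in Python. It assumes a toy in-memory setup in which each shard is a dictionary of vectors; the function names (search_shard, scatter_gather) and the sample data are hypothetical and do not describe any particular product. It is meant only to show why every shard must hold its data ready for every query.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search_shard(shard, query, top_k):
    """Score every vector held by one shard and return its local top_k."""
    scored = ((cosine(vec, query), vec_id) for vec_id, vec in shard.items())
    return heapq.nlargest(top_k, scored)


def scatter_gather(shards, query, top_k):
    """Send the query to every shard (scatter), then merge the results (gather)."""
    partials = [search_shard(shard, query, top_k) for shard in shards]
    merged = heapq.nlargest(top_k, (hit for part in partials for hit in part))
    return [vec_id for _, vec_id in merged]


# Hypothetical data: two shards, each holding part of the index in memory.
shards = [
    {"a": [1.0, 0.0], "b": [0.0, 1.0]},
    {"c": [0.9, 0.1], "d": [-1.0, 0.0]},
]
result = scatter_gather(shards, [1.0, 0.0], top_k=2)
print(result)
```

Because each query touches every shard, all shards have to stay loaded even when queries arrive only occasionally, which is the cost problem described above.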

There is also a lack of separation between storage and compute, making elasticity and cost optimization difficult. As more enterprise use cases emerge, such as retrieval-augmented generation (RAG), multi-tenant search, and data labeling, it’s becoming clear that high-throughput querying of static datasets is not actually the most common type of workload.

This is where Pinecone serverless architecture on AWS shines.

Pinecone’s Unique Architecture

Pinecone serverless introduces a novel architecture that decouples storage from compute. It can efficiently page portions of indexes on-demand from object storage into ephemeral compute resources. This enables cost-effective intermittent querying on huge datasets.
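
The idea of paging only part of an index into compute can be illustrated with a small sketch. This is not Pinecone's implementation, whose internals are not described in this post. It assumes a hypothetical layout in which the index is split into segments held in an object store, a small table of segment centroids stays in memory for routing, and a query fetches only the closest segments into a bounded cache. All class names, segment keys, and vectors are made up for illustration.

```python
import math
from collections import OrderedDict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ObjectStore:
    """Stand-in for durable object storage; counts how many segments are fetched."""

    def __init__(self, segments):
        self._segments = segments
        self.fetches = 0

    def get(self, key):
        self.fetches += 1
        return self._segments[key]


class PagedIndex:
    """Keeps only a routing table in memory and pages segments in on demand."""

    def __init__(self, store, centroids, cache_size=2):
        self.store = store
        self.centroids = centroids      # segment key -> centroid vector
        self.cache = OrderedDict()      # small LRU cache of loaded segments
        self.cache_size = cache_size

    def _load(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]
        segment = self.store.get(key)
        self.cache[key] = segment
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the least recently used
        return segment

    def query(self, vector, top_k=1, n_probe=1):
        # Rank segments by centroid similarity and read only the best n_probe.
        ranked = sorted(
            self.centroids,
            key=lambda k: cosine(self.centroids[k], vector),
            reverse=True,
        )
        hits = []
        for key in ranked[:n_probe]:
            for vec_id, vec in self._load(key).items():
                hits.append((cosine(vec, vector), vec_id))
        hits.sort(reverse=True)
        return [vec_id for _, vec_id in hits[:top_k]]


store = ObjectStore({
    "seg-0": {"a": [1.0, 0.0], "b": [0.9, 0.2]},
    "seg-1": {"c": [0.0, 1.0], "d": [0.1, 0.9]},
    "seg-2": {"e": [-1.0, 0.0]},
})
index = PagedIndex(
    store,
    {"seg-0": [1.0, 0.1], "seg-1": [0.0, 1.0], "seg-2": [-1.0, 0.0]},
)
first = index.query([1.0, 0.0])
second = index.query([0.95, 0.05])
print(first, second, store.fetches)
```

In this toy run both queries are routed to the same segment, so only one of the three segments is ever read from storage and the other two incur no compute or memory cost.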

Pinecone handles multi-tenancy within an index through namespaces. Each namespace is designed to map to a specific tenant and isolates that tenant’s data, so one tenant cannot access the data of another. This approach is fundamental to Pinecone’s serverless architecture.
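
A namespace-per-tenant layout can be sketched as follows. The Pinecone client accepts a namespace argument on its upsert and query operations; the NamespacedIndex class below is a hypothetical in-memory stand-in written for illustration, not the client itself, and the tenant names and vectors are invented.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class NamespacedIndex:
    """Toy index in which every read and write is scoped to one namespace."""

    def __init__(self):
        self._data = defaultdict(dict)  # namespace -> {vector id: values}

    def upsert(self, vectors, namespace):
        for vec_id, values in vectors:
            self._data[namespace][vec_id] = values

    def query(self, vector, top_k, namespace):
        # Only vectors stored under the caller's namespace are candidates.
        candidates = self._data.get(namespace, {})
        scored = sorted(
            ((cosine(values, vector), vec_id) for vec_id, values in candidates.items()),
            reverse=True,
        )
        return [vec_id for _, vec_id in scored[:top_k]]


index = NamespacedIndex()
index.upsert([("doc-1", [1.0, 0.0]), ("doc-2", [0.0, 1.0])], namespace="tenant-a")
index.upsert([("doc-9", [1.0, 0.0])], namespace="tenant-b")

tenant_a_hits = index.query([1.0, 0.0], top_k=5, namespace="tenant-a")
tenant_b_hits = index.query([1.0, 0.0], top_k=5, namespace="tenant-b")
print(tenant_a_hits, tenant_b_hits)
```

Even though "doc-9" matches the query exactly, it never appears in results for "tenant-a", because a query only ever sees the namespace it names.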

Optimized for the Cloud with AWS

Under the hood, Pinecone serverless leverages AWS services like Amazon Elastic Kubernetes Service (Amazon EKS), Amazon Aurora, Amazon Simple Storage Service (Amazon S3), and AWS Key Management Service (AWS KMS) to deliver a cloud-native architecture.

Amazon S3 provides durable object storage for vector indexes, and on-demand queries run and access just the parts of the index needed from S3. This is different from the traditional vector databases that keep the full index in memory on the shards, which makes it expensive to run queries on large datasets.

Pinecone serverless encrypts the data inside the indexes leveraging AWS KMS, and Amazon EKS provides scalable, low latency, and high-performance compute that is decoupled from the storage.

This architecture provides the separation of storage and compute needed for elasticity, and reduces costs by orders of magnitude compared to always-on resources for intermittent workloads. Customers can start small and scale seamlessly while only paying for what they use. Infrequently accessed namespaces don’t drive up costs, and indexes stay fresh through asynchronous background processes.

Benefits for Customers

From an end-user perspective, the database auto-scales seamlessly to handle spikes in traffic without any infrastructure provisioning.

Customers get a serverless experience that offers several advantages including cost savings, scalability, zero infrastructure management, enhanced security, operational agility, and improved operational efficiency.

Key benefits of the Pinecone serverless architecture include:

Elasticity: The architecture simplifies horizontal scaling, as users can elastically scale to meet demand without capacity planning.
Cost savings: By adopting Pinecone serverless, customers like Gong have seen 10x or more cost reductions compared to traditional vector database architectures.
Operational efficiency: Pinecone serverless eliminates the need to provision, deploy, update, monitor, or otherwise manage servers, as these tasks are handled by Pinecone.

Pinecone serverless makes vector search economical at scale:

Retrieval-augmented generation: Pinecone serverless allows smaller AI models to match larger models’ performance by retrieving relevant knowledge. This makes AI more accessible.
Labeling: Active learning systems like Gong Smart Trackers use Pinecone’s cost-effective searches to identify relevant examples for human labeling.
Multi-tenant search: Customers leverage Pinecone’s namespaces for isolation and scale across users.
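
The retrieval step in retrieval-augmented generation can be sketched in a few lines. The example below is a simplified illustration under stated assumptions: the embed function is a toy word-count embedding standing in for a real embedding model, the passages are short statements taken from this post, and the call to an LLM is left out. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, passages, top_k=1):
    """Return the top_k passages most similar to the question."""
    q = embed(question)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:top_k]


def build_prompt(question, passages):
    """Place the retrieved passages in the prompt so the model can ground its answer."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


passages = [
    "Amazon S3 provides durable object storage for vector indexes.",
    "AWS KMS is used to encrypt data inside the indexes.",
    "Namespaces map to tenants and isolate their data.",
]
question = "Which service encrypts the indexes?"
top = retrieve(question, passages)
prompt = build_prompt(question, top)
print(prompt)
```

In a production system the retrieve step would be a query against the vector database, and the resulting prompt would be sent to the chosen model.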

By rethinking vector databases for an AI-first world, Pinecone serverless removes major bottlenecks to building more knowledgeable AI applications on AWS. Pinecone serverless points to the future of vector search and knowledge management for AI.

Using a vector database like Pinecone serverless for RAG is the most effective way to improve response quality and reduce hallucinations. It’s more effective than fine-tuning, and gets better as you add more data.

These modern AI workloads demand new architecture to support their scale in a performant and cost-effective way. That’s why Pinecone recently redesigned their vector database on top of AWS to meet those needs.

Bringing it All Together

By reinventing vector databases for the cloud, Pinecone serverless makes state-of-the-art vector search accessible to a much broader range of applications. Companies like Gong and Frontier Medicines are using Pinecone serverless on AWS.

Older vector database architectures simply could not meet these companies’ demands at their scale. Pinecone serverless has been a game-changing product that has enabled them to deliver new AI enhancements while reducing costs, in Gong’s case by 10x.

Pinecone serverless leads the way in reimagining vector search to provide knowledge for the AI-driven applications of tomorrow.

The Future of Knowledge in AI

As AI continues its rapid evolution, expect to see vector databases as a critical component in providing the knowledge to make AI truly intelligent.

Pinecone, in collaboration with AWS, is leading the way into this exciting future.

About AWS SaaS Factory

AWS SaaS Factory helps organizations at any stage of the SaaS journey. Whether looking to build new products, migrate existing applications, or optimize SaaS solutions on AWS, we can help. Explore the SaaS on AWS hub >>

Learn more about AWS SaaS Factory >>
Become an AWS SaaS Competency Partner? Sign up here >>

Pinecone – AWS Partner Spotlight

Pinecone serverless is available to use today through AWS Marketplace.

Contact Pinecone | Partner Overview | AWS Marketplace