What is a Vector Database?
Information comes in many forms. Some information is unstructured, like text documents, rich media, and audio, and some is structured, like application logs, tables, and graphs. Innovations in artificial intelligence and machine learning (AI/ML) have allowed us to create a type of ML model called an embedding model. Embedding models encode all types of data into vectors that capture the meaning and context of an asset. This allows us to find similar assets by searching for neighboring data points. Vector search methods enable unique experiences, like taking a photograph with your smartphone and searching for similar images.
Vector databases provide the ability to store and retrieve vectors as high-dimensional points. They add additional capabilities for efficient and fast lookup of nearest-neighbors in the N-dimensional space. They are typically powered by k-nearest neighbor (k-NN) indexes and built with algorithms like the Hierarchical Navigable Small World (HNSW) and Inverted File Index (IVF) algorithms. Vector databases provide additional capabilities like data management, fault tolerance, authentication and access control, and a query engine.
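To make this concrete, here is a minimal sketch of building and querying an HNSW-based k-NN index with the open-source hnswlib library. The random vectors stand in for real embedding output, and the parameters are illustrative defaults rather than tuned values.

```python
# Minimal sketch: build an HNSW index over example vectors and query it.
# Assumes the open-source hnswlib package (pip install hnswlib); the vectors
# here are random placeholders standing in for real embedding output.
import hnswlib
import numpy as np

dim = 128                      # dimensionality of the embedding vectors
num_vectors = 10_000

vectors = np.float32(np.random.random((num_vectors, dim)))
ids = np.arange(num_vectors)

# Create an HNSW index that ranks neighbors by cosine distance.
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=num_vectors, ef_construction=200, M=16)
index.add_items(vectors, ids)
index.set_ef(50)               # query-time accuracy/speed trade-off

# Query: find the 5 nearest neighbors of one vector.
query = vectors[0]
labels, distances = index.knn_query(query, k=5)
print(labels, distances)
```

A full vector database wraps this kind of index with storage, replication, access control, and a query API, so the application does not manage index internals directly.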
Why are vector databases important?
Your developers can index vectors generated by embedding models in a vector database. This allows them to find similar assets by querying for neighboring vectors.
Vector databases provide a method to operationalize embedding models. Application development is more productive with database capabilities like resource management, security controls, scalability, fault tolerance, and efficient information retrieval through sophisticated query languages.
Vector databases ultimately empower developers to create unique application experiences. For example, your users could snap photographs on their smartphones to search for similar images.
Developers can use other types of machine learning models to automate metadata extraction from content like images and scanned documents. They can index metadata alongside vectors to enable hybrid search on both keywords and vectors. They can also fuse semantic understanding into relevancy ranking to improve search results.
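As an illustration, a hybrid query can filter candidates on keyword metadata first and then rank the survivors by vector similarity. The sketch below uses a small in-memory document list and a placeholder embed() function; in practice, a vector database's query engine performs both steps.

```python
# Illustrative hybrid search: keyword filter on metadata, then vector ranking.
# `embed()` is a hypothetical stand-in for an embedding model; real systems
# delegate both steps to the vector database's query engine.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder embedding: derive a small pseudo-random vector from text."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.random(64)

documents = [
    {"id": 1, "keywords": {"shoe", "running"}, "vector": embed("lightweight running shoe")},
    {"id": 2, "keywords": {"shoe", "hiking"},  "vector": embed("waterproof hiking boot")},
    {"id": 3, "keywords": {"shirt"},           "vector": embed("cotton t-shirt")},
]

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def hybrid_search(query_text, required_keyword, k=2):
    query_vec = embed(query_text)
    # Step 1: keyword (metadata) filter.
    candidates = [d for d in documents if required_keyword in d["keywords"]]
    # Step 2: rank the remaining candidates by vector similarity.
    candidates.sort(key=lambda d: cosine(query_vec, d["vector"]), reverse=True)
    return candidates[:k]

print([d["id"] for d in hybrid_search("trail running sneaker", "shoe")])
```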
Innovations in generative artificial intelligence (AI) have introduced new types of models like ChatGPT that can generate text and manage complex conversations with humans. Some models operate on multiple modalities; for instance, a user can describe a landscape and generate an image that fits the description.
Generative models are, however, prone to hallucinations, which could, for instance, cause a chatbot to mislead users. Vector databases can complement generative AI models. They can provide an external knowledge base for generative AI chatbots and help ensure they provide trustworthy information.
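A common pattern for this is retrieval-augmented generation (RAG): the application retrieves relevant passages from the vector database and supplies them to the model as grounding context. The sketch below assumes hypothetical embed, vector_db.search, and llm.generate interfaces; it shows the flow, not any particular product's API.

```python
# Sketch of retrieval-augmented generation (RAG) with a vector database.
# `embed`, `vector_db.search`, and `llm.generate` are hypothetical stand-ins
# for an embedding model, a vector database client, and a generative model.

def answer_question(question, embed, vector_db, llm, k=3):
    # 1. Encode the user's question into a vector.
    query_vector = embed(question)

    # 2. Retrieve the k most similar passages from the knowledge base.
    passages = vector_db.search(vector=query_vector, k=k)

    # 3. Ground the generative model with the retrieved passages so its
    #    answer is tied to trusted content rather than model memory alone.
    context = "\n".join(p["text"] for p in passages)
    prompt = (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return llm.generate(prompt)
```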
How are vector databases used?
Vector databases are typically used to power vector search use cases like visual, semantic, and multimodal search. More recently, they’re paired with generative artificial intelligence (AI) text models to create intelligent agents that provide conversational search experiences.
The development process starts with building an embedding model designed to encode a corpus, such as product images, into vectors. The developer then encodes the corpus and loads the resulting vectors into the database, a step also called data hydration. The application developer can now search for similar products by encoding a new product image and using its vector to query the database for similar images.
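From the application's side, the query step might look like the sketch below; embed_image is a hypothetical image embedding model, and index is assumed to be a k-NN index with an hnswlib-style knn_query interface that already holds the hydrated catalog vectors.

```python
# Sketch: encode a query image and look up similar products in a k-NN index.
# `embed_image` is a hypothetical image embedding model; `index` is assumed
# to already hold vectors for the full product catalog (the hydration step).

def find_similar_products(image_path, embed_image, index, k=10):
    query_vector = embed_image(image_path)          # image -> embedding vector
    product_ids, distances = index.knn_query(query_vector, k=k)
    # Smaller cosine distance means a closer match.
    return list(zip(product_ids[0], distances[0]))
```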
Within the database, k-nearest neighbor (k-NN) indexes provide efficient retrieval of vectors and apply a distance function, such as cosine similarity, to rank results.
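Cosine similarity, for example, compares the angle between two vectors, so vectors pointing in the same direction score close to 1 regardless of their magnitude. A minimal NumPy version:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Dot product of the vectors divided by the product of their lengths;
    1.0 means identical direction, 0.0 means orthogonal."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.0, 6.0])))  # 1.0
```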
Who uses vector databases?
Vector databases are for developers who want to create vector search powered experiences. An application developer can use open-source models, automated machine learning (ML) tools, and foundational model services to generate embeddings and hydrate a vector database. This requires minimal ML expertise.
A team of data scientists and engineers can build expertly tuned embeddings and operationalize them through a vector database. This can help them deliver artificial intelligence (AI) solutions faster.
Operations teams benefit from managing solutions as familiar database workloads. They can use existing tools and playbooks.
What are the benefits of vector databases?
Vector databases allow developers to innovate and create unique experiences powered by vector search. They can accelerate artificial intelligence (AI) application development and simplify the operationalization of AI-powered application workloads.
Vector databases provide an alternative to building on top of bare k-nearest neighbor (k-NN) indexes. That kind of index requires a great deal of additional expertise and engineering to use, tune and operationalize.
A good vector database provides applications with a foundation through features like data management, fault tolerance, critical security controls, and a query engine. These capabilities let users operationalize their workloads, simplifying scaling and helping meet security requirements.
Capabilities like the query engine and SDKs simplify application development. They allow developers to perform more advanced queries, such as searching and filtering on metadata, as part of a k-NN search. Developers also have the option to use hybrid relevancy scoring models that blend traditional term-frequency models like BM25 with vector scores to enhance information retrieval.
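One simple way to blend the two signals is a weighted combination of normalized scores. The sketch below uses min-max normalization and an alpha weight purely for illustration; production systems may use other fusion schemes.

```python
# Illustrative hybrid relevancy scoring: linearly blend a BM25 (keyword)
# score with a vector similarity score. The min-max normalization and the
# alpha weight are example choices, not a prescribed formula.

def normalize(scores):
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def hybrid_scores(bm25_scores, vector_scores, alpha=0.5):
    """alpha=1.0 uses only BM25; alpha=0.0 uses only vector similarity."""
    bm25_n = normalize(bm25_scores)
    vec_n = normalize(vector_scores)
    docs = set(bm25_scores) | set(vector_scores)
    return {
        doc: alpha * bm25_n.get(doc, 0.0) + (1 - alpha) * vec_n.get(doc, 0.0)
        for doc in docs
    }

bm25 = {"doc1": 12.3, "doc2": 7.1, "doc3": 0.4}
vectors = {"doc1": 0.62, "doc2": 0.91, "doc3": 0.13}
print(sorted(hybrid_scores(bm25, vectors).items(), key=lambda x: -x[1]))
```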
How can AWS support your vector database requirements?
Amazon Web Services (AWS) offers many services for your vector database requirements:
- Amazon OpenSearch Service makes it easy for you to perform interactive log analytics, real-time application monitoring, website search, and more. For vector databases, you can read about k-Nearest Neighbor (k-NN) search in OpenSearch Service.
- Amazon Aurora PostgreSQL-Compatible Edition and Amazon Relational Database Service (Amazon RDS) for PostgreSQL support the pgvector extension to store embeddings from machine learning (ML) models in your database and to perform efficient similarity searches, as shown in the sketch after this list.
- Amazon Neptune ML is a new capability of Neptune that uses Graph Neural Networks (GNNs), an ML technique purpose-built for graphs, to make easy, fast, and more accurate predictions using graph data.
- Vector search for Amazon MemoryDB supports storing millions of vectors, with single-digit millisecond query and update response times, and tens of thousands of queries per second (QPS) at greater than 99% recall.
- Amazon DocumentDB (with MongoDB compatibility) supports vector search, a new capability that enables you to store, index, and search millions of vectors with millisecond response times. With vector search for Amazon DocumentDB, you can simply set up, operate, and scale databases for your ML applications.
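As a concrete illustration of the pgvector extension mentioned above, the sketch below creates a table of embeddings and runs a nearest-neighbor query with psycopg2. The connection details and tiny three-dimensional vectors are placeholders.

```python
# Sketch: store embeddings and run a similarity search with the pgvector
# extension on a PostgreSQL-compatible database (for example, Amazon Aurora
# PostgreSQL or Amazon RDS for PostgreSQL). Connection details and the
# 3-dimensional vectors are placeholders for illustration only.
import psycopg2

conn = psycopg2.connect(host="<your-db-endpoint>", dbname="mydb",
                        user="myuser", password="<password>")
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute("""
    CREATE TABLE IF NOT EXISTS items (
        id bigserial PRIMARY KEY,
        embedding vector(3)
    );
""")
cur.execute("INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');")

# "<->" is pgvector's Euclidean (L2) distance operator; ORDER BY ... LIMIT
# returns the nearest neighbors of the query vector.
cur.execute("SELECT id FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;")
print(cur.fetchall())

conn.commit()
cur.close()
conn.close()
```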
Get started with vector databases on AWS by creating an account today.