The next generation of Amazon OpenSearch Serverless: Built from the ground up for agents

Audience note: This is the deep-dive technical launch post. For a shorter overview of what changed and why, see the related post on the AWS News Blog.

Today, we are announcing a ground-up re-architecture of Amazon OpenSearch Serverless that delivers up to 20 times faster autoscaling, scale to zero, and up to 60% lower cost than provisioning clusters for peak load. Amazon OpenSearch Service is a fully managed, open source retrieval engine that unifies vector, lexical, hybrid, and agentic search, delivering low-latency, accurate and relevant results. Amazon OpenSearch Serverless is an automatically scaled deployment option.

Modern workloads are increasingly dynamic and unpredictable. An ecommerce platform sees a 10x traffic spike during a flash sale. An artificial intelligence (AI) agent triggers hundreds of concurrent vector queries while reasoning through a multi-step task, then goes idle. A multi-tenant SaaS application serves dozens of tenants with wildly different activity patterns. These workloads need infrastructure that scales up to meet demand and releases resources when demand drops.

That is why we rebuilt the Amazon OpenSearch Serverless architecture from the ground up. The new architecture decouples compute from storage. The service provisions infrastructure in seconds instead of minutes, and scales compute all the way to zero when your application is idle. In this post, we walk through the new architecture, what it means for your applications, and how to get started with a hands-on tutorial.

With this launch, Amazon OpenSearch Serverless introduces two named architectures. Existing collections are now referred to as Classic collections. The new architecture is called NextGen and is now the default when you create a new collection via the AWS Console. You can use NextGen architecture in the API by specifying --generation NEXTGEN in the CLI. To continue using the Classic architecture, specify --generation CLASSIC in the CLI or omit the optional --generation parameter.

What this means for your applications

The new architecture delivers improvements across three pillars: performance, cost, and a simplified user experience.

Performance: Autoscaling in seconds

An OpenSearch Compute Unit (OCU) is the unit of compute capacity that powers your indexing and search workloads. Amazon OpenSearch Serverless now provisions additional OCUs in seconds. When traffic arrives, the service adds resources in line with demand instead of reacting after a worker is already under pressure. The same mechanism scales the infrastructure back down quickly when traffic drops. The new architecture scales capacity up to 20 times faster than the previous architecture, so your users experience consistent performance during traffic surges, and you stop paying for capacity when you no longer need it.

Cost efficiency: Pay only for what you use

Indexing, search, storage, and Vector Index GPU-Acceleration are metered and billed independently, so you can see and optimize each dimension of your workload separately.

Decoupled compute and storage: OpenSearch Serverless now has full decoupling between compute and storage, allowing OCUs to scale up and down irrespective of the amount of data stored in a collection. This is powered by a new storage layer that is accessible to both indexing and search OCUs. You can now have multiple indices with data indexed in them but not pay any compute costs if you are not actively indexing or searching data. For workloads with significant idle time, the new architecture can reduce infrastructure costs by up to 60% compared to the cost of provisioning OpenSearch Service domains for peak capacity.

Scale to zero: When no requests arrive within the idle timeout window (10 minutes), the service releases compute resources and your OCU usage scales to 0. When traffic resumes, capacity is back in approximately 10 seconds. During this window, the service queues incoming requests and serves them once capacity is available; it does not drop them. If you anticipate a burst of traffic, for example before a scheduled batch job or a marketing campaign, you can send a lightweight query (such as a match_all with size=1) to warm the collection before your application starts sending production traffic. This reduces the latency your users experience on the first real request. Indexing and search scale independently. If you have no search requests, search OCUs scale to zero, even while OpenSearch Serverless maintains indexing OCUs for indexing requests, and vice versa.

GPU acceleration for vector workloads: For vector collections created in the new architecture, OpenSearch Serverless automatically uses GPU-backed compute to accelerate Hierarchical Navigable Small World (HNSW) vector index construction, significantly reducing indexing time compared to CPU-only builds. GPU acceleration kicks in automatically whenever there is an opportunity to leverage GPUs to reduce overall indexing time and cost. In the Classic architecture, you had to opt in or out of GPU acceleration at the collection level through the API. If you want to disable GPU acceleration for NextGen collections for a specific index, you can turn off the remote index build setting at the index level. GPU usage appears as a separate line item on your bill, so you have full visibility into when acceleration was active and what it cost. For more details on how GPU acceleration works and performance benchmarks, refer to Build billion-scale vector databases in under an hour with GPU acceleration on Amazon OpenSearch Service.

Simplified experience: Fewer steps to production

We also simplified the day-to-day experience of running OpenSearch Serverless:

With the new architecture, you can provision a collection and start sending requests in seconds. There is no need for capacity planning, no sizing decisions, and no waiting for infrastructure to warm up. This makes Amazon OpenSearch Serverless a natural fit for agentic workloads, where an AI agent can spin up a vector search or retrieval step on demand and expect a response without delay.

To make getting started even faster, we have introduced Express Create on the console. You supply a collection name and a collection type, choose Express Create, and your collection is active in seconds with no upfront network, encryption, or access policies to configure. You can add those later if your workload requires them.

Collection groups and collections can also be created programmatically using the AWS Command Line Interface (AWS CLI) and AWS SDKs. AWS CloudFormation support is coming soon.

The new architecture introduces two endpoint formats on the on.aws domain. The per-collection endpoint (<collectionId>.aoss.<region>.on.aws) works the same way as before with one endpoint per collection. The per-account Regional endpoint (<accountId>.aoss.<region>.on.aws) is new: it serves all of your collections through a single hostname, with the target collection identified in each request using the x-amz-aoss-collection-name or x-amz-aoss-collection-id header. This means one connection pool, one Transport Layer Security (TLS) session, and one endpoint to manage regardless of how many collections you have — a significant improvement for multi-tenant workloads where each tenant maps to its own collection. Both endpoints use standard AWS PrivateLink, so you create virtual private cloud (VPC) endpoints from the VPC console or the EC2 API just like any other AWS service. Private Domain Name System (DNS) is configured automatically, eliminating the Amazon Route 53 Private Hosted Zones, forwarding rules, and custom DNS infrastructure that were required with the original architecture. Cross-VPC, cross-account, and on-premises access all work using standard vpce-* DNS names with no additional setup.

Collection groups are the new unit of organization for your collections. You can share compute capacity across multiple collections with Collection Groups, which reduces cost for smaller collections that have complementary traffic patterns. You can also assign different AWS Key Management Service (AWS KMS) keys to collections within the same group, so you get both cost efficiency and per-collection encryption isolation. Collection groups are required when creating collections with the new architecture.

You also get the benefits of OpenSearch open-source releases without needing to manage versions and upgrades. The service tracks upstream releases automatically.

Amazon OpenSearch Serverless is also available on the Vercel Marketplace, making it straightforward for developers to add search infrastructure directly from their Vercel projects. You can link an existing AWS account through delegated access, or get started through a Limited Scope Account with USD $100 in AWS credit if you are new to AWS.

The integration creates a collection with sensible defaults, scale-to-zero billing, public endpoints, and AWS-managed encryption, and automatically sets connection details as environment variables in your Vercel project. You can choose from Search or Vector Search collection types depending on your use case, whether that is full-text search or semantic and AI-powered search.

How the architecture works

The new Amazon OpenSearch Serverless architecture separates compute from storage entirely. OCUs are stateless and read from and write to a distributed shared storage layer that is accessible to both indexing and search OCUs. The storage layer is designed for high durability, keeping your data available independently of the compute nodes that process it.

Architecture diagram showing OpenSearch Serverless NextGen with stateless indexing and search OCUs reading from and writing to a shared distributed storage layer

This design has two practical consequences:

Fast provisioning. New OCUs start serving requests in seconds because there is no local disk to bootstrap. The OCU mounts the shared storage layer and begins processing immediately.
Efficient scale down. Idle capacity can be released with no impact to your stored data, because the data never lived on the OCU. When traffic subsides, compute resources are released and your cost drops accordingly.

Architecture comparison

The following table summarizes the key differences between the original and new architectures:

Capability	Classic Architecture	NextGen Architecture
Minimum capacity	2 OCUs (always on)	0 OCUs (scale to zero)
Scaling speed	Minutes	Seconds
Storage	Local storage per compute node	Distributed shared storage (decoupled)
Collection organization	Individual collections (Default) Collection groups (Optional)	Collection groups (required)
Cold start from zero	N/A (always on)	~10 seconds
Endpoint	Per-collection endpoint	Regional endpoint (static per account)
Cost vs. OpenSearch Service domain	Baseline	Up to 60% lower cost
Scaling speed (vs. Classic)	Baseline	Up to 20 times faster than baseline

Walkthrough: Create a vector collection and observe scale to zero

In this walkthrough, you create a vector search collection with Express Create, index a few sample documents with embeddings, run a k-nearest neighbor (k-NN) query, and watch the collection scale to zero in Amazon CloudWatch. The entire process takes about 10 minutes.

Prerequisites

An AWS account with permissions to create Amazon OpenSearch Serverless collections.
AWS Command Line Interface (AWS CLI) configured with appropriate credentials.
curl 7.75 or later (for built-in --aws-sigv4 support).

Step 1: Configure security policies

Create encryption, network, and data access policies. These must exist before the collection can be created.

# Create an encryption policy
aws opensearchserverless create-security-policy \
    --name product-vectors-encryption \
    --type encryption \
    --policy '{"Rules":[{"ResourceType":"collection","Resource":["collection/product-vectors"]}],"AWSOwnedKey":true}' \
    --endpoint-url "https://aoss.us-east-2.amazonaws.com" \
    --region "us-east-2"

# Create a network policy (public access for this tutorial)
aws opensearchserverless create-security-policy \
    --name product-vectors-network \
    --type network \
    --policy '[{"Rules":[{"ResourceType":"collection","Resource":["collection/product-vectors"]},{"ResourceType":"dashboard","Resource":["collection/product-vectors"]}],"AllowFromPublic":true}]' \
    --endpoint-url "https://aoss.us-east-2.amazonaws.com" \
    --region "us-east-2"

# Get your principal ARN
PRINCIPAL_ARN=$(aws sts get-caller-identity --query 'Arn' --output text)

# Create a data access policy
aws opensearchserverless create-access-policy \
    --name product-vectors-data \
    --type data \
    --policy "[{\"Rules\":[{\"ResourceType\":\"index\",\"Resource\":[\"index/product-vectors/*\"],\"Permission\":[\"aoss:CreateIndex\",\"aoss:DescribeIndex\",\"aoss:UpdateIndex\",\"aoss:DeleteIndex\",\"aoss:ReadDocument\",\"aoss:WriteDocument\"]}],\"Principal\":[\"\${PRINCIPAL_ARN}\"]}]" \
    --endpoint-url "https://aoss.us-east-2.amazonaws.com" \
    --region "us-east-2"

Note: If you use the AWS console’s Express Create workflow, these policies are created automatically.

Important: After creating the data access policy, wait approximately 30 to 60 seconds for the policy to propagate before making API calls to the collection. If you receive a 403 Forbidden error, wait and retry.

Step 2: Create a collection group and collection

Create a collection group with scale-to-zero capacity limits, then create a vector search collection within it.

# Create a collection group with scale-to-zero enabled (min OCU = 0)
aws opensearchserverless create-collection-group \
    --name product-search-cg \
    --generation NEXTGEN \
    --standby-replicas ENABLED \
    --capacity-limits "minIndexingCapacityInOCU=0,maxIndexingCapacityInOCU=4,minSearchCapacityInOCU=0,maxSearchCapacityInOCU=4" \
    --endpoint-url "https://aoss.us-east-2.amazonaws.com" \
    --region "us-east-2"

# Create a vector search collection in the group
aws opensearchserverless create-collection \
    --name product-vectors \
    --type VECTORSEARCH \
    --collection-group-name product-search-cg \
    --endpoint-url "https://aoss.us-east-2.amazonaws.com" \
    --region "us-east-2"

The collection status transitions to ACTIVE within seconds.

Step 3: Create a vector index

Retrieve the collection endpoint and create a k-NN index using 3-dimensional vectors:

ENDPOINT=$(aws opensearchserverless batch-get-collection \
    --names product-vectors \
    --query 'collectionDetails[0].collectionEndpoint' \
    --output text \
    --endpoint-url "https://aoss.us-east-2.amazonaws.com" \
    --region "us-east-2")

awscurl --service aoss --region us-east-2 \
    -XPUT "${ENDPOINT}/items" \
    -H "Content-Type: application/json" \
    -d '{
      "settings": {"index.knn": true},
      "mappings": {
        "properties": {
          "description": {"type": "text"},
          "embedding": {"type": "knn_vector", "dimension": 3,
            "method": {"name": "hnsw", "space_type": "cosinesimil", "engine": "faiss"}}
        }
      }
    }'

Note: If the collection has scaled to zero, the first request might take a few seconds while capacity scales up. If the request times out, wait 10 to 15 seconds and retry.

Step 4: Index sample documents with embeddings

awscurl --service aoss --region us-east-2 \
    -XPOST "${ENDPOINT}/items/_bulk" \
    -H "Content-Type: application/json" \
    -d '
{ "index": { "_id": "1" } }
{ "description": "Wireless noise-cancelling headphones", "embedding": [0.8, 0.2, 0.1] }
{ "index": { "_id": "2" } }
{ "description": "Portable Bluetooth speaker", "embedding": [0.7, 0.3, 0.2] }
{ "index": { "_id": "3" } }
{ "description": "Over-ear studio monitor headphones", "embedding": [0.9, 0.1, 0.05] }
'

Step 5: Run a k-NN query

Search for the two nearest neighbors to a query vector. Wait 30 seconds after indexing to allow the vector index to build before running this query:

awscurl --service aoss --region us-east-2 \
    -XGET "${ENDPOINT}/items/_search" \
    -H "Content-Type: application/json" \
    -d '{
      "query": {
        "knn": {
          "embedding": {
            "vector": [0.85, 0.15, 0.08],
            "k": 2
          }
        }
      }
    }'

The response returns the two most similar items, in this case, the headphone documents whose embeddings are closest to your query vector.

You can also run this query in OpenSearch UI by navigating to your collection in the Amazon OpenSearch Service console and choosing the OpenSearch UI Application URL. Then follow the steps outlined in this blog to create a workspace. Then navigate to Dev Tools and paste and run the following query.

GET items/_search
{
  "query": {
    "knn": {
      "embedding": {
        "vector": [0.85, 0.15, 0.08],
        "k": 2
      }
    }
  }
}

Step 6: Observe scale to zero

After a period of inactivity (no indexing or search traffic), the collection group scales down to 0 OCU. Verify with:

aws opensearchserverless batch-get-collection-group \
    --names product-search-cg \
    --endpoint-url "https://aoss.us-east-2.amazonaws.com" \
    --region "us-east-2"

In the response, currentCapacity.search.capacityInOcu and currentCapacity.indexing.capacityInOcu will show 0 after the collection has scaled down.

You can also navigate to the Collection groups page in the Amazon OpenSearch Service console. Choose your collection group, then scroll down to the Monitoring section. Here you can see two charts: Indexing capacity (OCUs) and Search capacity (OCUs). After 10 minutes of idle time (no indexing or search requests), both metrics drop to zero, confirming that the service has released all compute resources for your collection.

CloudWatch monitoring charts in the Amazon OpenSearch Service console showing indexing and search capacity dropping to zero OCUs after 10 minutes of idle time

Clean up

To avoid ongoing charges, delete the resources you created in this walkthrough when you are done. Delete the collection first so the collection group becomes empty, then delete the group, then remove the security and access policies.

# Look up the collection ID, then delete the collection
COLLECTION_ID=$(aws opensearchserverless batch-get-collection \
    --names product-vectors \
    --query 'collectionDetails[0].id' \
    --output text \
    --endpoint-url "https://aoss.us-east-2.amazonaws.com" \
    --region "us-east-2")

aws opensearchserverless delete-collection \
    --id "${COLLECTION_ID}" \
    --endpoint-url "https://aoss.us-east-2.amazonaws.com" \
    --region "us-east-2"

# Look up the collection group ID, then delete the collection group
GROUP_ID=$(aws opensearchserverless batch-get-collection-group \
    --names product-search-cg \
    --query 'collectionGroupDetails[0].id' \
    --output text \
    --endpoint-url "https://aoss.us-east-2.amazonaws.com" \
    --region "us-east-2")

aws opensearchserverless delete-collection-group \
    --id "${GROUP_ID}" \
    --endpoint-url "https://aoss.us-east-2.amazonaws.com" \
    --region "us-east-2"

# Delete the security and access policies
aws opensearchserverless delete-security-policy \
    --name product-vectors-encryption \
    --type encryption \
    --endpoint-url "https://aoss.us-east-2.amazonaws.com" \
    --region "us-east-2"

aws opensearchserverless delete-security-policy \
    --name product-vectors-network \
    --type network \
    --endpoint-url "https://aoss.us-east-2.amazonaws.com" \
    --region "us-east-2"

aws opensearchserverless delete-access-policy \
    --name product-vectors-data \
    --type data \
    --endpoint-url "https://aoss.us-east-2.amazonaws.com" \
    --region "us-east-2"

Upgrading existing collections

To move to the new architecture, create a new collection group and collection, then reindex your data into it. For a step-by-step walkthrough of the reindexing process, refer to Perform reindexing in Amazon OpenSearch Serverless using Amazon OpenSearch Ingestion. Your queries and index mappings remain the same. Only the collection endpoint changes. With the new static Regional endpoint, that is a one-time update.

The new architecture supports SEARCH and VECTORSEARCH collection types. TIMESERIES is not supported at launch.

Conclusion

The new Amazon OpenSearch Serverless architecture is available today. You can create your first OpenSearch Serverless collection in seconds with Express Create, scale it to handle production traffic, and your OpenSearch Serverless compute costs drop to zero when it sits idle.

To learn more:

If you have questions or feedback, open a support case or reach out through your AWS account team. We look forward to seeing what you build.

AWS Big Data Blog