Overview
S4 Embed is a Vector Search FinOps layer: it helps you find and run a low-cost vector-search configuration that meets your recall target. It combines quantization (binary, for an up-to-32x smaller in-RAM ANN graph; int8 residual, for about 4x smaller on-disk vectors) with a two-stage search (a coarse Hamming shortlist followed by an exact rescore) to reduce RAM, while you tune the over-fetch/rescore operating point to hold recall. On a 30k-vector benchmark, binary Hamming + float rescore reached recall@10 of 0.987 at over-fetch 50 and 0.995-0.996 at over-fetch 100 on OpenSearch and pgvector, 1.000 on Qdrant, and 0.976 on Milvus, all at the 32x RAM reduction. The over-fetch needed to hold a given recall changes as the corpus grows, so choose the operating point per workload. The pipeline is store-agnostic across OpenSearch, pgvector, Qdrant, and Milvus.
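The two-stage search described above can be illustrated with a small pure-Python sketch. It is not the S4 Embed implementation: the function names, the sign-bit binarization, the dot-product scoring, and the random test data are all assumptions made for illustration. It shows a Hamming shortlist over 1-bit codes followed by an exact rescore on the original float vectors, and how recall@k depends on over-fetch.

```python
import random

def binarize(vec):
    """Pack the sign bit of each dimension into one Python int (1 bit per dimension)."""
    code = 0
    for x in vec:
        code = (code << 1) | (1 if x > 0 else 0)
    return code

def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def exact_top_k(query, vectors, k):
    """Brute-force ground truth: top-k ids by float dot product."""
    return sorted(range(len(vectors)), key=lambda i: -dot(query, vectors[i]))[:k]

def two_stage_top_k(query, vectors, codes, k, over_fetch):
    """Stage 1: shortlist `over_fetch` ids by Hamming distance on binary codes.
    Stage 2: rescore only the shortlist with the original float vectors."""
    q_code = binarize(query)
    shortlist = sorted(range(len(codes)), key=lambda i: hamming(q_code, codes[i]))[:over_fetch]
    shortlist.sort(key=lambda i: -dot(query, vectors[i]))
    return shortlist[:k]

def recall_at_k(found, truth):
    return len(set(found) & set(truth)) / len(truth)

# Hypothetical toy corpus: random Gaussian vectors, not real embeddings.
random.seed(0)
DIM, N, K = 32, 300, 10
vectors = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N)]
codes = [binarize(v) for v in vectors]
queries = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(10)]

def mean_recall(over_fetch):
    scores = [
        recall_at_k(two_stage_top_k(q, vectors, codes, K, over_fetch),
                    exact_top_k(q, vectors, K))
        for q in queries
    ]
    return sum(scores) / len(scores)

for over_fetch in (10, 50, 100, N):
    print(over_fetch, round(mean_recall(over_fetch), 3))
```

Because a larger shortlist always contains the smaller one, recall in this sketch can only stay flat or rise as over-fetch increases, and it reaches 1.0 when the shortlist covers the whole corpus. Real embeddings and a real ANN index will behave differently, so measure on your own data.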
The FinOps tools are the product. s4embed prove estimates the recall/cost/latency frontier on your vectors using the in-process quantization/rescore model and returns a recommended config. s4embed compare measures live ANN recall across OpenSearch, pgvector, Qdrant, and Milvus for your corpus and queries. s4embed tune emits a deployable config that meets a declared recall, latency, and RAM budget. A gateway shadow mode dual-writes and shadow-compares live reads so you can watch the compressed path reproduce your primary before cutting over. Use compare plus shadow mode to validate before any cutover. s4embed drift watches embedding drift and recall and recommends re-tuning.
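As a rough illustration of what choosing an operating point means, the sketch below picks the smallest over-fetch whose measured recall meets a target. It is not how s4embed tune works internally (which is not documented here) and it ignores the latency and RAM constraints; the function name is hypothetical, and the curve reuses two recall@10 figures quoted in the overview as a single illustrative curve.

```python
def pick_over_fetch(curve, recall_target):
    """Return the smallest over-fetch whose measured recall meets the target.

    `curve` maps over-fetch -> measured recall. Returns None when no measured
    point reaches the target, which means more over-fetch must be measured.
    """
    for over_fetch in sorted(curve):
        if curve[over_fetch] >= recall_target:
            return over_fetch
    return None

# Illustrative curve built from figures quoted in the overview (30k-vector benchmark).
curve = {50: 0.987, 100: 0.995}

print(pick_over_fetch(curve, 0.98))
print(pick_over_fetch(curve, 0.99))
print(pick_over_fetch(curve, 0.999))
```

A lower over-fetch means fewer candidates to rescore per query, so the smallest value that still meets the recall target is usually the cheapest one to run.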
It runs as a standard Amazon Linux 2023 AMI behind your own load balancer, in your own VPC - your data and your vector database never leave your account, and there is no lock-in. The gateway supports API-key auth when configured, request size and concurrency caps, a readiness probe that fails closed on billing or store problems, and Prometheus metrics. Billing is usage-metered through your AWS bill: you pay per text embedded, document indexed, and search served, reported hourly via the AWS Marketplace Metering Service.
Highlights
- Cut vector-search RAM by up to 32x with a measured recall curve: binary quantization + two-stage rescore reached recall@10 of 0.976 to 1.000 on a 30k-vector benchmark depending on store and over-fetch (OpenSearch 0.995, pgvector 0.996, Qdrant 1.000, Milvus 0.976). You tune the operating point to your recall target.
- Decide with data before you cut over: s4embed prove estimates the recall/cost frontier on your vectors, s4embed compare measures live ANN recall across OpenSearch, pgvector, Qdrant, and Milvus, and a shadow mode replays live traffic against the compressed path - plus drift-aware re-tuning.
- Store-agnostic, runs in your VPC, no lock-in: a standard Amazon Linux 2023 AMI behind your load balancer; usage-metered billing (texts embedded, documents indexed, searches served); your data and your vector database never leave your account.
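The 32x and 4x figures are consistent with simple storage arithmetic if the original vectors are float32, which is an assumption here (the listing does not state the source precision). A minimal sketch, using the 30k benchmark size and a hypothetical 768-dimension embedding:

```python
def vector_bytes(num_vectors, dim, bits_per_dim):
    """Raw storage for the vectors alone, ignoring index and metadata overhead."""
    return num_vectors * dim * bits_per_dim // 8

N = 30_000   # matches the benchmark size quoted above
DIM = 768    # assumed embedding dimension, for illustration only

float32_bytes = vector_bytes(N, DIM, 32)
binary_bytes = vector_bytes(N, DIM, 1)   # 1 bit per dimension
int8_bytes = vector_bytes(N, DIM, 8)     # 1 byte per dimension

print("float32:", float32_bytes, "bytes")
print("binary: ", binary_bytes, "bytes, ratio", float32_bytes / binary_bytes)
print("int8:   ", int8_bytes, "bytes, ratio", float32_bytes / int8_bytes)
```

Actual savings depend on index overhead and on what stays resident in RAM, which is why the listing says "up to" 32x.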
Details
Pricing
| Dimension | Description | Cost/unit (USD) |
|---|---|---|
| Embedded texts | Per text embedded via the /embed endpoint. | $0.0000003 |
| Indexed docs | Per document indexed via the /index endpoint. | $0.000001 |
| Searches | Per query served via the /search endpoint. | $0.0000003 |
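To get a feel for these unit prices, the sketch below multiplies them by a usage profile. The prices come from the table above; the monthly volumes and the function name are hypothetical, and the result excludes EC2, load balancer, and vector-store costs.

```python
from decimal import Decimal

# Unit prices in USD, copied from the pricing table.
PRICES = {
    "embedded_texts": Decimal("0.0000003"),
    "indexed_docs": Decimal("0.000001"),
    "searches": Decimal("0.0000003"),
}

def estimate_cost(usage):
    """Sum price * count over the metered dimensions (Decimal avoids float rounding)."""
    return sum((PRICES[dim] * count for dim, count in usage.items()), Decimal(0))

# Hypothetical monthly volumes, for illustration only.
usage = {
    "embedded_texts": 10_000_000,
    "indexed_docs": 10_000_000,
    "searches": 100_000_000,
}

for dim, count in usage.items():
    print(f"{dim}: {count} x {PRICES[dim]} = ${PRICES[dim] * count:.2f}")
print(f"total: ${estimate_cost(usage):.2f}")
```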
Vendor refund policy
Usage is metered hourly and charges already incurred are generally non-refundable. If you believe you were billed in error, contact aws-support@abyo.net within 30 days and we will investigate and correct any verified billing error.
Delivery details
64-bit (x86) Amazon Machine Image (AMI)
Amazon Machine Image (AMI)
An AMI is a virtual image that provides the information required to launch an instance. Amazon EC2 (Elastic Compute Cloud) instances are virtual servers on which you can run your applications and workloads, offering varying combinations of CPU, memory, storage, and networking resources. You can launch as many instances from as many different AMIs as you need.
Version release notes
Initial release: Vector Search FinOps gateway. Binary/int8 quantization + two-stage rescore over OpenSearch, pgvector, Qdrant, or Milvus (up to 32x less vector RAM; recall@10 of 0.976 to 1.000 by store and over-fetch on a 30k-vector benchmark); prove/tune/compare/drift FinOps tooling; usage-metered billing.
Additional details
Usage instructions
Launch the AMI in a private subnet behind an Application Load Balancer; terminate TLS at the ALB and do not expose the gateway port directly to the internet. The gateway listens on TCP 8080 and serves GET /ready for the ALB health check. Configure the vector store kind (opensearch, pgvector, qdrant, or milvus) and endpoint, and an API key (S4EMBED_API_KEY), via instance user-data, tags, or SSM Parameter Store. Grant the instance role aws-marketplace:RegisterUsage and aws-marketplace:MeterUsage so metered billing works. A CloudFormation quick-start that provisions the OpenSearch and pgvector paths is included; Qdrant and Milvus are supported by pointing the gateway at your existing endpoint. For the full configuration reference, contact aws-support@abyo.net.
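The two metering permissions named above can be expressed as an IAM policy document. The sketch below generates one as JSON; the function name and the Sid are illustrative, and the wildcard Resource is an assumption, so check the result against your own IAM standards before attaching it to the instance role.

```python
import json

def metering_policy():
    """Build an IAM policy document allowing the two Marketplace metering actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "MarketplaceMetering",  # illustrative statement id
                "Effect": "Allow",
                "Action": [
                    "aws-marketplace:RegisterUsage",
                    "aws-marketplace:MeterUsage",
                ],
                "Resource": "*",  # assumed; adjust to your policy standards
            }
        ],
    }

print(json.dumps(metering_policy(), indent=2))
```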
Support
Vendor support
Email support at aws-support@abyo.net.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.