S4 Embed - Vector Search FinOps Gateway

S4 Embed is a Vector Search FinOps layer that makes your existing vector database cheaper while keeping recall on target. It quantizes embeddings (binary + int8) and runs a two-stage search (1-bit Hamming coarse stage + exact rescore) in front of OpenSearch, pgvector, Qdrant, or Milvus, reducing the in-RAM ANN graph by up to 32x. On a 30k-vector benchmark, binary + float rescore reached recall@10 of 0.976 to 1.000 depending on store and over-fetch (Milvus 0.976; OpenSearch, pgvector, Qdrant 0.99+); you choose the operating point to meet your recall target. It runs as an Amazon Linux 2023 AMI in your own VPC and is billed per unit of usage (texts embedded, documents indexed, searches served).

Overview

S4 Embed is a Vector Search FinOps layer: it helps you find and run a low-cost vector-search configuration that meets your recall target. Quantization (binary for an up-to-32x smaller in-RAM ANN graph, int8 residual for about 4x smaller on-disk vectors) plus a two-stage search (coarse Hamming shortlist, then exact rescore) reduces RAM while you tune the over-fetch/rescore operating point to hold recall. On a 30k-vector benchmark, binary Hamming + float rescore reached recall@10 of 0.987 at over-fetch 50 and 0.995-0.996 at over-fetch 100 on OpenSearch and pgvector, 1.000 on Qdrant, and 0.976 on Milvus, all at the 32x RAM reduction. Recall depends on over-fetch, and the over-fetch needed to hold a given recall can change as the corpus grows, so the operating point is chosen per workload. The pipeline is store-agnostic across OpenSearch, pgvector, Qdrant, and Milvus.
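To make the two-stage idea concrete, here is a minimal in-memory sketch in plain Python. It is not the S4 Embed implementation: the sign-based binarization, the dot-product rescore, and the reading of over-fetch as a multiplier on k are all assumptions made for illustration, and every function name is hypothetical.

```python
import random


def binarize(vec):
    """Pack the sign bits of a float vector into one int (1 bit per dimension).

    Assumption: sign-based binarization; the product's quantizer may differ.
    """
    code = 0
    for x in vec:
        code = (code << 1) | (1 if x > 0.0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def exact_top_k(query, vectors, k):
    """Brute-force float search, used here as ground truth."""
    order = sorted(range(len(vectors)), key=lambda i: -dot(query, vectors[i]))
    return order[:k]


def two_stage_top_k(query, vectors, codes, k, over_fetch):
    """Coarse Hamming shortlist, then exact float rescore of the shortlist.

    Assumption: the shortlist holds k * over_fetch candidates.
    """
    q_code = binarize(query)
    shortlist_size = min(len(vectors), k * over_fetch)
    by_hamming = sorted(range(len(codes)), key=lambda i: hamming(q_code, codes[i]))
    shortlist = by_hamming[:shortlist_size]
    rescored = sorted(shortlist, key=lambda i: -dot(query, vectors[i]))
    return rescored[:k]


def recall_at_k(approx, exact):
    """Fraction of the exact top-k that the approximate search returned."""
    return len(set(approx) & set(exact)) / len(exact)


def random_vectors(n, dims, seed):
    """Synthetic Gaussian vectors, only for the demo below."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dims)] for _ in range(n)]


if __name__ == "__main__":
    vectors = random_vectors(2000, 64, seed=0)
    codes = [binarize(v) for v in vectors]
    query = random_vectors(1, 64, seed=1)[0]
    truth = exact_top_k(query, vectors, 10)
    for over_fetch in (2, 5, 20):
        hits = two_stage_top_k(query, vectors, codes, 10, over_fetch)
        print(over_fetch, recall_at_k(hits, truth))
```

A larger shortlist costs more rescore work but can only keep or improve recall, which is the trade-off the over-fetch setting controls. Results on synthetic vectors like these say nothing about recall on a real corpus.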

The FinOps tools are the product. s4embed prove estimates the recall/cost/latency frontier on your vectors using the in-process quantization/rescore model and returns a recommended config. s4embed compare measures live ANN recall across OpenSearch, pgvector, Qdrant, and Milvus for your corpus and queries. s4embed tune emits a deployable config meeting a declared recall + latency + RAM budget. A gateway shadow mode dual-writes and shadow-compares live reads so you can watch the compressed path reproduce your primary before cutting over. Use compare plus shadow mode to validate before any cutover. s4embed drift watches embedding drift and recall and recommends re-tuning.
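The kind of decision these tools support can be pictured as choosing the cheapest measured operating point that still meets a budget. The sketch below is only an illustration of that selection step, not the logic of s4embed tune: the OperatingPoint type and pick_operating_point function are hypothetical, and the latency figures in the demo are placeholders, not measurements. The two recall values are the OpenSearch figures quoted in the overview.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class OperatingPoint:
    over_fetch: int
    recall_at_10: float
    p95_latency_ms: float  # hypothetical field; supply your own measurements


def pick_operating_point(
    points: Iterable[OperatingPoint],
    recall_target: float,
    latency_budget_ms: float,
) -> Optional[OperatingPoint]:
    """Return the feasible point with the smallest over-fetch, or None.

    A point is feasible when it meets the recall target and stays within
    the latency budget. Smaller over-fetch means less rescore work.
    """
    feasible = [
        p
        for p in points
        if p.recall_at_10 >= recall_target and p.p95_latency_ms <= latency_budget_ms
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda p: (p.over_fetch, p.p95_latency_ms))


if __name__ == "__main__":
    measured = [
        OperatingPoint(over_fetch=50, recall_at_10=0.987, p95_latency_ms=10.0),   # latency is a placeholder
        OperatingPoint(over_fetch=100, recall_at_10=0.995, p95_latency_ms=15.0),  # latency is a placeholder
    ]
    print(pick_operating_point(measured, recall_target=0.99, latency_budget_ms=20.0))
```

If no point is feasible the function returns None, which corresponds to a budget that the measured configurations cannot meet.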

It runs as a standard Amazon Linux 2023 AMI behind your own load balancer, in your own VPC - your data and your vector database never leave your account, and there is no lock-in. The gateway supports API-key auth when configured, request size and concurrency caps, a readiness probe that fails closed on billing or store problems, and Prometheus metrics. Billing is usage-metered through your AWS bill: you pay per text embedded, document indexed, and search served, reported hourly via the AWS Marketplace Metering Service.

Highlights

Cut vector-search RAM by up to 32x with a measured recall curve: binary quantization + two-stage rescore reached recall@10 of 0.976 to 1.000 on a 30k-vector benchmark depending on store and over-fetch (OpenSearch 0.995, pgvector 0.996, Qdrant 1.000, Milvus 0.976). You tune the operating point to your recall target.
Decide with data before you cut over: s4embed prove estimates the recall/cost frontier on your vectors, s4embed compare measures live ANN recall across OpenSearch, pgvector, Qdrant, and Milvus, and a shadow mode replays live traffic against the compressed path - plus drift-aware re-tuning.
Store-agnostic, runs in your VPC, no lock-in: a standard Amazon Linux 2023 AMI behind your load balancer; usage-metered billing (texts embedded, documents indexed, searches served); your data and your vector database never leave your account.
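The 32x and roughly 4x figures follow from bit widths: a float32 component takes 32 bits, an int8 component 8 bits, and a binary component 1 bit. The sketch below works through that arithmetic for raw vector payload only; it ignores graph links, IDs, and metadata, which is one reason the listing says "up to". The 768-dimension figure is an assumed example, and footprint_bytes is a hypothetical helper.

```python
def footprint_bytes(num_vectors, dims):
    """Raw vector payload in bytes for three encodings (no index overhead)."""
    return {
        "float32": num_vectors * dims * 4,         # 32 bits per dimension
        "int8": num_vectors * dims,                # 8 bits per dimension
        "binary": num_vectors * ((dims + 7) // 8)  # 1 bit per dimension, byte-padded
    }


if __name__ == "__main__":
    # 30,000 vectors matches the benchmark size; 768 dims is an assumption.
    sizes = footprint_bytes(30_000, 768)
    for name, size in sizes.items():
        print(f"{name}: {size / 1e6:.2f} MB, {sizes['float32'] / size:.0f}x smaller than float32")
```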

Details

Sold by

abyo software

Pricing

Free trial

Try this product free for 14 days according to the free trial terms set by the vendor. Usage-based pricing is in effect for usage beyond the free trial terms. Your free trial gets automatically converted to a paid subscription when the trial ends, but may be canceled any time before that.

Pricing is based on actual usage, with charges varying according to how much you consume. Subscriptions have no end date and may be canceled any time.

Additional AWS infrastructure costs may apply. Use the AWS Pricing Calculator to estimate your infrastructure costs.

Usage costs (3)

Dimension	Description	Cost/unit
Embedded texts	Per text embedded via the /embed endpoint.	$0.0000003
Indexed docs	Per document indexed via the /index endpoint.	$0.000001
Searches	Per query served via the /search endpoint.	$0.0000003
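As a worked example of the table above, the following sketch estimates the metered software charge for a given usage mix. The unit prices are copied from the table; the usage volumes in the demo are made-up inputs, the usage_cost helper is hypothetical, and the result excludes AWS infrastructure costs and any free-trial allowance.

```python
from decimal import Decimal

# Unit prices in USD, taken from the usage-cost table.
PRICE_PER_UNIT = {
    "embedded_texts": Decimal("0.0000003"),
    "indexed_docs": Decimal("0.000001"),
    "searches": Decimal("0.0000003"),
}


def usage_cost(embedded_texts=0, indexed_docs=0, searches=0):
    """Metered software charge in USD for the given unit counts."""
    usage = {
        "embedded_texts": embedded_texts,
        "indexed_docs": indexed_docs,
        "searches": searches,
    }
    return sum(
        (PRICE_PER_UNIT[name] * count for name, count in usage.items()),
        Decimal("0"),
    )


if __name__ == "__main__":
    # Hypothetical month: 1M texts embedded, 1M docs indexed, 10M searches.
    total = usage_cost(embedded_texts=1_000_000, indexed_docs=1_000_000, searches=10_000_000)
    print(f"${total:.2f}")  # 0.30 + 1.00 + 3.00 = $4.30
```

Decimal is used instead of float so that very small per-unit prices add up without rounding drift.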

Vendor refund policy

Usage is metered hourly and charges already incurred are generally non-refundable. If you believe you were billed in error, contact aws-support@abyo.net within 30 days and we will investigate and correct any verified billing error.

Legal

Vendor terms and conditions

Upon subscribing to this product, you must acknowledge and agree to the terms and conditions outlined in the vendor's End User License Agreement (EULA).

Content disclaimer

Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

Usage information

Delivery method

Version

Delivery details

64-bit (x86) Amazon Machine Image (AMI)

Amazon Machine Image (AMI)

An AMI is a virtual image that provides the information required to launch an instance. Amazon EC2 (Elastic Compute Cloud) instances are virtual servers on which you can run your applications and workloads, offering varying combinations of CPU, memory, storage, and networking resources. You can launch as many instances from as many different AMIs as you need.

Version release notes

Adds a CloudFormation Quick Launch delivery option. Software identical to version 1.0.1.

Additional details

Usage instructions

Launch the AMI in a private subnet behind an Application Load Balancer; terminate TLS at the ALB and do not expose the gateway port directly to the internet. The gateway listens on TCP 8080 and serves GET /ready for the ALB health check. Configure the vector store kind (opensearch, pgvector, qdrant, or milvus) and endpoint, and an API key (S4EMBED_API_KEY), via instance user-data, tags, or SSM Parameter Store. Grant the instance role aws-marketplace:MeterUsage so metered billing works. A CloudFormation quick-start that provisions the OpenSearch and pgvector paths is included; Qdrant and Milvus are supported by pointing the gateway at your existing endpoint. For the full configuration reference, contact aws-support@abyo.net.
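After launch, you can confirm from inside the VPC that the gateway answers its readiness probe before registering it with the ALB. The sketch below polls GET /ready on port 8080 using only the Python standard library. It assumes the probe returns HTTP 200 when ready and a non-2xx status or no response otherwise; the listing does not document the exact status codes, and gateway_is_ready is a hypothetical helper name.

```python
import urllib.error
import urllib.request


def gateway_is_ready(base_url, timeout=2.0):
    """Return True if GET <base_url>/ready answers with HTTP 200.

    Assumption: 200 means ready. Any HTTP error status, connection
    failure, or timeout is treated as not ready.
    """
    url = base_url.rstrip("/") + "/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


if __name__ == "__main__":
    # Replace the host with your instance's private address (placeholder below).
    print(gateway_is_ready("http://127.0.0.1:8080"))
```

Treating every failure mode as "not ready" mirrors the fail-closed behavior described for the probe itself.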

Support

Vendor support

Email support at aws-support@abyo.net.

AWS infrastructure support

AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.
