
    S4 Embed - Vector Search FinOps Gateway

    Deployed on AWS
    S4 Embed is a Vector Search FinOps layer that makes your existing vector database cheaper while keeping recall on target. It quantizes embeddings (binary + int8) and runs a two-stage search (1-bit Hamming coarse stage + exact rescore) in front of OpenSearch, pgvector, Qdrant, or Milvus, reducing the in-RAM ANN graph by up to 32x. On a 30k-vector benchmark, binary + float rescore reached recall@10 of 0.976 to 1.000 depending on store and over-fetch (Milvus 0.976; OpenSearch, pgvector, Qdrant 0.99+); you choose the operating point to meet your recall target. Runs as an Amazon Linux 2023 AMI in your own VPC, billed per unit of usage (texts embedded, documents indexed, searches served).

    Overview

    S4 Embed is a Vector Search FinOps layer: it helps you find and run a low-cost vector-search configuration that meets your recall target. Quantization (binary for an up-to-32x smaller in-RAM ANN graph, int8 residual for about 4x smaller on-disk vectors) and a two-stage search (coarse Hamming shortlist, then exact rescore) reduce RAM, while you tune the over-fetch/rescore operating point to hold recall. On a 30k-vector benchmark, binary Hamming + float rescore reached recall@10 of 0.987 at over-fetch 50 and 0.995-0.996 at over-fetch 100 on OpenSearch and pgvector, 1.000 on Qdrant, and 0.976 on Milvus, all at the 32x RAM reduction. The over-fetch needed to hold a given recall changes as the corpus grows, so the operating point is chosen per workload. The pipeline is store-agnostic across OpenSearch, pgvector, Qdrant, and Milvus.
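The two-stage search described above can be illustrated with a small, standard-library-only sketch. This is not S4 Embed's implementation: it assumes sign-based binary quantization, a dot-product rescore, and that "over-fetch" means a shortlist of k * over_fetch candidates. All function names and the toy sizes are hypothetical.

```python
import random


def binarize(vec):
    """Pack the sign of each dimension into one int (1 bit per dimension)."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two packed binary codes."""
    return bin(a ^ b).count("1")


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def exact_top_k(query, corpus, k):
    """Brute-force float ranking, used here as ground truth."""
    ranked = sorted(range(len(corpus)), key=lambda i: dot(query, corpus[i]), reverse=True)
    return ranked[:k]


def two_stage_search(query, corpus, codes, k, over_fetch):
    q_code = binarize(query)
    # Stage 1: coarse shortlist by Hamming distance on the 1-bit codes.
    # Assumption: the shortlist holds k * over_fetch candidates.
    shortlist = sorted(range(len(codes)), key=lambda i: hamming(q_code, codes[i]))
    shortlist = shortlist[: k * over_fetch]
    # Stage 2: exact rescore of the shortlist with the original float vectors.
    shortlist.sort(key=lambda i: dot(query, corpus[i]), reverse=True)
    return shortlist[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that the approximate search recovered."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


rng = random.Random(0)
DIM, N, K = 64, 2000, 10  # toy sizes, not the 30k-vector benchmark
corpus = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N)]
codes = [binarize(v) for v in corpus]
query = [rng.gauss(0, 1) for _ in range(DIM)]

truth = exact_top_k(query, corpus, K)
for over_fetch in (1, 5, 20):
    found = two_stage_search(query, corpus, codes, K, over_fetch)
    print(f"over_fetch={over_fetch} recall@{K}={recall_at_k(found, truth):.3f}")
```

In this sketch a larger over-fetch can only keep or raise recall, because each shortlist contains the smaller ones; the cost is more exact rescoring per query. Random Gaussian vectors are a stand-in, so the printed recalls say nothing about real embeddings.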

    The FinOps tools are the product. s4embed prove estimates the recall/cost/latency frontier on your vectors using the in-process quantization/rescore model and returns a recommended config. s4embed compare measures live ANN recall across OpenSearch, pgvector, Qdrant, and Milvus for your corpus and queries. s4embed tune emits a deployable config that meets a declared recall, latency, and RAM budget. A gateway shadow mode dual-writes and shadow-compares live reads, so you can watch the compressed path reproduce your primary's results before cutting over; use compare together with shadow mode to validate before any cutover. s4embed drift monitors embedding drift and recall, and recommends re-tuning.
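The listing does not say how shadow mode scores agreement between the primary and the compressed path. One plausible measure is top-k overlap per query, averaged over a window of live reads. The sketch below assumes that metric; the function names, the report fields, and the 0.8 target are hypothetical.

```python
def overlap_at_k(primary_ids, shadow_ids, k):
    """Fraction of the primary's top-k that the shadow path also returned in its top-k."""
    primary_top = primary_ids[:k]
    if not primary_top:
        return 1.0  # an empty primary result has nothing to reproduce
    shadow_top = set(shadow_ids[:k])
    hits = sum(1 for doc_id in primary_top if doc_id in shadow_top)
    return hits / len(primary_top)


def shadow_report(pairs, k, target):
    """Summarize agreement over (primary_ids, shadow_ids) pairs from shadowed reads."""
    if not pairs:
        raise ValueError("need at least one shadowed query")
    scores = [overlap_at_k(primary, shadow, k) for primary, shadow in pairs]
    mean = sum(scores) / len(scores)
    return {"queries": len(scores), "mean_overlap": mean, "meets_target": mean >= target}


# Toy document IDs, for illustration only.
pairs = [
    (["a", "b", "c"], ["a", "c", "b"]),  # same set, different order
    (["a", "b", "c"], ["a", "b", "x"]),  # one miss
]
print(shadow_report(pairs, k=3, target=0.8))
```

Overlap ignores rank order within the top-k; a rank-aware measure would be stricter and may suit workloads where position matters.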

    It runs as a standard Amazon Linux 2023 AMI behind your own load balancer, in your own VPC - your data and your vector database never leave your account, and there is no lock-in. The gateway supports API-key auth when configured, request size and concurrency caps, a readiness probe that fails closed on billing or store problems, and Prometheus metrics. Billing is usage-metered through your AWS bill: you pay per text embedded, document indexed, and search served, reported hourly via the AWS Marketplace Metering Service.

    Highlights

    • Cut vector-search RAM by up to 32x with a measured recall curve: binary quantization + two-stage rescore reached recall@10 of 0.976 to 1.000 on a 30k-vector benchmark depending on store and over-fetch (OpenSearch 0.995, pgvector 0.996, Qdrant 1.000, Milvus 0.976). You tune the operating point to your recall target.
    • Decide with data before you cut over: s4embed prove estimates the recall/cost frontier on your vectors, s4embed compare measures live ANN recall across OpenSearch, pgvector, Qdrant, and Milvus, and a shadow mode replays live traffic against the compressed path - plus drift-aware re-tuning.
    • Store-agnostic, runs in your VPC, no lock-in: a standard Amazon Linux 2023 AMI behind your load balancer; usage-metered billing (texts embedded, documents indexed, searches served); your data and your vector database never leave your account.
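The 32x and 4x figures follow from bits per dimension: float32 uses 32 bits, int8 uses 8, and binary uses 1. A minimal sketch of that arithmetic is below. It counts raw vector payload only (no graph links, IDs, or metadata), and the 768-dimension embedding size is an assumption; the ratios do not depend on it.

```python
def vector_bytes(dim, bits_per_dim):
    """Raw payload size of one vector, in bytes."""
    return dim * bits_per_dim / 8


DIM = 768         # assumed embedding dimension
N_VECTORS = 30_000  # corpus size used in the listing's benchmark

float32_bytes = vector_bytes(DIM, 32)
for name, bits in (("float32", 32), ("int8", 8), ("binary", 1)):
    per_vector = vector_bytes(DIM, bits)
    total_mib = per_vector * N_VECTORS / (1024 * 1024)
    ratio = float32_bytes / per_vector
    print(f"{name:8s} {per_vector:7.0f} B/vector  {total_mib:8.2f} MiB total  {ratio:4.0f}x smaller")
```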

    Details

    Delivery method

    Delivery option
    64-bit (x86) Amazon Machine Image (AMI)

    Latest version

    Operating system
    Amazon Linux 2023

    Pricing

    S4 Embed - Vector Search FinOps Gateway

    Pricing is usage-based: you pay only for what you consume. Subscriptions have no end date and can be canceled at any time.
    Additional AWS infrastructure costs may apply. Use the AWS Pricing Calculator to estimate your infrastructure costs.

    Usage costs (3)

    Dimension      | Description                                   | Cost/unit
    ---------------+-----------------------------------------------+-----------
    Embedded texts | Per text embedded via the /embed endpoint.    | $0.0000003
    Indexed docs   | Per document indexed via the /index endpoint. | $0.000001
    Searches       | Per query served via the /search endpoint.    | $0.0000003
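To turn the per-unit prices above into a bill estimate, multiply each dimension's usage by its unit cost and sum. The sketch below uses the listed prices; the dictionary keys and the monthly usage volumes are hypothetical, and AWS infrastructure costs are not included.

```python
from decimal import Decimal

# Unit prices from the usage-cost table, in dollars per unit.
PRICES = {
    "embedded_texts": Decimal("0.0000003"),
    "indexed_docs": Decimal("0.000001"),
    "searches": Decimal("0.0000003"),
}


def estimate_usage_cost(usage):
    """Return (per-dimension costs, total) for a dict of dimension -> unit count."""
    lines = {dim: PRICES[dim] * count for dim, count in usage.items()}
    total = sum(lines.values(), Decimal("0"))
    return lines, total


# Hypothetical monthly volumes, for illustration only.
usage = {
    "embedded_texts": 1_000_000,
    "indexed_docs": 1_000_000,
    "searches": 10_000_000,
}
lines, total = estimate_usage_cost(usage)
for dim, cost in lines.items():
    print(f"{dim:15s} ${cost:.2f}")
print(f"{'total':15s} ${total:.2f}")
```

Decimal is used instead of float so that prices this small are multiplied exactly.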

    Vendor refund policy

    Usage is metered hourly, and charges already incurred are generally non-refundable. If you believe you were billed in error, contact aws-support@abyo.net within 30 days and we will investigate and correct any verified billing error.

    Legal

    Vendor terms and conditions

    Upon subscribing to this product, you must acknowledge and agree to the terms and conditions outlined in the vendor's End User License Agreement (EULA).

    Content disclaimer

    Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

    Usage information

    Delivery details

    64-bit (x86) Amazon Machine Image (AMI)

    Amazon Machine Image (AMI)

    An AMI is a virtual image that provides the information required to launch an instance. Amazon EC2 (Elastic Compute Cloud) instances are virtual servers on which you can run your applications and workloads, offering varying combinations of CPU, memory, storage, and networking resources. You can launch as many instances from as many different AMIs as you need.

    Version release notes

    Initial release: Vector Search FinOps gateway. Binary/int8 quantization + two-stage rescore over OpenSearch, pgvector, Qdrant, or Milvus (up to 32x less vector RAM; recall@10 of 0.976 to 1.000 by store and over-fetch on a 30k-vector benchmark); prove/tune/compare/drift FinOps tooling; usage-metered billing.

    Additional details

    Usage instructions

    Launch the AMI in a private subnet behind an Application Load Balancer; terminate TLS at the ALB and do not expose the gateway port directly to the internet. The gateway listens on TCP 8080 and serves GET /ready for the ALB health check. Configure the vector store kind (opensearch, pgvector, qdrant, or milvus) and endpoint, and an API key (S4EMBED_API_KEY), via instance user-data, tags, or SSM Parameter Store. Grant the instance role aws-marketplace:RegisterUsage and aws-marketplace:MeterUsage so metered billing works. A CloudFormation quick-start that provisions the OpenSearch and pgvector paths is included; Qdrant and Milvus are supported by pointing the gateway at your existing endpoint. For the full configuration reference, contact aws-support@abyo.net.
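Two pieces of the instructions above lend themselves to a short sketch: the IAM permissions for the instance role and a check against the readiness endpoint. The two action names, the /ready path, and port 8080 come from the instructions. The statement Sid, the use of "Resource": "*", the helper name is_ready, and the assumption that a ready gateway answers with a 2xx status are illustrative and not confirmed by the vendor.

```python
import json
import urllib.request

# IAM policy document granting the two metering actions named in the instructions.
# Assumption: these actions are granted on all resources ("*").
METERING_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "S4EmbedMarketplaceMetering",  # hypothetical statement ID
            "Effect": "Allow",
            "Action": [
                "aws-marketplace:RegisterUsage",
                "aws-marketplace:MeterUsage",
            ],
            "Resource": "*",
        }
    ],
}


def is_ready(base_url, timeout=2.0):
    """Return True if GET {base_url}/ready answers with a 2xx status.

    Assumption: a non-2xx status or a connection failure means "not ready".
    urlopen raises HTTPError (an OSError subclass) for 4xx/5xx responses.
    """
    url = base_url.rstrip("/") + "/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        return False


# JSON form, suitable for attaching to the instance role as an inline policy.
print(json.dumps(METERING_POLICY, indent=2))
```

From a host inside the VPC, a call such as is_ready("http://10.0.1.23:8080") (a placeholder private IP) would mirror what the ALB health check does.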

    Support

    Vendor support

    Email support at aws-support@abyo.net.

    AWS infrastructure support

    AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.

    Customer reviews

    No customer reviews yet