Overview
ZeroEntropy builds Artificial Specialized Intelligence: task-specific models that make repetitive AI workflows faster, cheaper, and more accurate than general frontier LLMs. zembed-1 converts queries and documents into high-quality vector representations that help retrieval systems surface the most relevant context, reduce noise, and improve downstream answer quality.
Built for production AWS deployments, zembed-1 runs as a SageMaker model package in your own account for private real-time inference or batch embedding jobs. It is designed to pair naturally with ZeroEntropy rerankers, while also working as a standalone embedding model for semantic search, hybrid retrieval, clustering, deduplication, and knowledge-base indexing.
Highlights
- High-accuracy embeddings for semantic retrieval across legal, manufacturing, financial, medical, STEM, conversational, and code search.
- Built for RAG and AI agents: converts queries and documents into vectors that improve recall before reranking or generation.
- Enterprise-ready SageMaker deployment with private inference, batch transform support, and transparent per-instance pricing.
Details
Pricing
Free trial
| Dimension | Description | Cost/host/hour |
|---|---|---|
| ml.g5.2xlarge Inference (Batch), recommended | Model inference on the ml.g5.2xlarge instance type, batch mode | $6.06 |
| ml.g6e.xlarge Inference (Real-Time), recommended | Model inference on the ml.g6e.xlarge instance type, real-time mode | $10.422 |
| ml.g5.xlarge Inference (Batch) | Model inference on the ml.g5.xlarge instance type, batch mode | $5.632 |
| ml.g5.12xlarge Inference (Batch) | Model inference on the ml.g5.12xlarge instance type, batch mode | $28.36 |
| ml.g6.xlarge Inference (Real-Time) | Model inference on the ml.g6.xlarge instance type, real-time mode | $4.507 |
| ml.g6.2xlarge Inference (Real-Time) | Model inference on the ml.g6.2xlarge instance type, real-time mode | $4.888 |
| ml.g6.4xlarge Inference (Real-Time) | Model inference on the ml.g6.4xlarge instance type, real-time mode | $6.616 |
| ml.g6.8xlarge Inference (Real-Time) | Model inference on the ml.g6.8xlarge instance type, real-time mode | $10.072 |
| ml.g6.12xlarge Inference (Real-Time) | Model inference on the ml.g6.12xlarge instance type, real-time mode | $23.008 |
| ml.g6.16xlarge Inference (Real-Time) | Model inference on the ml.g6.16xlarge instance type, real-time mode | $16.984 |
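As a rough illustration of how the per-host hourly rates add up, the sketch below multiplies a listed rate by a host count and a number of hours. The function name `estimate_cost` and the example workloads are hypothetical, and the figure covers only the rate shown in the table; whether separate AWS infrastructure charges apply on top is not stated here.

```python
# Listed rates in USD per host per hour, copied from the pricing table.
RATES = {
    "ml.g5.2xlarge (Batch)": 6.06,
    "ml.g6e.xlarge (Real-Time)": 10.422,
    "ml.g6.xlarge (Real-Time)": 4.507,
}


def estimate_cost(rate_per_host_hour: float, hosts: int, hours: float) -> float:
    """Hypothetical helper: listed rate x number of hosts x hours, rounded to cents."""
    if hosts < 0 or hours < 0:
        raise ValueError("hosts and hours must be non-negative")
    return round(rate_per_host_hour * hosts * hours, 2)


if __name__ == "__main__":
    # One real-time ml.g6.xlarge host kept running for a 30-day month (720 hours).
    print(estimate_cost(RATES["ml.g6.xlarge (Real-Time)"], hosts=1, hours=24 * 30))
    # Two ml.g5.2xlarge hosts for a 3-hour batch embedding job.
    print(estimate_cost(RATES["ml.g5.2xlarge (Batch)"], hosts=2, hours=3))
```

Because a real-time endpoint is billed for as long as it is running, an always-on host dominates the estimate, while a batch job is charged only for its duration.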
Vendor refund policy
For refund requests, contact support@zeroentropy.dev or reach us on our community Slack at https://go.zeroentropy.dev/slack or Discord at https://go.zeroentropy.dev/discord.
Delivery details
Amazon SageMaker model
An Amazon SageMaker model package is a pre-trained machine learning model ready to use without additional training. Use the model package to create a model on Amazon SageMaker for real-time inference or batch processing. Amazon SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.
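For real-time inference, a deployed endpoint can be called through the SageMaker runtime API. The following is a minimal sketch, assuming an endpoint has already been created from the model package. The function name `embed_texts` and the endpoint name are hypothetical, and the response is assumed to be JSON; its schema is not documented on this page, so the parsed body is returned unchanged.

```python
import json


def embed_texts(texts, embedding_type, endpoint_name, client=None, dimensions=None):
    """Send one embedding request to a SageMaker real-time endpoint.

    `endpoint_name` is whatever name you chose at deployment (hypothetical here).
    """
    if embedding_type not in ("query", "document"):
        raise ValueError("embedding_type must be 'query' or 'document'")

    if client is None:
        # Imported lazily so the helper can be exercised with a stub client.
        import boto3
        client = boto3.client("sagemaker-runtime")

    payload = {"embedding_type": embedding_type, "input": list(texts)}
    if dimensions is not None:
        payload["dimensions"] = dimensions  # optional field

    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    # Assumption: the model replies with a JSON document.
    return json.loads(response["Body"].read())
```

A call might look like `embed_texts(["what is a model package?"], "query", "my-zembed-endpoint")`, where the endpoint name is a placeholder for your own.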
Version release notes
Initial release of zembed-1.
Additional details
Inputs
- Summary: The input to the embedding model is a JSON object containing an embedding_type and a batch of text inputs to embed. embedding_type must be either query or document. Example: { "embedding_type": "query", "input": [""], "dimensions": 1280 }
- Limitations for input type: Each request is limited to 1024 inputs and 5 MB, measured in UTF-8 bytes. Individual input strings may be truncated to 16384 bytes. The dimensions field is optional.
- Input MIME type: application/json
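The request limits above suggest splitting a large corpus into several requests on the client side. The sketch below is one way to do that, under stated assumptions: `make_request_bodies` is a hypothetical helper, "5MB" is read conservatively as 5,000,000 bytes, and the limit is applied to the serialized JSON body. A single input that is larger than the request limit by itself is not handled.

```python
import json

MAX_INPUTS = 1024              # inputs per request, from the listing
MAX_REQUEST_BYTES = 5_000_000  # assumption: "5MB" read as 5,000,000 bytes


def make_request_bodies(texts, embedding_type, dimensions=None):
    """Yield UTF-8 JSON request bodies that stay within the stated limits."""
    if embedding_type not in ("query", "document"):
        raise ValueError("embedding_type must be 'query' or 'document'")

    def serialize(batch):
        payload = {"embedding_type": embedding_type, "input": batch}
        if dimensions is not None:
            payload["dimensions"] = dimensions
        return json.dumps(payload, ensure_ascii=False).encode("utf-8")

    base_size = len(serialize([]))  # bytes used by everything except the inputs
    batch, size = [], base_size
    for text in texts:
        # +2 accounts for the ", " separator; slightly over-counts, which is safe.
        item_size = len(json.dumps(text, ensure_ascii=False).encode("utf-8")) + 2
        if batch and (len(batch) >= MAX_INPUTS or size + item_size > MAX_REQUEST_BYTES):
            yield serialize(batch)
            batch, size = [], base_size
        batch.append(text)
        size += item_size
    if batch:
        yield serialize(batch)
```

Each yielded body can be sent with the application/json content type. Sizes are measured on the encoded bytes rather than on character counts because non-ASCII text can take several bytes per character.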
Support
Vendor support
Contact support@zeroentropy.dev
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.