Overview
Jina Embeddings v5 Text Small is the latest generation of Jina AI's open-weight text embedding family, distilling the quality of 4B-parameter teacher models into a sub-1B footprint. With 677M parameters and a 32,768-token context window, it can embed research papers, legal contracts, and other long documents of up to 32,768 tokens in a single pass, without chunking.
The model supports 30+ languages with state-of-the-art retrieval quality (MTEB English average 67.1, multilingual average 65.8). Matryoshka Representation Learning lets you truncate embeddings from 1024 down to 32 dimensions without retraining, accepting a marginal loss in recall in exchange for lower storage cost. Five task-specific LoRA adapters (retrieval.query, retrieval.passage, clustering, classification, and text-matching) let a single deployed model serve diverse downstream workloads.
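As a rough illustration of the Matryoshka truncation described above, the sketch below keeps the leading components of a vector and rescales the result to unit length, then estimates raw storage at a given dimension. It is a client-side sketch under assumptions: the listing does not say whether the endpoint truncates or normalizes embeddings itself, and the helper names `truncate_embedding` and `storage_bytes` are hypothetical, not part of any Jina or AWS API.

```python
import math
from typing import Sequence


def truncate_embedding(vec: Sequence[float], dim: int) -> list[float]:
    """Keep the first `dim` components and rescale them to unit L2 norm.

    Hypothetical helper: assumes the leading components carry the most
    information, as is the case for Matryoshka-trained embeddings.
    """
    if not 1 <= dim <= len(vec):
        raise ValueError(f"dim must be in [1, {len(vec)}], got {dim}")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero prefix: nothing to normalize
    return [x / norm for x in head]


def storage_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw vector storage, assuming float32 values and no index overhead."""
    return num_vectors * dim * bytes_per_value


if __name__ == "__main__":
    full = [3.0, 4.0, 12.0, 0.0]          # toy 4-dimensional "embedding"
    small = truncate_embedding(full, 2)   # keep the first 2 dimensions
    print(small)
    print(storage_bytes(1_000_000, 1024), storage_bytes(1_000_000, 32))
```

Under the float32 assumption, one million vectors occupy 4,096,000,000 bytes at 1024 dimensions and 128,000,000 bytes at 32 dimensions, a 32x reduction. Renormalizing after truncation keeps dot-product and cosine scores comparable across vectors of the same truncated length.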
Highlights
- Long-context multilingual embeddings: 32,768-token window and coverage for 30+ languages let you embed entire documents without chunking and search across language boundaries with state-of-the-art quality.
- Matryoshka dimensions from 32 to 1024: truncate embeddings at inference time to match your storage and latency budget. One model, many deployment profiles; no separate training runs for small-vector use cases.
- Five task-specific LoRA adapters in one model: switch between retrieval.query, retrieval.passage, clustering, classification, and text-matching per request. Replace a stack of single-purpose embedders with one endpoint.
Details
Pricing
| Dimension | Description | Cost/host/hour |
|---|---|---|
| ml.g5.xlarge Inference (Batch) (Recommended) | Model inference on the ml.g5.xlarge instance type, batch mode | $2.50 |
| ml.g5.xlarge Inference (Real-Time) (Recommended) | Model inference on the ml.g5.xlarge instance type, real-time mode | $2.50 |
| ml.g4dn.xlarge Inference (Batch) | Model inference on the ml.g4dn.xlarge instance type, batch mode | $2.50 |
| ml.g4dn.2xlarge Inference (Batch) | Model inference on the ml.g4dn.2xlarge instance type, batch mode | $2.50 |
| ml.g4dn.4xlarge Inference (Batch) | Model inference on the ml.g4dn.4xlarge instance type, batch mode | $2.50 |
| ml.g4dn.8xlarge Inference (Batch) | Model inference on the ml.g4dn.8xlarge instance type, batch mode | $2.50 |
| ml.g4dn.12xlarge Inference (Batch) | Model inference on the ml.g4dn.12xlarge instance type, batch mode | $2.50 |
| ml.g4dn.16xlarge Inference (Batch) | Model inference on the ml.g4dn.16xlarge instance type, batch mode | $2.50 |
| ml.g6.xlarge Inference (Batch) | Model inference on the ml.g6.xlarge instance type, batch mode | $2.50 |
| ml.g5.2xlarge Inference (Real-Time) | Model inference on the ml.g5.2xlarge instance type, real-time mode | $2.50 |
Vendor refund policy
For support, please visit https://jina.ai/contact-sales or https://www.elastic.co/support.
Delivery details
Amazon SageMaker model
An Amazon SageMaker model package is a pre-trained machine learning model ready to use without additional training. Use the model package to create a model on Amazon SageMaker for real-time inference or batch processing. Amazon SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.
Version release notes
Initial release
Additional details
Inputs
- Summary: The model accepts JSON inputs. Texts must be passed in the following format.
- Input MIME type: text/csv
Support
Vendor support
For support, please visit https://jina.ai/contact-sales or https://www.elastic.co/support.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.