Overview
Takara's DS1 Embedding Model is a high-speed text embedding model that uses static embeddings, eliminating the need for a GPU to serve embeddings. This significantly reduces compute requirements while maintaining strong performance. Although DS1 is much faster than standard embedding models such as OpenAI's, it delivers nearly the same accuracy. The DS1 API is fully compatible with OpenAI's API, making it a smooth drop-in replacement when upgrading applications. DS1 is particularly effective where speed is paramount, such as speech-to-speech applications where latency is critical, or near-real-time applications like betting and gaming that demand rapid semantic processing.
Highlights
- Exceptional Speed: Optimized for CPUs, DS1 maintains quality on par with GPU-based embedding models. Latency is 0.97 ms for a single query of up to 512 tokens, with throughput of 1,640M tokens per hour at $0.01 per 1M tokens on an ml.c5.2xlarge instance.
- Reduced Dimension & Cost: With an embedding dimension of 512, DS1 is 6-8x smaller than OpenAI (3072) and E5 Mistral (4096), resulting in a significant reduction in vector database costs.
- Seamless Integration: DS1 serves as a drop-in replacement for OpenAI embeddings, ensuring a smooth transition and upgrade process.
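Because the API is OpenAI-compatible, existing code that calls OpenAI's embeddings endpoint can typically be repointed at a DS1 deployment by changing only the base URL and model name. A minimal sketch of the request and response shapes involved; the model id "ds1" and the zero-filled vector are illustrative placeholders, with only the 512 dimension taken from the listing:

```python
import json

# OpenAI-style embeddings request body; "ds1" is an assumed model id.
request_body = json.dumps({
    "model": "ds1",
    "input": ["hello world"],  # a single string is also accepted
})

# An OpenAI-style response carries each vector under data[i].embedding.
# Values here are dummies; only the 512 dimension comes from the listing.
response_body = {
    "object": "list",
    "data": [{"object": "embedding", "index": 0, "embedding": [0.0] * 512}],
    "model": "ds1",
}

vector = response_body["data"][0]["embedding"]
print(len(vector))  # 512-dimensional, vs. 3072 for OpenAI's large model
```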
Details

Pricing
Free trial
| Dimension | Description | Cost/host/hour |
|---|---|---|
| ml.c5.2xlarge Inference (Batch), Recommended | Model inference on the ml.c5.2xlarge instance type, batch mode | $16.40 |
| ml.c5.2xlarge Inference (Real-Time), Recommended | Model inference on the ml.c5.2xlarge instance type, real-time mode | $16.40 |
| ml.c5.xlarge Inference (Batch) | Model inference on the ml.c5.xlarge instance type, batch mode | $8.00 |
| ml.c5.4xlarge Inference (Batch) | Model inference on the ml.c5.4xlarge instance type, batch mode | $32.80 |
| ml.c5.9xlarge Inference (Batch) | Model inference on the ml.c5.9xlarge instance type, batch mode | $73.60 |
| ml.c5.18xlarge Inference (Batch) | Model inference on the ml.c5.18xlarge instance type, batch mode | $146.80 |
| ml.c5.xlarge Inference (Real-Time) | Model inference on the ml.c5.xlarge instance type, real-time mode | $8.00 |
| ml.c5.4xlarge Inference (Real-Time) | Model inference on the ml.c5.4xlarge instance type, real-time mode | $32.80 |
| ml.c5.9xlarge Inference (Real-Time) | Model inference on the ml.c5.9xlarge instance type, real-time mode | $73.60 |
| ml.c5.18xlarge Inference (Real-Time) | Model inference on the ml.c5.18xlarge instance type, real-time mode | $146.80 |
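The hourly prices above line up with the per-token figure quoted in the highlights. A quick worked check using the listing's numbers for the recommended instance:

```python
# Check the advertised economics on the recommended ml.c5.2xlarge:
# $16.40 per host-hour at ~1,640M tokens/hour gives the quoted
# $0.01 per 1M tokens.
hourly_cost = 16.40              # USD per host-hour, ml.c5.2xlarge
tokens_per_hour = 1_640_000_000  # ~1,640M tokens per hour

cost_per_million = hourly_cost / (tokens_per_hour / 1_000_000)
print(f"${cost_per_million:.2f} per 1M tokens")  # → $0.01 per 1M tokens
```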
Vendor refund policy
Refunds are furnished in line with the EULA only. Please contact support@takara.ai for assistance.
Legal
Vendor terms and conditions
Content disclaimer
Delivery details
Amazon SageMaker model
An Amazon SageMaker model package is a pre-trained machine learning model that is ready to use without additional training. Use the model package to create a model on Amazon SageMaker for real-time inference or batch processing. Amazon SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.
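Once deployed, the endpoint can be invoked for real-time inference with the AWS SDK for Python. A sketch, assuming a payload that follows the input schema documented under Inputs; the endpoint name is a placeholder for your own deployment:

```python
import json

def build_ds1_payload(texts, truncate=True, truncation_direction="right"):
    """Build the JSON request body described in the Inputs section."""
    if isinstance(texts, str):
        texts = [texts]
    return json.dumps({
        "inputs": texts,
        "truncate": truncate,
        "truncation_direction": truncation_direction,
    })

def embed_with_ds1(texts, endpoint_name="ds1-embedding-endpoint"):
    """Invoke a deployed DS1 endpoint (endpoint_name is a placeholder)."""
    import boto3  # AWS SDK; imported lazily so the payload helper runs standalone
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=build_ds1_payload(texts),
    )
    return json.loads(response["Body"].read())
```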
Version release notes
Updated documentation.
Additional details
Inputs
- Summary
The model accepts JSON requests that specify the input text as a single string or an array of strings to be embedded.
- inputs: str or List[str] - A single text or a list of texts to embed.
- truncate: bool, optional (default=False) - If True, inputs longer than the context length are truncated; if False, an error is raised when any text exceeds the context length.
- truncation_direction: str, optional (default="right") - "right" truncates the end of the input string; "left" truncates the start.
- Limitations for input type
- The maximum number of tokens per text is 512, and the maximum length of the list is 32.
Input data descriptions
The following table describes supported input data fields for real-time inference and batch transform.
| Field name | Description | Constraints | Required |
|---|---|---|---|
| inputs | A string or array of strings for DS1 to embed. Each string may be at most 512 tokens, and each call may include at most 32 strings. | 512 tokens maximum per string; 32 strings maximum per call. | Yes |
| truncate | True or False, specifying whether the API truncates inputs longer than the maximum token length. | Defaults to False. | No |
| truncation_direction | Determines how truncation happens when truncate is set to True. Passing "left" discards the start of the input; "right" discards the end. In both cases, input is discarded until the remainder fits DS1's maximum input token length. | Defaults to "right". | No |
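A lightweight client-side guard for the limits above can catch oversized requests before they reach the endpoint. This is a sketch: the whitespace split is only a crude stand-in for the model's real tokenizer, so treat the token count as an approximation.

```python
MAX_STRINGS = 32   # documented maximum strings per call
MAX_TOKENS = 512   # documented maximum tokens per string

def validate_inputs(inputs):
    """Raise ValueError if inputs exceed the documented DS1 limits."""
    if isinstance(inputs, str):
        inputs = [inputs]
    if len(inputs) > MAX_STRINGS:
        raise ValueError(f"at most {MAX_STRINGS} strings per call, got {len(inputs)}")
    for i, text in enumerate(inputs):
        # crude proxy for token count; the real tokenizer may differ
        if len(text.split()) > MAX_TOKENS:
            raise ValueError(f"input {i} exceeds {MAX_TOKENS} tokens; set truncate=True")
    return inputs

validate_inputs(["short text", "another one"])  # passes silently
```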
Support
Vendor support
Please email support@takara.ai for customer support; responses are provided by the next business day.
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.