Overview
Embed 4 is a multilingual, multimodal embedding model. It transforms different modalities, such as images, texts, and interleaved images and texts, into a single vector representation, and offers state-of-the-art performance across all of these modalities in both English and multilingual settings.
Embed 4 offers several compression options, covering both the number of dimensions and the number-format precision: byte and binary quantization, plus Matryoshka embeddings for shorter output vectors.
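To make the quantization options concrete, here is a minimal illustrative sketch of what binary quantization does to a float embedding. This is not the model's internal method; Embed 4 can return quantized vectors directly via the embeddings_type parameter.

```python
# Illustrative sketch only: binary quantization maps each float dimension
# to a single bit by sign, a 32x size reduction versus float32.
def binary_quantize(embedding):
    """Pack the sign bits of a float vector into bytes (1 bit per dimension)."""
    bits = [1 if x > 0 else 0 for x in embedding]
    packed = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        packed.append(byte)
    return bytes(packed)

vec = [0.12, -0.87, 0.44, -0.05, 0.91, 0.33, -0.21, 0.08]
print(binary_quantize(vec))  # 8 dimensions -> 1 byte
```

Binary vectors like this can be compared with Hamming distance, which is far cheaper than float cosine similarity at retrieval time.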
As of July 2025, the minimum requirements to deploy this model are NVIDIA driver version 535 and CUDA version 12.2.
Highlights
- Embed 4 offers state-of-the-art performance in the text-to-text, text-to-image, and text-to-mixed-modality domains across 100+ languages.
- Embed 4 is capable of vectorizing interleaved texts and images and capturing key visual features from screenshots of PDFs, slides, tables, figures, and more, thereby eliminating the need for complex document parsing.
- Embed 4 supports a 128k context length, and images can have a maximum of 2,458,624 pixels.
Details
Pricing
| Dimension | Description | Cost/host/hour |
|---|---|---|
| ml.g4dn.12xlarge Inference (Batch), Recommended | Model inference on the ml.g4dn.12xlarge instance type, batch mode | $2.94 |
| ml.g6.xlarge Inference (Real-Time), Recommended | Model inference on the ml.g6.xlarge instance type, real-time mode | $2.94 |
| ml.g4dn.2xlarge Inference (Real-Time) | Model inference on the ml.g4dn.2xlarge instance type, real-time mode | $2.94 |
| ml.g4dn.xlarge Inference (Real-Time) | Model inference on the ml.g4dn.xlarge instance type, real-time mode | $2.94 |
| ml.g5.2xlarge Inference (Real-Time) | Model inference on the ml.g5.2xlarge instance type, real-time mode | $2.94 |
| ml.g5.xlarge Inference (Real-Time) | Model inference on the ml.g5.xlarge instance type, real-time mode | $2.94 |
| ml.p3.2xlarge Inference (Real-Time) | Model inference on the ml.p3.2xlarge instance type, real-time mode | $2.94 |
| ml.g6.2xlarge Inference (Real-Time) | Model inference on the ml.g6.2xlarge instance type, real-time mode | $2.94 |
Vendor refund policy
There are no refunds.
Delivery details
Amazon SageMaker model
An Amazon SageMaker model package is a pre-trained machine learning model ready to use without additional training. Use the model package to create a model on Amazon SageMaker for real-time inference or batch processing. Amazon SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.
Version release notes
We've updated our SageMaker integration with a major version release for the Embed and Rerank models, including notebook updates. The "/invocation" endpoint now defaults to API V2, ensuring a seamless transition to the latest version. See this notebook for how to use the model with the API update: https://github.com/cohere-ai/cohere-aws/blob/main/notebooks/sagemaker/Embed%20Models.ipynb
Additional details
Inputs
- Summary
This model accepts JSON requests that specify a content object, which can contain a list of texts, a list of data URLs of base64-encoded images, or a combination of the two. The model supports interleaved images and texts in the same request:

```json
{
  "content": [
    { "type": "text", "text": "Look at my awesome car!" },
    { "type": "image", "image": "data:image/png;base64,{base64_image}" },
    { "type": "text", "text": "Do you want to buy it?" }
  ]
}
```
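A minimal sketch of assembling such a request body in Python, assuming the request shape described in this listing (an inputs list of content objects); the endpoint name and image data URL are placeholders, not values defined by this product:

```python
import json

def build_request(content_items, input_type="search_document"):
    """Build a JSON request body for one interleaved text/image input."""
    return {
        "inputs": [{"content": content_items}],
        "input_type": input_type,
    }

body = build_request([
    {"type": "text", "text": "Look at my awesome car!"},
    # A real request inlines a base64-encoded image as a data URL here:
    {"type": "image", "image": "data:image/png;base64,..."},
])
payload = json.dumps(body)

# Invocation via the SageMaker runtime (requires AWS credentials and a
# deployed endpoint; "my-embed-endpoint" is a placeholder name):
# import boto3
# client = boto3.client("sagemaker-runtime")
# resp = client.invoke_endpoint(EndpointName="my-embed-endpoint",
#                               ContentType="application/json",
#                               Body=payload)
```

Because the input_type field is required, retrieval corpora should be embedded with search_document and queries with search_query, per the field descriptions below.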
- Limitations for input type
- Cohere's embedding models do NOT support batch transform.
- Input MIME type
- application/json
Input data descriptions
The following table describes the supported input data fields for real-time inference.
| Field name | Description | Constraints | Required |
|---|---|---|---|
| inputs | A list of objects, each with a "content" key holding a list of items. Each item has a "type" of either "text" (with a "text" key containing a string) or "image_url" (with an "image" key containing a base64-encoded data URL). The maximum pixels per image is 2,458,624, the maximum size of a single request is 20 MB, and an inputs object can contain at most 96 inputs. An inputs object can also have at most 800,000 tokens, calculated as follows: 1) for each image, image pixels / 784 (pixels per token) = tokens; 2) for each text, its token count; 3) total for the inputs object = the sum of (1) and (2) over all inputs. | The input data type is categorical. The default value is none. | No |
| texts | An array of strings for the model to embed. The maximum number of texts per call is 96. If you use the texts parameter, you cannot use the images parameter in the same call. | The input data type is text. | No |
| images | An array of base64-encoded data URLs (as strings) to embed. The maximum number of images per call is 96. You cannot send both an array of texts and an array of images in the same call. | The input data type is text. | No |
| input_type | A required field that prepends special tokens to differentiate each type from the others. The only case where types should be mixed is search and retrieval: embed your corpus with type search_document and your queries with type search_query. | The input data type is categorical. Allowed values: search_document, search_query, classification, and clustering. | Yes |
| embeddings_type | Specifies the types of embeddings to return. If unspecified, the float response type is returned. Can be one or more of the types listed in the allowed values. | The input data type is categorical. Allowed values: float, int8, uint8, binary, and ubinary. | No |
| output_dimension | Specifies the length of the output embedding vectors. If unspecified, 1536 dimensions are returned per embedding vector. | The input data type is categorical. Allowed values: 256, 512, 1024, and 1536. | No |
| max_tokens | The maximum number of tokens considered for each input object before it is truncated; the default for this model is 128,000 tokens. | The input data type is integer. | No |
| truncate | One of NONE, LEFT, or RIGHT, specifying how the API handles inputs longer than the maximum token length. LEFT discards the start of the input; RIGHT discards the end. In both cases, input is discarded until the remaining input is exactly the model's maximum input token length. With NONE, an error is returned when the input exceeds the maximum input token length. | The input data type is categorical. The default value is NONE. | No |
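The per-request limits above can be checked client-side before sending a request. This is a sketch based on the documented limits (96 inputs, an 800,000-token budget, and 784 pixels per image token); the helper names are illustrative, not part of the API:

```python
# Documented request limits for the inputs object.
MAX_INPUTS = 96
MAX_TOTAL_TOKENS = 800_000
PIXELS_PER_TOKEN = 784

def estimate_tokens(inputs):
    """inputs: list of items like {"image_pixels": int} or {"text_tokens": int}."""
    total = 0
    for item in inputs:
        total += item.get("image_pixels", 0) // PIXELS_PER_TOKEN  # image cost
        total += item.get("text_tokens", 0)                       # text cost
    return total

def validate(inputs):
    """Return True if the request stays within the documented limits."""
    if len(inputs) > MAX_INPUTS:
        return False
    return estimate_tokens(inputs) <= MAX_TOTAL_TOKENS

# A maximum-size image (2,458,624 pixels) costs 2,458,624 / 784 = 3,136 tokens.
print(estimate_tokens([{"image_pixels": 2_458_624}, {"text_tokens": 12}]))
```

Note that the 20 MB request-size limit must still be checked separately against the serialized payload.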
Support
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.