Overview
Command A is Cohere's flagship generative model, optimized for companies that require fast, secure, and highly performant AI solutions. Command A delivers maximum performance with minimal hardware costs when compared to leading proprietary and open-weights models, such as GPT-4o and DeepSeek-V3. For private deployments, Command A excels on business-critical agentic and multilingual tasks, and can be deployed on just 2 GPUs, compared to competitive models that typically require as many as 32 GPUs. In head-to-head human evaluation across business, STEM, and coding tasks, Command A matches or outperforms its larger and slower competitors, while offering superior throughput and increased efficiency.
Highlights
- Command A is very effective at adapting in real time and solving multi-step problems based on the context and objectives it is given. This unlocks many points of automation and assistance for businesses across the globe.
- Command A is highly balanced. It performs extremely well on critical use cases without sacrificing performance in essential areas, making it a strong general-purpose model. This balanced starting point also makes Command A particularly well suited to being customized and fine-tuned for specific use cases as needed.
- A model this powerful would typically be very expensive to serve, but Command A is highly efficient, operating on just two A100s or H100s.
Details
Pricing
| Dimension | Description | Cost/host/hour |
|---|---|---|
| ml.g4dn.12xlarge Inference (Batch), Recommended | Model inference on the ml.g4dn.12xlarge instance type, batch mode | $34.25 |
| ml.p5.48xlarge Inference (Real-Time), Recommended | Model inference on the ml.p5.48xlarge instance type, real-time mode | $34.25 |
Vendor refund policy
There are no refunds.
Legal
Vendor terms and conditions
Content disclaimer
Delivery details
Amazon SageMaker model
An Amazon SageMaker model package is a pre-trained machine learning model ready to use without additional training. Use the model package to create a model on Amazon SageMaker for real-time inference or batch processing. Amazon SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.
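As a sketch of how a model package becomes a real-time endpoint, the snippet below assembles the boto3 arguments for that flow. The ARNs and resource names are placeholders, not values from this listing; the instance type is the real-time option from the pricing table above.

```python
# Placeholders -- substitute the ARN from your AWS Marketplace subscription
# and an IAM role from your own account.
MODEL_PACKAGE_ARN = "arn:aws:sagemaker:us-east-1:111122223333:model-package/cohere-command-a"  # hypothetical
ROLE_ARN = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"  # hypothetical

# Arguments for sagemaker.create_model: the model package supplies the
# container and weights, so no training step is required.
create_model_args = {
    "ModelName": "command-a",
    "ExecutionRoleArn": ROLE_ARN,
    "Containers": [{"ModelPackageName": MODEL_PACKAGE_ARN}],
    "EnableNetworkIsolation": True,
}

# Arguments for sagemaker.create_endpoint_config: ml.p5.48xlarge is the
# real-time inference instance type from the pricing table.
create_endpoint_config_args = {
    "EndpointConfigName": "command-a-config",
    "ProductionVariants": [{
        "VariantName": "AllTraffic",
        "ModelName": "command-a",
        "InstanceType": "ml.p5.48xlarge",
        "InitialInstanceCount": 1,
    }],
}

# With an authenticated session, these would be executed as:
#   sm = boto3.client("sagemaker")
#   sm.create_model(**create_model_args)
#   sm.create_endpoint_config(**create_endpoint_config_args)
#   sm.create_endpoint(EndpointName="command-a",
#                      EndpointConfigName="command-a-config")
```

The same model can instead be referenced from a batch transform job (on the batch instance type above) when real-time latency is not required.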
Version release notes
We've updated our SageMaker integration with a major version release for Embed and Rerank models, including notebook updates. The "/invocation" endpoint now defaults to API v2, ensuring a seamless transition to the latest version. See the notebook for how to use this model with the API update: https://github.com/cohere-ai/cohere-aws/blob/main/notebooks/sagemaker/Command%20Models.ipynb
New features:
- API version control: users can now specify the API version (v1 or v2) in the endpoint URL, providing greater flexibility and control over API interactions.

Bug fixes:
- Billing token issue: resolved an issue where billing tokens were consistently returned as 0 for embed requests.
- Image processing error: fixed a problem where the inference server failed to process valid base64 image URIs, resulting in "failed to parse image" errors. This issue was specific to the inference server and did not affect other routes.
Additional details
Inputs
- Summary
The model accepts JSON requests with parameters that control the generated text. See the field descriptions and examples below.
- Input MIME type
- application/json
Input data descriptions
The following table describes supported input data fields for real-time inference and batch transform.
| Field name | Description | Constraints | Required |
|---|---|---|---|
| messages | A list of chat messages in chronological order, representing a conversation between the user and the model. Messages can be from the User, Assistant, Tool, and System roles. Input data type: text. | - | Yes |
| documents | A list of relevant documents that the model can cite to generate a more accurate reply. Each document is either a string or a document object with content and metadata. Input data type: text. | - | No |
| tools | A list of available tools (functions) that the model may suggest invoking before producing a text response. When tools is passed (without tool_results), the text content in the response will be empty and the tool_calls field in the response will be populated with a list of tool calls that need to be made. If no calls need to be made, the tool_calls array will be empty. Input data type: text. | - | No |
| citation_options | Options for controlling citation generation. Input data type: text. | - | No |
| logprobs | When set to TRUE, the log probabilities of the generated tokens are included in the response. Input data type: categorical. | TRUE, FALSE; defaults to FALSE | No |
| stop_sequences | A list of up to 5 strings that the model will use to stop generation. If the model generates a string that matches any string in the list, it stops generating tokens and returns the generated text up to that point, not including the stop sequence. Input data type: text. | - | No |
| strict_tools | When set to TRUE, tool calls in the Assistant message are forced to follow the tool definition strictly. Note: the first few requests with a new set of tools will take longer to process. Input data type: categorical. | TRUE, FALSE | No |
| tool_choice | Controls whether the model is forced to use a tool when answering. When REQUIRED is specified, the model must use at least one of the user-defined tools, and the tools parameter must be passed in the request. When NONE is specified, the model must not use any of the specified tools and gives a direct response. If tool_choice isn't specified, the model is free to choose whether to use the specified tools. Input data type: categorical. | REQUIRED, NONE | No |
| stream | When TRUE, the response is an SSE stream of events. Streaming is beneficial for user interfaces that render the contents of the response piece by piece as it is generated. Input data type: categorical. | TRUE, FALSE; defaults to FALSE | No |
| safety_mode | Selects the safety instruction inserted into the prompt. When OFF is specified, the safety instruction is omitted. Safety modes are not yet configurable in combination with the tools, tool_results, and documents parameters. Input data type: categorical. | CONTEXTUAL, STRICT, OFF; defaults to CONTEXTUAL | No |
Resources
Vendor resources
Support
Vendor support
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel staffed 24x7x365 with experienced technical support engineers. The service helps customers of all sizes and technical abilities successfully utilize the products and features provided by Amazon Web Services.