Overview
Command A is Cohere's flagship generative model, optimized for companies that require fast, secure, and highly performant AI solutions. Command A delivers maximum performance with minimal hardware costs when compared to leading proprietary and open-weights models, such as GPT-4o and DeepSeek-V3. For private deployments, Command A excels on business-critical agentic and multilingual tasks, and can be deployed on just 2 GPUs, compared to competitive models that typically require as many as 32 GPUs. In head-to-head human evaluation across business, STEM, and coding tasks, Command A matches or outperforms its larger and slower competitors, while offering superior throughput and increased efficiency.
Highlights
- Command A is very effective at adapting in real time and solving multi-step problems based on the context and objectives it is given. This unlocks many points of automation and assistance for businesses across the globe.
- Command A is highly balanced. It performs extremely well on critical use cases without sacrificing performance in essential areas, making it a strong general-purpose model. This balanced starting point also makes Command A particularly well suited to being customized and fine-tuned for specific use cases as needed.
- A model this powerful would typically be very expensive to serve, but Command A is highly efficient, operating on just two A100s or H100s.
Details
Pricing
| Dimension | Description | Cost/host/hour |
|---|---|---|
| ml.g4dn.12xlarge Inference (Batch), Recommended | Model inference on the ml.g4dn.12xlarge instance type, batch mode | $34.25 |
| ml.p5.48xlarge Inference (Real-Time), Recommended | Model inference on the ml.p5.48xlarge instance type, real-time mode | $34.25 |
Vendor refund policy
There are no refunds.
Legal
Vendor terms and conditions
Content disclaimer
Delivery details
Amazon SageMaker model
An Amazon SageMaker model package is a pre-trained machine learning model ready to use without additional training. Use the model package to create a model on Amazon SageMaker for real-time inference or batch processing. Amazon SageMaker is a fully managed platform for building, training, and deploying machine learning models at scale.
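As a sketch of how a model package becomes a real-time endpoint, the snippet below assembles the boto3 arguments for that flow. The ARNs and resource names are placeholders, not values from this listing; the instance type is the real-time option from the pricing table above.

```python
# Placeholders -- substitute the ARN from your AWS Marketplace subscription
# and an IAM role from your own account.
MODEL_PACKAGE_ARN = "arn:aws:sagemaker:us-east-1:111122223333:model-package/cohere-command-a"  # hypothetical
ROLE_ARN = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"  # hypothetical

# Arguments for sagemaker.create_model: the model package supplies the
# container and weights, so no training step is required.
create_model_args = {
    "ModelName": "command-a",
    "ExecutionRoleArn": ROLE_ARN,
    "Containers": [{"ModelPackageName": MODEL_PACKAGE_ARN}],
    "EnableNetworkIsolation": True,
}

# Arguments for sagemaker.create_endpoint_config: ml.p5.48xlarge is the
# real-time inference instance type from the pricing table.
create_endpoint_config_args = {
    "EndpointConfigName": "command-a-config",
    "ProductionVariants": [{
        "VariantName": "AllTraffic",
        "ModelName": "command-a",
        "InstanceType": "ml.p5.48xlarge",
        "InitialInstanceCount": 1,
    }],
}

# With an authenticated session, these would be executed as:
#   sm = boto3.client("sagemaker")
#   sm.create_model(**create_model_args)
#   sm.create_endpoint_config(**create_endpoint_config_args)
#   sm.create_endpoint(EndpointName="command-a",
#                      EndpointConfigName="command-a-config")
```

The same model can instead be referenced from a batch transform job (on the batch instance type above) when real-time latency is not required.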
Version release notes
We've updated our SageMaker integration with a major version release for Embed and Rerank models, including notebook updates. The "/invocation" endpoint now defaults to API v2, ensuring a seamless transition to the latest version. See the notebook for how to use this model with the API update: https://github.com/cohere-ai/cohere-aws/blob/main/notebooks/sagemaker/Command%20Models.ipynb
New features:
- API version control: users can now specify the API version (v1 or v2) in the endpoint URL, providing greater flexibility and control over API interactions.

Bug fixes:
- Billing token issue: resolved an issue where billing tokens were consistently returned as 0 for embed requests.
- Image processing error: fixed a problem where the inference server failed to process valid base64 image URIs, resulting in "failed to parse image" errors. This issue was specific to the inference server and did not affect other routes.
Additional details
Inputs
- Summary
The model accepts JSON requests with parameters that control the generated text. See the field descriptions and examples below.
- Input MIME type
- application/json
Input data descriptions
The following table describes supported input data fields for real-time inference and batch transform.
| Field name | Description | Constraints | Required |
|---|---|---|---|
| messages | A list of chat messages in chronological order, representing a conversation between the user and the model. Messages can be from the User, Assistant, Tool, and System roles. Input data type: text. | - | Yes |
| documents | A list of relevant documents that the model can cite to generate a more accurate reply. Each document is either a string or a document object with content and metadata. Input data type: text. | - | No |
| tools | A list of available tools (functions) that the model may suggest invoking before producing a text response. When tools is passed (without tool_results), the text content in the response will be empty and the tool_calls field in the response will be populated with a list of tool calls that need to be made. If no calls need to be made, the tool_calls array will be empty. Input data type: text. | - | No |
| citation_options | Options for controlling citation generation. Input data type: text. | - | No |
| logprobs | When set to TRUE, the log probabilities of the generated tokens are included in the response. Input data type: categorical. | TRUE, FALSE; defaults to FALSE | No |
| stop_sequences | A list of up to 5 strings that the model will use to stop generation. If the model generates a string that matches any string in the list, it stops generating tokens and returns the generated text up to that point, not including the stop sequence. Input data type: text. | - | No |
| strict_tools | When set to TRUE, tool calls in the Assistant message are forced to follow the tool definition strictly. Note: the first few requests with a new set of tools will take longer to process. Input data type: categorical. | TRUE, FALSE | No |
| tool_choice | Controls whether the model is forced to use a tool when answering. When REQUIRED is specified, the model must use at least one of the user-defined tools, and the tools parameter must be passed in the request. When NONE is specified, the model must not use any of the specified tools and gives a direct response. If tool_choice isn't specified, the model is free to choose whether to use the specified tools. Input data type: categorical. | REQUIRED, NONE | No |
| stream | When TRUE, the response is an SSE stream of events. Streaming is beneficial for user interfaces that render the contents of the response piece by piece as it is generated. Input data type: categorical. | TRUE, FALSE; defaults to FALSE | No |
| safety_mode | Selects the safety instruction inserted into the prompt. When OFF is specified, the safety instruction is omitted. Safety modes are not yet configurable in combination with the tools, tool_results, and documents parameters. Input data type: categorical. | CONTEXTUAL, STRICT, OFF; defaults to CONTEXTUAL | No |
Resources
Vendor resources
Support
Vendor support
AWS infrastructure support
AWS Support is a one-on-one, fast-response support channel staffed 24x7x365 with experienced technical support engineers. The service helps customers of all sizes and technical abilities successfully utilize the products and features provided by Amazon Web Services.