Amazon Bedrock Pricing
Pricing overview
Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) via a single API, along with a broad set of capabilities you need to build generative AI applications, simplifying development with security, privacy, and responsible AI.
With Amazon Bedrock, you are charged for model inference and customization. You have a choice of two pricing plans for inference: (1) On-Demand and Batch, which lets you use FMs on a pay-as-you-go basis without making any time-based term commitments; and (2) Provisioned Throughput, which lets you provision sufficient throughput to meet your application's performance requirements in exchange for a time-based term commitment.
Pricing models
On-Demand
With the On-Demand mode, you only pay for what you use, with no time-based term commitments. For text generation models, you are charged for every input token processed and every output token generated. For embeddings models, you are charged for every input token processed. A token comprises a few characters and is the basic unit of text that a model uses to interpret user input and prompts. For image generation models, you are charged for every image generated.
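The On-Demand charge for a text model can be sketched as a small calculation. The Python below is illustrative only: the function name `on_demand_text_cost` and its parameters are hypothetical and not part of any AWS SDK, and the sample prices are the Claude rates listed on this page for US East (N. Virginia) and US West (Oregon).

```python
from decimal import Decimal


def on_demand_text_cost(input_tokens, output_tokens,
                        input_price_per_1k, output_price_per_1k="0"):
    """Return the On-Demand charge in USD for one request.

    Hypothetical helper, not an AWS API. Prices are quoted per 1,000 tokens.
    For embeddings models, leave output_price_per_1k at its default,
    because only input tokens are billed.
    """
    input_cost = Decimal(input_tokens) / 1000 * Decimal(input_price_per_1k)
    output_cost = Decimal(output_tokens) / 1000 * Decimal(output_price_per_1k)
    return input_cost + output_cost


# Claude request: 11K input tokens summarized into 4K output tokens.
claude_cost = on_demand_text_cost(11_000, 4_000, "0.00800", "0.02400")
print(f"Claude request: ${claude_cost}")
```

Prices are passed as strings and handled with `Decimal` so that small per-token rates are not distorted by binary floating-point rounding.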
Batch
With Batch mode, you can provide a set of prompts as a single input file and receive responses as a single output file, allowing you to get simultaneous large-scale predictions. The responses are processed and stored in your Amazon S3 bucket so you can access them at a later time. Pricing for Batch mode is the same as pricing for On-Demand mode.
Provisioned Throughput
With the Provisioned Throughput mode, you can purchase model units for a specific base or custom model. The Provisioned Throughput mode is primarily designed for large, consistent inference workloads that need guaranteed throughput. Custom models can only be accessed using Provisioned Throughput. A model unit provides a certain throughput, which is measured by the maximum number of input or output tokens processed per minute. With Provisioned Throughput pricing, you are charged by the hour and can choose between 1-month and 6-month commitment terms.
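Because Provisioned Throughput is billed by the hour per model unit, a monthly estimate is the hourly price multiplied out over the month. The sketch below assumes the units run around the clock for a 31-day month, as the pricing examples on this page do; `provisioned_monthly_cost` is a hypothetical helper name, and the sample rate is the Claude Instant 1-month commitment price for US East (N. Virginia) and US West (Oregon).

```python
from decimal import Decimal

HOURS_PER_DAY = 24


def provisioned_monthly_cost(model_units, hourly_price_per_unit, days_in_month=31):
    """Return the USD charge for running model units continuously for one month.

    Hypothetical helper, not an AWS API. hourly_price_per_unit is the
    per-hour, per-model-unit price for the chosen commitment term.
    """
    return model_units * Decimal(hourly_price_per_unit) * HOURS_PER_DAY * days_in_month


# One model unit of Claude Instant at the 1-month commitment rate.
claude_instant_monthly = provisioned_monthly_cost(1, "39.60")
print(f"Claude Instant, 1 model unit: ${claude_instant_monthly}")
```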
Model customization
With Amazon Bedrock, you can customize FMs with your data to deliver tailored responses for specific tasks and your business context. You can fine-tune models with labeled data or use continued pre-training with unlabeled data. For customization of a text generation model, you are charged for model training based on the total number of tokens processed by the model (the number of tokens in the training data corpus times the number of epochs) and for model storage, charged per month per model. An epoch refers to one full pass through your training dataset during fine-tuning or continued pre-training. Inference using customized models is charged under the Provisioned Throughput plan and requires you to purchase Provisioned Throughput. One model unit is made available with no commitment term for inference on a customized model. You are charged for the number of hours that you use this first model unit for custom model inference. If you want to increase your throughput beyond one model unit, you must purchase a 1-month or 6-month commitment term.
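The three customization charges described above (training, storage, and no-commit inference hours) can be combined in a short sketch. It assumes the training rate is quoted per 1,000 tokens, as in the fine-tuning tables on this page. The function name `customization_cost`, its parameters, and the workload (a 1,000,000-token corpus trained for 2 epochs) are hypothetical; the rates are the Cohere Command fine-tuning prices listed on this page.

```python
from decimal import Decimal


def customization_cost(corpus_tokens, epochs, train_price_per_1k,
                       storage_price_per_month, months_stored,
                       inference_price_per_hour, inference_hours):
    """Return the USD charge for customizing, storing, and testing one model.

    Hypothetical helper, not an AWS API.
    """
    # Total tokens trained = tokens in the training corpus x number of epochs.
    total_tokens = corpus_tokens * epochs
    training = Decimal(total_tokens) / 1000 * Decimal(train_price_per_1k)
    storage = Decimal(storage_price_per_month) * months_stored
    # Hours on the single no-commit model unit for custom model inference.
    inference = Decimal(inference_price_per_hour) * inference_hours
    return training + storage + inference


# Hypothetical workload priced at the Cohere Command fine-tuning rates.
example_cost = customization_cost(
    corpus_tokens=1_000_000, epochs=2, train_price_per_1k="0.004",
    storage_price_per_month="1.95", months_stored=1,
    inference_price_per_hour="49.50", inference_hours=1,
)
print(f"Customization estimate: ${example_cost}")
```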
Powerful tools to build at no extra charge
When using Agents for Amazon Bedrock and Knowledge Bases for Amazon Bedrock, you are only charged for the models and the vector databases you use with these capabilities.
Pricing breakdown
Pricing is dependent on the modality, provider, and model. Please select the model provider to see detailed pricing.
AI21 Labs
On-Demand and Batch pricing
AI21 Labs models | Price per 1,000 input tokens | Price per 1,000 output tokens |
Jurassic-2 Mid | $0.0125 | $0.0125 |
Jurassic-2 Ultra | $0.0188 | $0.0188 |
Currently, model customization (fine-tuning) and Provisioned Throughput are not supported for AI21 Labs’ models on Amazon Bedrock.
Amazon
Anthropic
On-Demand and Batch pricing
Region: US East (N. Virginia) and US West (Oregon)
Anthropic models | Price per 1,000 input tokens | Price per 1,000 output tokens |
Claude Instant | $0.00163 | $0.00551 |
Claude | $0.00800 | $0.02400 |
Region: Asia Pacific (Tokyo)
Anthropic models | Price per 1,000 input tokens | Price per 1,000 output tokens |
Claude Instant | $0.00223 | $0.00755 |
Claude | $0.00800 | $0.02400 |
Region: Europe (Frankfurt)
Anthropic models | Price per 1,000 input tokens | Price per 1,000 output tokens |
Claude Instant | $0.00248 | $0.00838 |
Claude | $0.00800 | $0.02400 |
Provisioned Throughput pricing:
Region: US East (N. Virginia) and US West (Oregon)
Anthropic models | Price per hour per model unit for 1-month commitment | Price per hour per model unit for 6-month commitment |
Claude Instant | $39.60 | $22.00 |
Claude | $63.00 | $35.00 |
Region: Asia Pacific (Tokyo)
Anthropic models | Price per hour per model unit for 1-month commitment | Price per hour per model unit for 6-month commitment |
Claude Instant | $53.10 | $29.50 |
Claude | $163.80 | $91.00 |
Region: Europe (Frankfurt)
Anthropic models | Price per hour per model unit for 1-month commitment | Price per hour per model unit for 6-month commitment |
Claude Instant | $58.86 | $32.70 |
Claude | $149.40 | $83.00 |
Please reach out to your AWS account team for more details on model units.
Cohere
On-Demand and Batch pricing
Cohere models | Price per 1,000 input tokens | Price per 1,000 output tokens |
Command | $0.0015 | $0.0020 |
Command-Light | $0.0003 | $0.0006 |
Embed – English | $0.0001 | N/A |
Embed – Multilingual | $0.0001 | N/A |
Pricing for customization (fine-tuning)
Cohere models | Price to train 1,000 tokens | Price to store each custom model per month | Price to infer from a custom model per model unit per hour (with no-commit Provisioned Throughput pricing) |
Cohere Command | $0.004 | $1.95 | $49.50 |
Cohere Command-Light | $0.001 | $1.95 | $8.56 |
*Total tokens trained = number of tokens in training data corpus x number of epochs
Provisioned Throughput pricing:
Cohere models | Price per hour per model unit for 1-month commitment | Price per hour per model unit for 6-month commitment |
Cohere Command | $39.60 | $23.77 |
Cohere Command-Light | $6.85 | $4.11 |
Please reach out to your AWS account or sales team for more details on model units.
Meta Llama 2
On-Demand and Batch pricing
Meta models | Price per 1,000 input tokens | Price per 1,000 output tokens |
Llama 2 Chat (13B) | $0.00075 | $0.00100 |
Llama 2 Chat (70B) | $0.00195 | $0.00256 |
Pricing for model customization (fine-tuning)
Meta models | Price to train 1,000 tokens | Price to store each custom model* per month | Price to infer from a custom model for 1 model unit per hour (with no-commit Provisioned Throughput pricing) |
Llama 2 Chat (13B) | $0.00149 | $1.95 | $23.50 |
Llama 2 Chat (70B) | $0.00799 | $1.95 | $23.50 |
*Custom model storage = $1.95
Provisioned Throughput pricing:
Meta models | Price per hour per model unit for 1-month commitment | Price per hour per model unit for 6-month commitment |
Llama 2 Chat (13B) | $21.18 | $13.08 |
Llama 2 Chat (70B) | $21.18 | $13.08 |
Please reach out to your AWS account or sales team for more details on model units.
Stability AI
On-Demand and Batch pricing
Image models offered by Stability AI are priced per image, depending on step count and image resolution:
Stability AI model | Image resolution | Price per image generated for standard quality (<=50 steps) | Price per image generated for premium quality (>50 steps) |
SDXL 0.8 | 512 x 512 or smaller | $0.018 | $0.036 |
SDXL 0.8 | Larger than 512 x 512 | $0.036 | $0.072 |
SDXL 1.0 | Up to 1024 x 1024 | $0.04 | $0.08 |
Provisioned Throughput pricing:
Stability AI model | Price per hour per model unit for 1-month commitment* | Price per hour per model unit for 6-month commitment* |
SDXL 1.0 | $49.86 | $46.18 |
*Includes inference for base and custom models.
Please reach out to your AWS account or sales team for more details on model units.
Currently, model customization (fine-tuning) is not supported for Stability AI models on Amazon Bedrock.
Pricing examples
AI21 Labs
An application developer makes the following API calls to Amazon Bedrock: A request to AI21’s Jurassic-2 Mid model to summarize 10K tokens of input text into an output of 2K tokens.
Total cost incurred is = 10K tokens/1000 * $0.0125 + 2K tokens/1000 * $0.0125 = $0.15
Amazon
On-Demand pricing
An application developer makes the following API calls to Amazon Bedrock on an hourly basis: A request to the Amazon Titan Text – Lite model to summarize 2K tokens of input text into an output of 1K tokens.
Total hourly cost incurred is = 2K tokens/1000 * $0.0003 + 1K tokens/1000 * $0.0004 = $0.001.
An application developer makes the following API calls to Amazon Bedrock: A request to the Titan Image Generator base model to generate 1,000 standard-quality images of 1024 x 1024 in size.
Total cost incurred = 1000 images * $0.01 per image = $10
Customization (fine-tuning and continued pre-training) pricing
An application developer customizes a Titan Image Generator model using 1000 image-text pairs. After training, the developer uses custom model provisioned throughput for one hour to evaluate the performance of the model. The fine-tuned model is stored for one month. After evaluation, the developer uses provisioned throughput (1-month commitment) to host the customized model.
Monthly cost incurred for fine-tuning is: Fine-tuning training ($0.005 * 1000) + custom model storage per month ($1.95) + one hour of custom model inference ($21) = $5 + $1.95 + $21 = $27.95
Provisioned Throughput pricing
An application developer buys two model units of Titan Text Express with a 1-month commitment for their text summarization use case.
Total monthly cost incurred is = 2 model units * $18.40/hour * 24 hours * 31 days = $27,379.20
An application developer buys one model unit of the base Titan Image Generator model with a 1-month commitment.
Total cost incurred = 1 * $16.20 * 24 hours * 31 days = $12,052.80
Anthropic
On-Demand pricing
An application developer makes the following API calls to Amazon Bedrock in the US West (Oregon) Region: A request to Anthropic’s Claude model to summarize 11K tokens of input text into an output of 4K tokens.
Total cost incurred is 11K tokens/1000 * $0.008 + 4K tokens/1000 * $0.024 = $0.088 + $0.096 = $0.184
Provisioned Throughput pricing
An application developer buys one model unit of Anthropic Claude Instant with a 1-month commitment in the US West (Oregon) Region:
Total monthly cost incurred is 1 model unit * $39.60 * 24 hours * 31 days = $29,462.40
Cohere
On-Demand pricing
An application developer makes the following API calls to Amazon Bedrock: A request to Cohere’s Command model to summarize 6K tokens of input text into an output of 2K tokens.
Total cost incurred is = 6K tokens/1,000 * $0.00150 + 2K tokens/1,000 * $0.0020 = $0.013
An application developer makes the following API calls to Amazon Bedrock: A request to Cohere’s Command-Light model to summarize 6K tokens of input text into an output of 2K tokens.
Total cost incurred is = 6K tokens/1000 * $0.0003 + 2K tokens/1000 * $0.0006 = $0.003
An application developer makes the following API calls to Amazon Bedrock: A request to Cohere’s Embed English or multilingual model to generate embeddings for 10K tokens of input.
Total cost incurred is = 10K tokens/1000 * $0.0001 = $0.001
Customization (fine-tuning) pricing
An application developer customizes a Cohere Command model using 1000 tokens of data. After training, the developer uses custom model provisioned throughput for one hour to evaluate the performance of the model. The fine-tuned model is stored for one month. After evaluation, the developer uses provisioned throughput (1-month commitment) to host the customized model.
Monthly cost incurred for fine-tuning is: Fine tuning training ($0.004 * 1000) + custom model storage per month ($1.95) + one hour of custom model inference ($49.50) = $55.45
Hourly cost incurred for provisioned throughput (1-month commitment) of the custom model = $39.60 per model unit
Provisioned Throughput pricing
An application developer buys one model unit of Cohere Command with a 1-month commitment for their text summarization use case.
Total monthly cost incurred is 1 model unit * $39.60 * 24 hours * 31 days = $29,462.40
Meta
On-Demand pricing
An application developer makes the following API calls to Amazon Bedrock: A request to Meta’s Llama 2 Chat (13B) model to summarize 2K tokens of input text into an output of 500 tokens.
Total cost incurred is = 2K tokens/1000 * $0.00075 + 500 tokens/1000 * $0.001 = $0.002
Customization (fine-tuning) pricing
An application developer customizes the Llama 2 Pre-trained (70B) model using 1000 tokens of data. After training, the developer uses custom model provisioned throughput for one hour to evaluate the performance of the model. The fine-tuned model is stored for one month. After evaluation, the developer uses provisioned throughput (1-month commitment) to host the customized model.
Monthly cost incurred for fine-tuning is: Fine tuning training ($0.00799 * 1000) + custom model storage per month ($1.95) + one hour of custom model inference ($23.50) = $33.44
Hourly cost incurred for provisioned throughput (1-month commitment) of the custom model = $21.18 per model unit
Provisioned Throughput pricing
An application developer buys one model unit of Meta Llama 2 with a 1-month commitment for their text summarization use case.
Total monthly cost incurred is 1 model unit * $21.18 * 24 hours * 31 days = $15,757.92
Stability AI
On-Demand pricing
An application developer makes the following API calls to Amazon Bedrock: A request to the SDXL 0.8 model to generate a 512 x 512 image with a step count of 70 (premium quality).
Total cost incurred = 1 image * $0.036 per image = $0.036
An application developer makes the following API calls to Amazon Bedrock: A request to the SDXL 1.0 model to generate a 1024 x 1024 image with a step count of 70 (premium quality).
Total cost incurred = 1 image * $0.08 per image = $0.08
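The per-image lookup behind these two examples can be sketched as follows. The function name `sdxl_image_price` is hypothetical and not part of any SDK, and the sketch assumes that "512 x 512 or smaller" means both dimensions are at most 512 pixels. Prices and the 50-step quality threshold come from the Stability AI On-Demand table on this page.

```python
from decimal import Decimal

STANDARD_MAX_STEPS = 50  # up to 50 steps is standard quality; more is premium


def sdxl_image_price(model, width, height, steps):
    """Return the On-Demand price in USD for one generated image.

    Hypothetical helper, not an AWS or Stability AI API.
    """
    if model == "SDXL 0.8":
        # Assumption: "512 x 512 or smaller" applies to both dimensions.
        if width <= 512 and height <= 512:
            standard, premium = "0.018", "0.036"
        else:
            standard, premium = "0.036", "0.072"
    elif model == "SDXL 1.0":
        if width > 1024 or height > 1024:
            raise ValueError("SDXL 1.0 prices are listed only up to 1024 x 1024")
        standard, premium = "0.04", "0.08"
    else:
        raise ValueError(f"No price listed for model: {model}")
    return Decimal(premium if steps > STANDARD_MAX_STEPS else standard)


# The two requests above: 70 steps puts both in the premium tier.
price_sdxl_08 = sdxl_image_price("SDXL 0.8", 512, 512, 70)
price_sdxl_10 = sdxl_image_price("SDXL 1.0", 1024, 1024, 70)
print(price_sdxl_08, price_sdxl_10)
```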
Provisioned Throughput pricing
An application developer buys one model unit of SDXL 1.0 with a 1-month commitment.
Total cost incurred = 1 * $49.86 * 24 hours * 31 days = $37,095.84