Amazon Bedrock pricing

Pricing overview

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.

With Amazon Bedrock, you will be charged for model inference and customization. You have a choice of two pricing plans for inference:
1. On-Demand and Batch: This mode allows you to use FMs on a pay-as-you-go basis without having to make any time-based term commitments.
2. Provisioned Throughput: This mode allows you to provision sufficient throughput to meet your application's performance requirements in exchange for a time-based term commitment.

Pricing models

On-Demand

With the On-Demand mode, you pay only for what you use, with no time-based term commitments. For text-generation models, you are charged for every input token processed and every output token generated. For embeddings models, you are charged for every input token processed. A token is a sequence of a few characters; it is the basic unit of text that a model uses to interpret the user input and prompt. For image-generation models, you are charged for every image generated.

Cross-region inference: On-Demand mode also supports cross-region inference for some models. It lets developers manage traffic bursts by using compute across different AWS Regions, which provides higher throughput limits and enhanced resilience. There is no additional charge for using cross-region inference; the price is calculated based on the Region in which you made the request (the source Region).
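The token-based charge described above can be sketched as a small calculation. This is an illustrative sketch only: the function name `on_demand_text_cost` and its parameters are assumptions made for this example, not part of any AWS SDK. The rates are the Claude 2.1 On-Demand rates listed on this page for the US Regions.

```python
def on_demand_text_cost(input_tokens: int, output_tokens: int,
                        input_price_per_1k: float,
                        output_price_per_1k: float) -> float:
    """Return the On-Demand charge in USD for one text request.

    Hypothetical helper: prices are quoted per 1,000 tokens, so token
    counts are divided by 1,000 before multiplying by the rate.
    """
    input_cost = input_tokens / 1000 * input_price_per_1k
    output_cost = output_tokens / 1000 * output_price_per_1k
    return input_cost + output_cost


# 11K input tokens and 4K output tokens at $0.008 / $0.024 per 1,000 tokens.
cost = on_demand_text_cost(11_000, 4_000, 0.008, 0.024)
print(f"${cost:.3f}")  # prints $0.184
```

For an embeddings model, which is charged on input tokens only, the same helper can be called with `output_tokens=0`.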

Provisioned Throughput

With the Provisioned Throughput mode, you can purchase model units for a specific base or custom model. The Provisioned Throughput mode is primarily designed for large, consistent inference workloads that need guaranteed throughput. Custom models can only be accessed using Provisioned Throughput. A model unit provides a certain throughput, which is measured by the maximum number of input or output tokens processed per minute. With Provisioned Throughput pricing, you are charged by the hour, and you can choose between 1-month and 6-month commitment terms.
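Because Provisioned Throughput is billed per model unit per hour, a monthly estimate is a simple product. The sketch below assumes a 31-day month, which matches the worked examples later on this page; the function name and the `days` default are assumptions made for illustration, and the $18.40 hourly rate is the one used in the Amazon Titan Text Express example on this page.

```python
HOURS_PER_DAY = 24


def provisioned_throughput_cost(model_units: int, hourly_price: float,
                                days: int = 31) -> float:
    """Return the charge in USD for running model units around the clock.

    Hypothetical helper: hourly_price is the price per hour per model unit
    for the chosen commitment term.
    """
    return model_units * hourly_price * HOURS_PER_DAY * days


# Two model units at $18.40 per hour for a 31-day month.
monthly = provisioned_throughput_cost(2, 18.40)
print(f"${monthly:,.2f}")  # prints $27,379.20
```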

Model customization

With Amazon Bedrock, you can customize FMs with your data to deliver tailored responses for specific tasks and your business context. You can fine-tune models with labeled data or use continued pretraining with unlabeled data. For customization of a text-generation model, you are charged for the model training based on the total number of tokens processed by the model (number of tokens in the training data corpus x the number of epochs) and for model storage, which is charged per month per model. An epoch refers to one full pass through your training dataset during fine-tuning or continued pretraining. Inference using customized models is charged under the Provisioned Throughput plan and requires you to purchase Provisioned Throughput. One model unit is made available with no commitment term for inference on a customized model. You will be charged for the number of hours you use the first model unit for custom model inference. If you want to increase your throughput beyond one model unit, you must purchase a 1-month or 6-month commitment term.
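The three components of a customization bill (training, storage, and no-commitment inference hours) can be combined as in the sketch below. The function and parameter names are assumptions made for this example. The rates are the Cohere Command customization rates listed on this page; the corpus size and epoch count are hypothetical values chosen only to show the arithmetic.

```python
def customization_cost(corpus_tokens: int, epochs: int,
                       train_price_per_1k: float,
                       storage_price_per_month: float,
                       months_stored: int = 1,
                       inference_hours: float = 0.0,
                       inference_price_per_hour: float = 0.0) -> float:
    """Return an estimated customization charge in USD (hypothetical helper)."""
    # Total tokens trained = tokens in the training corpus x number of epochs.
    tokens_trained = corpus_tokens * epochs
    training = tokens_trained / 1000 * train_price_per_1k
    storage = storage_price_per_month * months_stored
    inference = inference_hours * inference_price_per_hour
    return training + storage + inference


# Hypothetical job: 1,000,000-token corpus, 2 epochs, stored for 1 month,
# then 1 hour of no-commitment custom model inference.
total = customization_cost(1_000_000, 2, 0.004, 1.95,
                           months_stored=1,
                           inference_hours=1,
                           inference_price_per_hour=49.50)
print(f"${total:.2f}")  # prints $59.45
```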

Batch

With Batch mode, you can provide a set of prompts as a single input file and receive responses as a single output file, allowing you to get simultaneous large-scale predictions. The responses are processed and stored in your Amazon S3 bucket so you can access them at a later time. Amazon Bedrock offers select foundation models (FMs) from leading AI providers like Anthropic, Meta, Mistral AI, and Amazon for batch inference at a 50% lower price compared to On-Demand inference pricing. Refer to the model list for the models that support batch inference.
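As a rough illustration of the 50% batch discount, the sketch below derives batch rates from On-Demand rates. The helper name is an assumption, and the rates passed in are illustrative inputs; whether a particular model is eligible for batch inference depends on the supported model list.

```python
BATCH_DISCOUNT = 0.50  # batch inference is priced 50% below On-Demand


def batch_price(on_demand_price_per_1k: float) -> float:
    """Return the batch price per 1,000 tokens (hypothetical helper)."""
    return on_demand_price_per_1k * (1 - BATCH_DISCOUNT)


# Illustrative On-Demand rates of $0.003 (input) and $0.015 (output).
print(f"input:  ${batch_price(0.003):.4f} per 1,000 tokens")
print(f"output: ${batch_price(0.015):.4f} per 1,000 tokens")
```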

Model evaluation

With model evaluation on Amazon Bedrock, you pay for what you use, with no volume commitments on the number of prompts or responses. For automatic evaluation, you pay only for the inference from your choice of model in the evaluation. The automatically generated algorithmic scores are provided at no extra charge.

For human-based evaluation where you bring your own work team, you are charged for the model inference in the evaluation plus $0.21 per completed human task. A human task is defined as an instance of a human worker submitting an evaluation of a single prompt and its associated inference responses in the human evaluation user interface. The price is the same whether you have one or two models in your evaluation job, and regardless of how many evaluation metrics and rating methods you include. The charges for human tasks appear under the Amazon SageMaker section of your AWS bill and are the same for all AWS Regions. There is no separate charge for the workforce, because you supply it.

For an evaluation managed by AWS, pricing is customized for your evaluation needs in a private engagement with the AWS expert evaluations team.
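The human-based evaluation charge can be estimated as follows. This is a sketch under the definitions above: one human task per prompt-response set per worker, at $0.21 per task, plus whatever the model inference costs. The function names are assumptions made for this example.

```python
HUMAN_TASK_PRICE = 0.21  # USD per completed human task


def human_task_count(prompts: int, workers_per_prompt: int) -> int:
    """One task per prompt-response set per worker (hypothetical helper)."""
    return prompts * workers_per_prompt


def human_evaluation_cost(prompts: int, workers_per_prompt: int,
                          inference_cost: float) -> float:
    """Return the evaluation charge in USD: human tasks plus model inference."""
    tasks = human_task_count(prompts, workers_per_prompt)
    return tasks * HUMAN_TASK_PRICE + inference_cost


# 50 prompts rated by 2 workers each, with $0.56 of model inference.
print(human_task_count(50, 2))                        # prints 100
print(f"${human_evaluation_cost(50, 2, 0.56):.2f}")   # prints $21.56
```

The task count does not depend on whether one or two models are evaluated, which is why the number of models is not a parameter here.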

Powerful tools to build at no extra charge

When using Amazon Bedrock Agents and Amazon Bedrock Knowledge Bases, you are only charged for the models and the vector databases you use with these capabilities.

Pricing breakdown

Pricing is dependent on the modality, provider, and model. Please select the model provider to see detailed pricing.

AI21 Labs

On-Demand pricing

AI21 Labs models Price per 1,000 input tokens Price per 1,000 output tokens
Jamba 1.5 Large $0.002 $0.008
Jamba 1.5 Mini $0.0002 $0.0004
Jurassic-2 Mid $0.0125 $0.0125
Jurassic-2 Ultra $0.0188 $0.0188
Jamba-Instruct $0.0005 $0.0007

Amazon

Anthropic

On-Demand pricing

Region: US East (N. Virginia) and US West (Oregon)

Anthropic models Price per 1,000 input tokens Price per 1,000 output tokens
Claude 3.5 Sonnet $0.003 $0.015
Claude 3 Opus* $0.015 $0.075
Claude 3 Haiku $0.00025 $0.00125
Claude 3 Sonnet $0.003 $0.015
Claude 2.1 $0.008 $0.024
Claude 2.0 $0.008 $0.024
Claude Instant $0.0008 $0.0024

*Claude 3 Opus is currently available in the US West (Oregon) Region

Region: EU (London)

Anthropic models Price per 1,000 input tokens Price per 1,000 output tokens
Claude 3 Sonnet $0.003 $0.015
Claude 3 Haiku $0.00025 $0.00125

Region: South America (São Paulo)

Anthropic models Price per 1,000 input tokens Price per 1,000 output tokens
Claude 3 Sonnet $0.003 $0.015
Claude 3 Haiku $0.00025 $0.00125

Region: Canada (Central)

Anthropic models Price per 1,000 input tokens Price per 1,000 output tokens
Claude 3 Sonnet $0.003 $0.015
Claude 3 Haiku $0.00025 $0.00125

Region: Asia Pacific (Mumbai)

Anthropic models Price per 1,000 input tokens Price per 1,000 output tokens
Claude 3 Sonnet $0.003 $0.015
Claude 3 Haiku $0.00025 $0.00125

Region: Asia Pacific (Sydney)

Anthropic models Price per 1,000 input tokens Price per 1,000 output tokens
Claude 3 Sonnet $0.003 $0.015
Claude 3 Haiku $0.00025 $0.00125

Region: Asia Pacific (Tokyo)

Anthropic models Price per 1,000 input tokens Price per 1,000 output tokens
Claude Instant $0.0008 $0.0024
Claude 2.0/2.1 $0.008 $0.024
Claude 3 Haiku $0.00025 $0.00125
Claude 3.5 Sonnet $0.003 $0.015

Region: Asia Pacific (Singapore)

Anthropic models Price per 1,000 input tokens Price per 1,000 output tokens
Claude Instant $0.0008 $0.0024
Claude 2.0/2.1 $0.008 $0.024
Claude 3 Haiku $0.00025 $0.00125
Claude 3.5 Sonnet $0.003 $0.015

Region: Europe (Paris)

Anthropic models Price per 1,000 input tokens Price per 1,000 output tokens
Claude 3 Sonnet $0.003 $0.015
Claude 3 Haiku $0.00025 $0.00125

Region: Europe (Frankfurt)

Anthropic models Price per 1,000 input tokens Price per 1,000 output tokens
Claude Instant $0.0008 $0.0024
Claude 2.0/2.1 $0.008 $0.024
Claude 3 Sonnet $0.003 $0.015
Claude 3.5 Sonnet $0.003 $0.015
Claude 3 Haiku $0.00025 $0.00125

Provisioned Throughput pricing

Region: US East (N. Virginia) and US West (Oregon)

Anthropic models Price per hour per model with no commitment Price per hour per model unit for 1-month commitment Price per hour per model unit for 6-month commitment
Claude Instant $44.00 $39.60 $22.00
Claude 2.0/2.1 $70.00 $63.00 $35.00

Region: Asia Pacific (Tokyo)

Anthropic models Price per hour per model unit for 1-month commitment Price per hour per model unit for 6-month commitment
Claude Instant $53.00 $29.00
Claude 2.0/2.1 $86.00 $48.00

Region: Europe (Frankfurt)

Anthropic models Price per hour per model unit for 1-month commitment Price per hour per model unit for 6-month commitment
Claude Instant $49.00 $27.00
Claude 2.0/2.1 $79.00 $44.00

Please reach out to your AWS account team for more details on model units. 

Cohere

On-Demand pricing

Cohere models Price per 1,000 input tokens Price per 1,000 output tokens
Command $0.0015 $0.0020
Command-Light $0.0003 $0.0006
Command R+ $0.0030 $0.0150
Command R $0.0005 $0.0015
Embed - English $0.0001 N/A
Embed - Multilingual $0.0001 N/A

Pricing for customization (fine-tuning)

Cohere models Price to train 1,000 tokens Price to store each custom model per month Price to infer from a custom model per model unit per hour (with no-commit Provisioned Throughput pricing)
Cohere Command $0.004 $1.95 $49.50
Cohere Command-Light $0.001 $1.95 $8.56

*Total tokens trained = number of tokens in training data corpus x number of epochs

Provisioned Throughput pricing

Cohere models Price per hour per model with no commitment Price per hour per model unit for 1-month commitment Price per hour per model unit for 6-month commitment
Cohere Command $49.50 $39.60 $23.77
Cohere Command-Light $8.56 $6.85 $4.11
Embed - English $7.12 $6.76 $6.41
Embed - Multilingual $7.12 $6.76 $6.41

Please reach out to your AWS account or sales team for more details on model units. 

Meta Llama

Llama 3.2

On-Demand pricing 

Llama 3.1

On-Demand pricing 

Llama 3

On-Demand pricing 

Llama 2

On-Demand pricing 

Region: US East (N. Virginia) and US West (Oregon)

Meta models Price per 1,000 input tokens Price per 1,000 output tokens
Llama 2 Chat (13B) $0.00075 $0.001
Llama 2 Chat (70B) $0.00195 $0.00256

Pricing for model customization (fine-tuning)

Meta models Price to train 1,000 tokens Price to store each custom model* per month Price to infer from a custom model for 1 model unit per hour (with no-commit Provisioned Throughput pricing)
Llama 2 Pretrained (13B) $0.00149 $1.95 $23.50
Llama 2 Pretrained (70B) $0.00799 $1.95 $23.50

*Custom model storage = $1.95

Provisioned Throughput pricing

Meta models Price per hour per model unit for 1-month commitment Price per hour per model unit for 6-month commitment
Llama 2 Pretrained and Chat (13B) $21.18 $13.08
Llama 2 Pretrained (70B) $21.18 $13.08

*Llama 2 Pre-trained models are available only in provisioned throughput after customization. 

Please reach out to your AWS account or sales team for more details on model units.

Mistral AI

Stability AI

On-Demand pricing

Stability AI Model Price per generated image
Stable Image Core $0.04
SD3 Large $0.08
Stable Image Ultra $0.14

Previous generation of image models offered by Stability AI are priced per image, depending on step count and image resolution.

Stability AI model Image resolution Price per image generated for standard quality (<=50 steps) Price per image generated for premium quality (>50 steps)
SDXL 1.0 Up to 1024 x 1024 $0.04 $0.08

Provisioned Throughput pricing

Stability AI model Price per hour per model unit for 1-month commitment* Price per hour per model unit for 6-month commitment*
SDXL 1.0 $49.86 $46.18

*Includes inference for base and custom models

Please reach out to your AWS account or sales team for more details on model units.

Currently, model customization (fine-tuning) is not supported for Stability AI models on Amazon Bedrock.

Amazon Bedrock Guardrails

On-Demand pricing

Guardrail policy* Price per 1,000 text units**
Content filters $0.75
Denied topics $1.00
Contextual grounding check*** $0.10
Sensitive information filter (PII) $0.10
Sensitive information filter (regular expression) Free
Word filters Free

* Each guardrail policy is optional and can be enabled based on your application requirements. Charges will be incurred based on the policy type used in the guardrail. For example, if a guardrail is configured with content filters and denied topics, charges will be incurred for these two policies, while there will be no charges associated with sensitive information filters.

**A text unit can contain up to 1000 characters. If a text input is more than 1000 characters, it is processed as multiple text units, each containing 1000 characters or less. For example, if a text input contains 5600 characters, it will be charged for 6 text units.

*** Contextual grounding check uses a reference source and a query to determine if the model response is grounded based on the source and relevant to the query. The total number of text units charged is calculated by combining all the characters in the source, query, and model response. 

Guardrails are not supported for images and embeddings.
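The text-unit rule in the footnotes above amounts to a ceiling division by 1,000 characters. The sketch below is illustrative: the function names are assumptions, and treating an empty input as zero text units is also an assumption, since the page does not say how empty text is billed.

```python
from typing import Iterable

CHARS_PER_TEXT_UNIT = 1000


def text_units(char_count: int) -> int:
    """Return the number of text units for an input of char_count characters."""
    # Ceiling division: any partial block of 1,000 characters counts as a unit.
    return -(-char_count // CHARS_PER_TEXT_UNIT)


def guardrail_cost(units: int, policy_prices_per_1k_units: Iterable[float]) -> float:
    """Return the charge in USD for applying the given policies to the units."""
    return units * sum(policy_prices_per_1k_units) / 1000


print(text_units(5600))  # prints 6

# 1,000 queries per hour, each a 200-character query plus a 1,500-character
# response, checked by content filters ($0.75) and denied topics ($1.00).
hourly_units = (text_units(200) + text_units(1500)) * 1000
print(f"${guardrail_cost(hourly_units, [0.75, 1.00]):.2f}")  # prints $5.25
```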

Pricing examples

  • An application developer makes the following API calls to Amazon Bedrock: a request to AI21’s Jurassic-2 Mid model to summarize an input of 10K tokens of input text to an output of 2K tokens.

    Total cost incurred = 10K tokens/1000 * $0.0125 + 2K tokens/1000 * $0.0125 = $0.15

  • On-Demand pricing

    An application developer makes the following API calls to Amazon Bedrock on an hourly basis: a request to Amazon Titan Text Lite model to summarize an input of 2K tokens of input text to an output of 1K tokens.

    Total hourly cost incurred is = 2K tokens/1000 * $0.0003 + 1K tokens/1000 * $0.0004 = $0.001.

    An application developer makes the following API calls to Amazon Bedrock: a request to the Amazon Titan Image Generator base model to generate 1000 images of 1024 x 1024 in size of standard quality.

    Total cost incurred = 1000 images * $0.01 per image = $10

    Customization (fine-tuning and continued pretraining) pricing

    An application developer customizes an Amazon Titan Image Generator model using 1000 image-text pairs. After training, the developer uses custom model provisioned throughput for 1 hour to evaluate the performance of the model. The fine-tuned model is stored for 1 month. After evaluation, the developer uses provisioned throughput (1-month commitment term) to host the customized model.

    Monthly cost incurred for fine-tuning = fine-tuning training ($0.005 * 500 * 64), where $0.005 is the price per image seen, 500 is the number of steps, and 64 is the batch size, + custom model storage per month ($1.95) + 1 hour of custom model inference ($21) = $160 + $1.95 + $21 = $182.95

    Provisioned Throughput pricing

    An application developer buys two model units of Amazon Titan Text Express with a 1-month commitment for their text summarization use case.

    Total monthly cost incurred = 2 model units * $18.40/hour * 24 hours * 31 days = $27,379.20

    An application developer buys one model unit of the base Amazon Titan Image Generator model with a 1-month commitment.

    Total cost incurred = 1 model unit * $16.20 * 24 hours * 31 days = $12,052.80

  • On-Demand pricing

    An application developer makes the following API calls to Amazon Bedrock in the US West (Oregon) Region: a request to Anthropic’s Claude model to summarize an input of 11K tokens of input text to an output of 4K tokens.

    Total cost incurred = 11K tokens/1000 * $0.008 + 4K tokens/1000 * $0.024 = $0.088 + $0.096 = $0.184

    Provisioned Throughput pricing

    An application developer buys one model unit of Anthropic Claude Instant with a 1-month commitment in the US West (Oregon) Region.

    Total monthly cost incurred = 1 model unit * $39.60 * 24 hours * 31 days = $29,462.40

  • On-Demand pricing

    An application developer makes the following API calls to Amazon Bedrock: a request to Cohere’s Command model to summarize an input of 6K tokens of input text to an output of 2K tokens.

    Total cost incurred = 6K tokens/1,000 * $0.0015 + 2K tokens/1,000 * $0.0020 = $0.013

    An application developer makes the following API calls to Amazon Bedrock: A request to Cohere’s Command - Light model to summarize an input of 6K tokens of input text to an output of 2K tokens.

    Total cost incurred = 6K tokens/1000 * $0.0003 + 2K tokens/1000 * $0.0006 = $0.003

    An application developer makes the following API calls to Amazon Bedrock: A request to either Cohere’s Embed English or Embed Multilingual model to generate embeddings for 10K tokens of input.

    Total cost incurred = 10K tokens/1000 * $0.0001 = $0.001

    Customization (fine-tuning) pricing

    An application developer customizes a Cohere Command model using 1000 tokens of data. After training, the developer uses custom model provisioned throughput for 1 hour to evaluate the performance of the model. The fine-tuned model is stored for 1 month. After evaluation, the developer uses provisioned throughput (1-month commitment) to host the customized model.

    Monthly cost incurred for fine-tuning = Fine-tuning training ($0.004 * 1000) + custom model storage per month ($1.95) + 1 hour of custom model inference ($49.50) = $55.45

    Hourly cost incurred for provisioned throughput (1-month commitment) of custom model = $39.60 per model unit

    Provisioned Throughput pricing

    An application developer buys one model unit of Cohere Command with a 1-month commitment for their text summarization use case.

    Total monthly cost incurred = 1 model unit * $39.60 * 24 hours * 31 days = $29,462.40

  • On-Demand pricing

    An application developer makes the following API calls to Amazon Bedrock: a request to Meta’s Llama 2 Chat (13B) model to summarize an input of 2K tokens of input text to an output of 500 tokens.

    Total cost incurred = 2K tokens/1000 * $0.00075 + 500 tokens/1000 * $0.001 = $0.002

    Customization (fine-tuning) pricing

    An application developer customizes the Llama 2 Pretrained (70B) model using 1000 tokens of data. After training, the developer uses custom model provisioned throughput for 1 hour to evaluate the performance of the model. The fine-tuned model is stored for 1 month. After evaluation, the developer uses provisioned throughput (1-month commitment) to host the customized model.

    Monthly cost incurred for fine-tuning = Fine tuning training ($0.00799 * 1000) + custom model storage per month ($1.95) + 1 hour of custom model inference ($23.50) = $33.44

    Hourly cost incurred for provisioned throughput (1-month commitment) of custom model = $21.18 per model unit

    Provisioned Throughput pricing

    An application developer buys one model unit of Meta Llama 2 with a 1-month commitment for their text summarization use case.

    Total monthly cost incurred = 1 model unit * $21.18 * 24 hours * 31 days = $15,757.92

  • On-Demand pricing

    An application developer makes the following API calls to Amazon Bedrock on an hourly basis: a request to Mistral 7B model to summarize an input of 2K tokens of input text to an output of 1K tokens.

    Total hourly cost incurred = 2K tokens/1000 * $0.00015 + 1K tokens/1000 * $0.0002 = $0.0005

    An application developer makes the following API calls to Amazon Bedrock on an hourly basis: a request to Mixtral 8x7B model to summarize an input of 2K tokens of input text to an output of 1K tokens.

    Total hourly cost incurred = 2K tokens/1000 * $0.00045 + 1K tokens/1000 * $0.0007 = $0.0016

    An application developer makes the following API calls to Amazon Bedrock on an hourly basis: a request to Mistral Large model to summarize an input of 2K tokens of input text to an output of 1K tokens. 

    Total hourly cost incurred = 2K tokens/1000 * $0.008 + 1K tokens/1000 * $0.024 = $0.04

  • On-Demand pricing

    An application developer makes the following API calls to Amazon Bedrock: a request to the SDXL model to generate a 512 x 512 image with a step size of 70 (premium quality).

    Total cost incurred = 1 image * $0.036 per image = $0.036

    An application developer makes the following API calls to Amazon Bedrock: A request to the SDXL 1.0 model to generate a 1024 x 1024 image with a step size of 70 (premium quality).

    Total cost incurred = 1 image * $0.08 per image = $0.08

    Provisioned Throughput pricing

    An application developer buys one model unit of SDXL 1.0 with a 1-month commitment.

    Total cost incurred = 1 * $49.86 * 24 hours * 31 days = $37,095.84

  • Model evaluation example 1:

    On-demand pricing
    An application developer submits a dataset for human-based model evaluation using Anthropic Claude 2.1 and Anthropic Claude Instant in the US East (N. Virginia) AWS Region.

    The dataset contains 50 prompts, and the developer requires one worker to rate each prompt-response set (configurable in the evaluation job creation as “workers per prompt” parameter).

    There will be 50 tasks in this evaluation job (one task for each prompt-response set per each worker). The 50 prompts combine to 5000 input tokens, and the associated responses combine to 15,000 tokens for Anthropic Claude Instant and 20,000 tokens for Anthropic Claude 2.1.

    The following charges are incurred for this model evaluation job:

    Item Number of input tokens Price per 1000 input tokens Cost of input Number of output tokens Price per 1000 output tokens Cost of output Number of human tasks Price per human task Cost of human tasks Total
    Claude Instant Inference 5000 $0.0008 $0.004 15000 $0.0024 $0.036 - - - $0.04
    Claude 2.1 Inference 5000 $0.008 $0.04 20000 $0.024 $0.48 - - - $0.52
    Human Tasks - - - - - - 50 $0.21 $10.50 $10.50
    Total - - - - - - - - - $11.06

    Model evaluation example 2:

    On-demand pricing
    An application developer submits a dataset for human-based model evaluation using Anthropic Claude 2.1 and Anthropic Claude Instant in the US East (N. Virginia) AWS Region.

    The dataset contains 50 prompts, and the developer requires two workers to rate each prompt-response set (configurable in the evaluation job creation as “workers per prompt” parameter). There will be 100 tasks in this evaluation job (1 task for each prompt-response set per each worker: 2 workers x 50 prompt-response sets = 100 human tasks).

    The 50 prompts combine to 5000 input tokens, and the associated responses combine to 15000 tokens for Anthropic Claude Instant and 20000 tokens for Anthropic Claude 2.1.

    The following charges are incurred for this model evaluation job:

    Item Number of input tokens Price per 1000 input tokens Cost of input Number of output tokens Price per 1000 output tokens Cost of output Number of human tasks Price per human task Cost of human tasks Total
    Claude Instant Inference 5000 $0.0008 $0.004 15000 $0.0024 $0.036 - - - $0.04
    Claude 2.1 Inference 5000 $0.008 $0.04 20000 $0.024 $0.48 - - - $0.52
    Human Tasks - - - - - - 100 $0.21 $21.00 $21.00
    Total - - - - - - - - - $21.56
  • Example 1: Customer support chatbot
    An application developer creates a customer support chatbot and uses content filters to block harmful content and denied topics to filter undesirable queries and responses.

    The chatbot serves 1000 user queries per hour. Each user query has an average input length of 200 characters and receives an FM response of 1500 characters.

    Each user query of 200 characters corresponds to 1 text unit.

    Each FM response of 1,500 characters corresponds to 2 text units.

    Text units processed each hour = (1 + 2) * 1000 queries = 3000 text units

    Total cost incurred per hour for content filters and denied topic = 3000 * ($0.75 + $1.00) / 1000 = $5.25

     

    Example 2: Call center transcript summarization
    An application developer creates an application to summarize chat transcripts between users and support agents. It uses the sensitive information filter to redact personally identifiable information (PII) in the generated summaries for 10,000 conversations.

    Each generated summary has an average of 3,500 characters, which corresponds to 4 text units.

    Total cost incurred to summarize 10,000 conversations = 10000 * 4 * ($0.1/1000) = $4
