What are the new service tiers for Amazon Bedrock?

Amazon Bedrock now offers two new service tiers in addition to the existing Standard tier: Priority - Premium tier offering faster response times and prioritized compute for time-sensitive applications Flex - A cost-effective option for non-time-critical workloads where requests may see increased latency

When should I use Priority?

Priority is a good fit for: Real-time customer interactions Interactive user experiences like real-time language translation Mission-critical applications requiring immediate responses

When should I use Flex?

The Flex tier is ideal for: Model evaluations Content summarization, labeling, and annotation Multi-step agentic workflows

Amazon Bedrock

Amazon Bedrock service tiers

Match the right service tier to your workload needs

Pricing

Request a quote

Service options to match your workload needs

Priority Tier

Priority requests receive preferential treatment in the processing queue, moving ahead of standard requests for faster responses even during high-demand periods. Priority tier is ideal for mission-critical workflows and customer-facing applications highly sensitive to latency. For most models that support Priority tier, customers can realize up to 25% better output tokens per second (OTPS) latency compared to Standard tier.

Pricing: Standard rate + premium

Use cases: Real-time customer service, live applications, urgent decision-making systems

Supported models: Check the documentation.

Standard Tier

The Standard tier provides consistent performance at regular rates for everyday AI tasks.

Pricing: Standard rate per token

Use cases: General AI applications, content generation, routine analysis

Supported models: Available across all foundation models in Amazon Bedrock. Learn more.

Flex Tier

Flex tier offers discounted standard pricing for workloads that can trade immediate processing for cost efficiency. Perfect for non-urgent AI workloads.

Pricing: Discounted standard model rate

Use cases: Model evaluations, summarizations, multi-step agentic workflows

Supported models: Check the documentation.

Reserved Tier

Reserved service tier designed for workloads requiring predictable performance and guaranteed tokens-per-minute capacity.

Pricing: Customers can reserve capacity for 1 month or 3 month duration. Customers pay a fixed price per 1K tokens-per-minute and are billed monthly.

Use cases: Mission critical applications with consistent tokens-per-minute throughput needs

Supported models: Check the documentation.

Overview
2
Priority
3
Flex
4
Model availability
1
Choosing the right tier
3
Getting started
3

Overview

Open all

Amazon Bedrock now offers two new service tiers in addition to the existing Standard tier:

Priority - Premium tier offering faster response times and prioritized compute for time-sensitive applications
Flex - A cost-effective option for non-time-critical workloads where requests may see increased latency

The Standard tier remains available as the default service option with reliable performance for everyday AI applications. The new service tiers provide additional inference options depending on performance and cost requirements.

Priority

Open all

The Priority tier is a premium service tier that provides faster response times and preferential processing for time-sensitive workloads. For most models that support Priority tier, customers can realize up to 25% better output tokens per second (OTPS) latency compared to Standard tier.

Priority is a good fit for:

Real-time customer interactions
Interactive user experiences like real-time language translation
Mission-critical applications requiring immediate responses

Priority tier requests receive access to more compute resources and processing priority over other tiers, providing faster performance even when the system is under heavy load.

Flex

Open all

The Flex tier is a cost-effective service tier designed for non-time-critical workloads. It can be used in cases where your applications and agentic workflows can tolerate some increase in latency.

The Flex tier is ideal for:

Model evaluations
Content summarization, labeling, and annotation
Multi-step agentic workflows

Flex may experience longer latencies than Standard tier. During high traffic, Flex tier requests are processed after Standard tier requests. It is designed for non-interactive workloads that can tolerate these longer latencies.

While both options are cost-effective for non-time-critical workloads, they serve different use cases:

Flex: Real-time single API call suitable for applications that can tolerate some increase in latency but still need synchronous processing.
Batch: Asynchronous processing of large datasets where you submit multiple prompts at once and retrieve results later from Amazon S3. Some examples of typical use cases for batch inference include: creating large volumes of marketing content, document classification, or data extraction.

Model availability

Open all

We have several models available from leading providers such as OpenAI, DeepSeek, Qwen, and Amazon.

For a complete list, check the documentation.

Choosing the right tier

Open all

Consider these factors:

Priority: Choose if you need faster response times for mission-critical applications.
Standard: Choose for everyday AI applications that need reliable performance.
Flex: Choose if cost optimization is your priority and your application can tolerate increased latency.

Yes, you can choose different service tiers based on your specific use case and requirements for each API call or application.

Yes, each service tier has a different pricing structure. Flex tier offers cost savings over standard pricing for flexible workloads, while Priority tier commands a premium for faster performance and prioritized compute. See the Amazon Bedrock pricing page for specific prices by model.

Getting started

Open all

The new service tiers are available through the Amazon Bedrock console and API. You can use the same inference API as the default Standard tier by just adding an extra parameter to select service tier at invoke time. Review the documentation for more details.

Existing applications will continue to work with the Standard tier. To take advantage of the new service tiers, you'll need to update your API calls to specify the desired service tier for supported models.

For detailed pricing information and the latest updates on model availability, refer to the Amazon Bedrock pricing page or contact your AWS account team.

Estimate your monthly Amazon Bedrock costs

Use the AWS Pricing Calculator

Get started with Amazon Bedrock

Blog

Get started with Amazon Bedrock

Learn more

Workshop

Explore common Amazon Bedrock use cases with a guided workshop

View workshop

Did you find what you were looking for today?

Let us know so we can improve the quality of the content on our pages

Amazon Bedrock service tiers