- Generative AI›
- Amazon Bedrock›
- Service Tiers
Amazon Bedrock Service Tiers
Match the right service tier to your workload needs
Service options to match your workload needs
Priority Tier
Priority requests receive preferential treatment in the processing queue, moving ahead of standard requests for faster responses even during high-demand periods. Priority Tier is ideal for mission critical workflows and customer-facing applications highly sensitive to latency. For most models that support Priority Tier, customers can realize up to 25% better output tokens per second (OTPS) latency compared to standard tier.
Pricing: Standard rate + premium
Use Cases: Real-time customer service, live applications, urgent decision-making systems
Supported Models: Check the documentation.
Standard Tier
The Standard tier provides consistent performance at regular rates for everyday AI tasks.
Pricing: Standard rate per token
Use Cases: General AI applications, content generation, routine analysis
Supported Models: Available across all foundation models in Amazon Bedrock. Learn more
Flex Tier
Flex tier offers discounted standard pricing for workloads that can trade immediate processing for cost efficiency. Perfect for non-urgent AI workloads.
Pricing: Discounted standard model rate.
Use Cases: Model evaluations, summarizations, multi-step agentic workflows
Supported Models: Check the documentation
Overview
Open allAmazon Bedrock now offers two new service tiers in addition to the existing Standard tier:
- Priority - Premium tier offering faster response times and prioritized compute for time-sensitive applications
- Flex - A cost-effective option for non-time-critical workloads where requests may see increased latency
The Standard tier remains available as the default service option with reliable performance for everyday AI applications. The new service tiers provide additional inference options depending on performance and cost requirements.
Priority
Open allThe Priority tier is a premium service tier that provides faster response times and preferential processing for time-sensitive workloads. For most models that support Priority tier, customers can realize up to 25% better output tokens per second (OTPS) latency compared to standard tier.
Priority is a good fit for:
- Real-time customer interactions
- Interactive user experiences like real-time language translation
- Mission-critical applications requiring immediate responses
Priority tier requests receive access to more compute resources and processing priority over other tiers, providing faster performance even when the system is under heavy load.
Flex
Open allThe Flex tier is a cost-effective service tier designed for non-time-critical workloads. It can be used in cases where your applications and agentic workflows can tolerate some increase in latency.
The Flex tier is ideal for:
- Model evaluations
- Content summarization, labeling and annotation
- Multi-step agentic workflows
Flex may experience longer latencies than standard tier. During high traffic, Flex tier requests are processed after Standard tier requests. It is designed for non-interactive workloads that can tolerate these longer latencies.
While both options are cost-effective for non-time-critical workloads, they serve different use cases:
- Flex: Real-time single API call suitable for applications that can tolerate some increase in latency but still need synchronous processing.
- Batch: Asynchronous processing of large datasets where you submit multiple prompts at once and retrieve results later from Amazon S3. Some examples of typical use cases for batch inference include: create large volumes of marketing content, document classification, or data extraction.
Model Availability
Open allWe have several models available from leading providers like OpenAI, DeepSeek, Qwen and Amazon.
For a complete list check the documentation.
Choosing the Right Tier
Open allConsider these factors:
- Priority: Choose if you need faster response times for mission critical applications.
- Standard: Choose for everyday AI applications that need reliable performance.
- Flex: Choose if cost optimization is your priority and your application can tolerate increased latency.
Yes, you can choose different service tiers based on your specific use case and requirements for each API call or application.
Yes, each service tier has a different pricing structure. Flex tier offers cost savings over standard pricing for flexible workloads, while Priority tier commands a premium for faster performance and prioritized compute. Please see the Amazon pricing page for specific prices by model.
Getting Started
Open allThe new service tiers are available through the Amazon Bedrock console and API. You can use the same inference API as the default standard tier by just adding an extra parameter to select service tier at invoke time. Please review documentation for more details.
Existing applications will continue to work with the Standard tier. To take advantage of the new service tiers, you'll need to update your API calls to specify the desired service tier for supported models.
For detailed pricing information and the latest updates on model availability, please refer to the Amazon Bedrock pricing page or contact your AWS account team.
Estimate your monthly Amazon Bedrock costs
Use the AWS Pricing CalculatorGet started with Amazon Bedrock
Did you find what you were looking for today?
Let us know so we can improve the quality of the content on our pages