Amazon Bedrock introduces Priority and Flex inference service tiers

Posted on: Nov 18, 2025

Today, Amazon Bedrock introduces two new inference service tiers to optimize cost and performance for different AI workloads. The new Flex tier offers cost-effective pricing for non-time-critical applications such as model evaluations and content summarization, while the Priority tier provides premium performance and preferential processing for mission-critical applications. For most models that support the Priority tier, customers can realize up to 25% better output tokens per second (OTPS) compared to the Standard tier. These tiers join the existing Standard tier, which serves everyday AI applications with reliable performance.
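To put the "up to 25%" OTPS figure in concrete terms, the following is a minimal arithmetic sketch. It assumes the figure means 25% more output tokens per second at best; the baseline rate of 100 tokens per second and the 1,000-token response are hypothetical values chosen for illustration, not published numbers.

```python
def generation_time_seconds(output_tokens: int, otps: float) -> float:
    """Time to stream a response of `output_tokens` at `otps` tokens per second."""
    return output_tokens / otps


# Hypothetical baseline; actual rates depend on the model and current load.
standard_otps = 100.0
# Best case from the announcement: up to 25% better OTPS on the Priority tier.
priority_otps = standard_otps * 1.25

output_tokens = 1000  # hypothetical response length
standard_time = generation_time_seconds(output_tokens, standard_otps)
priority_time = generation_time_seconds(output_tokens, priority_otps)

# Fraction of generation time saved in the best case.
time_saved = 1 - priority_time / standard_time

print(f"Standard: {standard_time:.1f} s, Priority: {priority_time:.1f} s")
print(f"Generation time saved: {time_saved:.0%}")
```

Under these assumptions, a 25% higher token rate shortens generation time by about 20%, not 25%, because time scales with the inverse of the rate.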

These service tiers address key challenges that organizations face when deploying AI at scale. The Flex tier is designed for non-interactive workloads that can tolerate longer latencies, making it ideal for model evaluations, content summarization, labeling and annotation, and multistep agentic workflows. It is priced at a discount relative to the Standard tier, and during periods of high demand, Flex requests receive lower priority than Standard requests. The Priority tier is an ideal fit for mission-critical applications, real-time end-user interactions, and interactive experiences where consistent, fast responses are essential. During periods of high demand, Priority requests are processed ahead of requests in the other service tiers, at a premium price.

These new service tiers are available today for a range of leading foundation models, including OpenAI (gpt-oss-20b, gpt-oss-120b), DeepSeek (DeepSeek V3.1), Qwen3 (Coder-480B-A35B-Instruct, Coder-30B-A3B-Instruct, 32B dense, Qwen3-235B-A22B-2507), and Amazon Nova (Nova Pro and Nova Premier).

With these new options, Amazon Bedrock gives customers greater control over balancing cost efficiency with performance requirements, enabling them to scale AI workloads economically while ensuring optimal user experiences for their most critical applications.
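As an illustration of how a team might apply the guidance above, here is a minimal sketch that routes workloads to a tier based on the use cases named in this announcement. The `ServiceTier` enum, the workload labels, and the `choose_tier` function are all hypothetical names invented for this example; they are not part of the Amazon Bedrock API, and the string values are not confirmed request parameters. Consult the documentation for how a tier is actually specified on a request.

```python
from enum import Enum


class ServiceTier(Enum):
    # Hypothetical labels for illustration only, not confirmed API values.
    FLEX = "flex"
    STANDARD = "standard"
    PRIORITY = "priority"


# Workload categories taken from the announcement text.
FLEX_WORKLOADS = {
    "model_evaluation",
    "content_summarization",
    "labeling_and_annotation",
    "multistep_agentic_workflow",
}
PRIORITY_WORKLOADS = {
    "mission_critical",
    "real_time_end_user_interaction",
    "interactive_experience",
}


def choose_tier(workload: str) -> ServiceTier:
    """Pick a tier for a workload label, defaulting to Standard."""
    if workload in PRIORITY_WORKLOADS:
        return ServiceTier.PRIORITY
    if workload in FLEX_WORKLOADS:
        return ServiceTier.FLEX
    # Anything not explicitly latency-tolerant or latency-critical
    # stays on the Standard tier for everyday use.
    return ServiceTier.STANDARD


for label in ("model_evaluation", "interactive_experience", "nightly_report"):
    print(label, "->", choose_tier(label).value)
```

Defaulting unknown workloads to Standard is a design choice of this sketch: it avoids paying a premium, or accepting lower priority, for workloads that have not been deliberately classified.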

For more information about the AWS Regions where Amazon Bedrock Priority and Flex inference service tiers are available, see the AWS Regions table.

Learn more about service tiers in our News Blog and documentation.