Amazon SageMaker AI now supports serverless reinforcement fine-tuning for 12 additional models

Posted on: Mar 25, 2026

Amazon SageMaker AI now supports serverless model customization and reinforcement fine-tuning for 12 additional open-weight models, enabling you to fine-tune and evaluate them without provisioning or managing infrastructure. The newly supported models are: gpt-oss-120b, Qwen2.5 72B Instruct, DeepSeek-R1-Distill-Llama-70B, Qwen3 14B, DeepSeek-R1-Distill-Qwen-14B, Qwen2.5 14B Instruct, DeepSeek-R1-Distill-Llama-8B, DeepSeek-R1-Distill-Qwen-7B, Qwen3 4B, Meta Llama 3.2 3B Instruct, Qwen3 1.7B, and DeepSeek-R1-Distill-Qwen-1.5B. With this expansion, you can customize these models using supervised fine-tuning (SFT), direct preference optimization (DPO), and reinforcement fine-tuning (RFT) techniques, including reinforcement learning with verifiable rewards (RLVR) and reinforcement learning from AI feedback (RLAIF), and pay only for what you use.

Reinforcement fine-tuning enables you to align models to complex, domain-specific reasoning tasks where traditional SFT alone falls short. With RLVR, you can improve model accuracy on verifiable tasks such as code generation, math, and structured extraction by providing reward signals based on correctness. RLAIF uses AI-generated feedback to steer model behavior toward your quality and safety preferences. These techniques are available on previously supported and newly added models, with no cluster setup, capacity planning, or distributed training expertise required.
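To make the RLVR idea concrete, the reward signal for a verifiable task can be as simple as an exact-match check against a reference answer. The sketch below illustrates this in Python; the function names and data format are hypothetical for illustration and are not part of the SageMaker AI API.

```python
# Illustrative RLVR-style reward function for a verifiable math task.
# The grader interface and answer format here are assumptions, not a
# SageMaker AI API.

def extract_final_answer(completion: str) -> str:
    """Take the last non-empty line of a completion as the model's final answer."""
    lines = [line.strip() for line in completion.strip().splitlines() if line.strip()]
    return lines[-1] if lines else ""

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Return 1.0 when the extracted answer matches the reference, else 0.0.

    In RLVR, a programmatic correctness check like this replaces a learned
    reward model: the fine-tuning loop reinforces completions that score 1.0.
    """
    return 1.0 if extract_final_answer(completion) == ground_truth.strip() else 0.0

# Example: completions whose final line carries the answer.
print(verifiable_reward("2 + 2 equals...\n4", "4"))  # 1.0
print(verifiable_reward("The answer is\n5", "4"))    # 0.0
```

The same pattern generalizes to code generation (run the tests and reward passing solutions) or structured extraction (compare parsed fields to a reference record).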

These models and fine-tuning techniques are available in US East (N. Virginia), US West (Oregon), Asia Pacific (Tokyo), and EU (Ireland). To get started, visit the Amazon SageMaker AI model customization product page; for the full list of models, techniques, and prices, see the Model Customization tab on the Amazon SageMaker AI pricing page.