Posted On: Nov 10, 2021

Amazon SageMaker Inference now supports new model deployment options for updating your machine learning models in production. Using the new deployment guardrails, you can switch from the model currently in production to a new one in a controlled way. This launch introduces canary and linear traffic shifting modes, which give you granular control over how traffic shifts from your current model to the new one during the update. With built-in safeguards such as automatic rollbacks, you can catch issues early and take corrective action automatically before they cause significant production impact.

Amazon SageMaker is a fully managed service that helps developers and data scientists prepare, build, train, and deploy high-quality machine learning models quickly by bringing together a broad set of capabilities purpose-built for ML. When you deploy your trained ML models to Amazon SageMaker, it takes care of provisioning, patching, and updating the endpoints so that you can focus on powering your applications with ML. When you update an endpoint with a newer version of your ML model or serving container, SageMaker brings up a new fleet (the green fleet) containing the updates and then shifts all traffic from the existing fleet (the blue fleet) to the green fleet in one shot. This approach, called a blue/green deployment, ensures that the endpoint can keep responding to requests while the update is in progress, maximizing availability.

With this launch, Amazon SageMaker adds canary and linear traffic shifting modes to blue/green deployments. These modes give you more granular control over shifting traffic between the fleets, so that you can build confidence in the new model before dialing up its traffic. Additionally, you can pre-specify CloudWatch alarms on metrics such as latency or error rates and automatically roll the deployment back to the blue fleet if any of these alarms is tripped. Canary mode lets you shift a small percentage of traffic to the green fleet (called the canary fleet), observe its behavior for a period of time (known as the baking period), and shift the remaining traffic only if no alarms are triggered during the baking period. Linear mode lets you shift traffic to the green fleet in configurable fixed increments (for example, 10%), observing the behavior for a baking period before shifting each subsequent increment. With all blue/green deployments, you can also observe the fleets after all traffic has been shifted (known as the final baking period) before the blue fleet is terminated.

These traffic shifting modes help you balance the trade-off between limiting the risk of introducing a new model into production and keeping the update short, so you can pick the right option for your use case. All-at-once traffic shifting minimizes the duration of the update, while linear mode minimizes the risk of introducing a new model into production by shifting traffic in multiple steps. Canary mode shifts all the traffic in two steps, striking a balance between risk and update duration.
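The risk/duration trade-off between the three modes can be made concrete with a small model. The sketch below is a hypothetical simulation, not SageMaker's implementation: `traffic_schedule` computes the cumulative share of traffic on the green fleet after each shift, `run_deployment` walks that schedule and consults a stand-in alarm after each baking period (rolling back if it fires), and `min_bake_time_s` estimates the minimum time spent baking. All function names, the percentage-only step sizes, and the timing model (which ignores provisioning the green fleet) are assumptions made for illustration.

```python
from typing import Callable, List, Tuple


def traffic_schedule(mode: str, step_percent: int = 10) -> List[int]:
    """Cumulative % of traffic on the green fleet after each shift (hypothetical model)."""
    mode = mode.upper()
    if mode == "ALL_AT_ONCE":
        return [100]  # a single shift moves all traffic to the green fleet
    if not 0 < step_percent < 100:
        raise ValueError("step_percent must be between 1 and 99")
    if mode == "CANARY":
        # Step 1: a small canary share; step 2: the remainder of the traffic.
        return [step_percent, 100]
    if mode == "LINEAR":
        # Fixed increments; the last step may be smaller so the total reaches 100%.
        return list(range(step_percent, 100, step_percent)) + [100]
    raise ValueError(f"unknown traffic shifting mode: {mode}")


def run_deployment(
    schedule: List[int], alarm_fired: Callable[[int], bool]
) -> Tuple[str, List[int]]:
    """Walk the schedule, checking the alarm during the baking period after each shift.

    The check after the final 100% shift stands in for the final baking period.
    Returns the outcome and the green-traffic percentages actually reached.
    """
    reached: List[int] = []
    for green_pct in schedule:
        reached.append(green_pct)
        if alarm_fired(green_pct):
            # Simulated auto-rollback: all traffic returns to the blue fleet.
            return "ROLLED_BACK", reached
    # No alarm tripped, so the blue fleet would now be terminated.
    return "COMPLETED", reached


def min_bake_time_s(schedule: List[int], bake_s: int, final_bake_s: int) -> int:
    """Rough lower bound on update time: one baking period per intermediate shift
    plus the final baking period (ignores instance provisioning)."""
    return (len(schedule) - 1) * bake_s + final_bake_s


if __name__ == "__main__":
    for mode in ("ALL_AT_ONCE", "CANARY", "LINEAR"):
        sched = traffic_schedule(mode, step_percent=10)
        print(mode, sched, min_bake_time_s(sched, bake_s=300, final_bake_s=600))
```

Under this model, a hypothetical alarm that only fires once the green fleet serves at least half of the traffic makes a linear rollout in 25% steps roll back after reaching 50%, whereas a 10% canary reaches 100% on its second step before the alarm trips. In exchange, the canary finishes in far fewer baking periods, which is the balance the modes are designed to let you choose.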

For detailed information on these new capabilities, please read our documentation, which also contains sample notebooks to help you get started. These new phased deployment capabilities are available for all newly created endpoints in all commercial regions where Amazon SageMaker is available. For a list of features that are not supported, please refer to the exclusions section of our documentation.
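As a rough illustration of how these options map onto an API request, the sketch below builds the `DeploymentConfig` argument of the SageMaker `UpdateEndpoint` API (exposed in boto3 as `update_endpoint`). The field names (`BlueGreenUpdatePolicy`, `TrafficRoutingConfiguration`, `CanarySize`, `LinearStepSize`, `TerminationWaitInSeconds`, `AutoRollbackConfiguration`) follow the boto3 documentation as we understand it; the helper `build_deployment_config`, its defaults, and the endpoint and alarm names are hypothetical, and the documentation remains the authority on required fields and valid ranges.

```python
from typing import Any, Dict, List, Optional

_MODES = ("ALL_AT_ONCE", "CANARY", "LINEAR")


def build_deployment_config(
    mode: str,
    step_percent: Optional[int] = None,
    bake_s: int = 600,
    final_bake_s: int = 600,
    alarm_names: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build a blue/green DeploymentConfig dict for UpdateEndpoint (hypothetical helper)."""
    mode = mode.upper()
    if mode not in _MODES:
        raise ValueError(f"unknown traffic shifting mode: {mode}")

    routing: Dict[str, Any] = {
        "Type": mode,
        # Baking period between traffic shifts.
        "WaitIntervalInSeconds": bake_s,
    }
    if mode != "ALL_AT_ONCE":
        # Local sanity check only; the service enforces its own limits.
        if step_percent is None or not 0 < step_percent < 100:
            raise ValueError("CANARY and LINEAR need step_percent between 1 and 99")
        size = {"Type": "CAPACITY_PERCENT", "Value": step_percent}
        # Canary mode sizes the first shift; linear mode sizes every increment.
        routing["CanarySize" if mode == "CANARY" else "LinearStepSize"] = size

    config: Dict[str, Any] = {
        "BlueGreenUpdatePolicy": {
            "TrafficRoutingConfiguration": routing,
            # Final baking period before the blue fleet is terminated.
            "TerminationWaitInSeconds": final_bake_s,
        }
    }
    if alarm_names:
        # If any listed CloudWatch alarm trips, the deployment rolls back to blue.
        config["AutoRollbackConfiguration"] = {
            "Alarms": [{"AlarmName": name} for name in alarm_names]
        }
    return config


# Hypothetical usage (requires AWS credentials and an existing endpoint):
# import boto3
# boto3.client("sagemaker").update_endpoint(
#     EndpointName="my-endpoint",
#     EndpointConfigName="my-endpoint-config-v2",
#     DeploymentConfig=build_deployment_config(
#         "CANARY", step_percent=10, alarm_names=["my-endpoint-5xx-errors"]
#     ),
# )
```

Keeping the request construction in a plain function makes it easy to review and unit-test the deployment policy separately from the call that actually starts the update.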