Amazon SageMaker shadow testing
Validate the performance of new ML models against production models to prevent costly outages
Spot potential configuration errors before they impact end users by comparing new ML models against production models.
Improve inference performance by evaluating model changes, container updates, and new instance types with production traffic.
Skip weeks of building your own testing infrastructure and release models to production faster.
How it works
SageMaker helps you run shadow tests to evaluate a new machine learning (ML) model before production release by testing its performance against the currently deployed model. Shadow testing can help you catch potential configuration errors and performance issues before they impact end users.

Key features
Fully managed testing
With SageMaker shadow testing, you don’t need to invest in building your own testing infrastructure, so you can focus on model development. Just select the production model that you want to test against, and SageMaker automatically deploys the new model in a test environment. It then routes a copy of the inference requests received by the production model to the new model in real time and collects performance metrics such as latency and throughput.
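Shadow tests can also be created programmatically through the SageMaker CreateInferenceExperiment API. The sketch below, using the Python SDK (boto3), shows one way this might look; the endpoint, model, variant, role, and S3 names are placeholders, the instance settings are illustrative, and both models are assumed to already exist as SageMaker Model resources.

import boto3

sagemaker = boto3.client("sagemaker")

# Start a shadow test: the production variant keeps serving callers, while a
# copy of its requests is mirrored to the shadow (candidate) variant.
# All names and ARNs below are placeholders.
sagemaker.create_inference_experiment(
    Name="my-shadow-test",
    Type="ShadowMode",
    RoleArn="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    EndpointName="my-production-endpoint",
    ModelVariants=[
        {
            "ModelName": "production-model",
            "VariantName": "production-variant",
            "InfrastructureConfig": {
                "InfrastructureType": "RealTimeInference",
                "RealTimeInferenceConfig": {
                    "InstanceType": "ml.m5.xlarge",
                    "InstanceCount": 1,
                },
            },
        },
        {
            "ModelName": "candidate-model",
            "VariantName": "shadow-variant",
            "InfrastructureConfig": {
                "InfrastructureType": "RealTimeInference",
                "RealTimeInferenceConfig": {
                    "InstanceType": "ml.m5.xlarge",
                    "InstanceCount": 1,
                },
            },
        },
    ],
    ShadowModeConfig={
        "SourceModelVariantName": "production-variant",
        "ShadowModelVariants": [
            {
                "ShadowModelVariantName": "shadow-variant",
                "SamplingPercentage": 20,  # mirror 20% of production requests
            }
        ],
    },
    DataStorageConfig={
        # Captured requests and responses are written here for later analysis.
        "Destination": "s3://my-bucket/shadow-test-data/",
    },
)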
Live performance comparison dashboards
SageMaker creates a live dashboard that shows performance metrics such as latency and error rate of the new model and the production model in a side-by-side comparison. Once you have reviewed the test results and validated the model, you can promote it to production.
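The same comparison shown in the console dashboard can be pulled programmatically, since per-variant invocation metrics such as ModelLatency and Invocation5XXErrors are published to Amazon CloudWatch under the AWS/SageMaker namespace. A minimal sketch, reusing the placeholder endpoint and variant names from the example above:

import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client("cloudwatch")

def variant_latency(endpoint_name, variant_name):
    # Fetch average model latency for one variant over the last hour.
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/SageMaker",
        MetricName="ModelLatency",
        Dimensions=[
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        StartTime=datetime.now(timezone.utc) - timedelta(hours=1),
        EndTime=datetime.now(timezone.utc),
        Period=300,
        Statistics=["Average"],
    )
    return [point["Average"] for point in stats["Datapoints"]]

# Compare the production and shadow variants side by side.
for variant in ("production-variant", "shadow-variant"):
    print(variant, variant_latency("my-production-endpoint", variant))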
Fine-grained traffic control
When running shadow tests in SageMaker, you can configure the percentage of inference requests that is mirrored to the test model. This control over the input traffic lets you start with a small sample and increase it only after you gain confidence in the model's performance.
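A sketch of raising the sampling percentage on a running test, assuming the UpdateInferenceExperiment API and the placeholder experiment and variant names used above:

import boto3

sagemaker = boto3.client("sagemaker")

# Once the shadow variant looks healthy at a low sampling rate, raise the
# share of mirrored production traffic.
sagemaker.update_inference_experiment(
    Name="my-shadow-test",
    ShadowModeConfig={
        "SourceModelVariantName": "production-variant",
        "ShadowModelVariants": [
            {
                "ShadowModelVariantName": "shadow-variant",
                "SamplingPercentage": 50,  # up from the initial 20%
            }
        ],
    },
)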
Customers

"Amazon SageMaker’s new testing capabilities allowed us to more rigorously and proactively test ML models in production and avoid adverse customer impact and any potential outages because of an error in deployed models. This is critical, since our customers rely on us to provide timely insights based on real-time location data that changes every minute.”
Giovanni Lanfranchi, Chief Product and Technology Officer, HERE Technologies