Explain how input features contributed to your model predictions in real time
Detect potential bias during data preparation, after model training, and in your deployed model
Identify any shifts in bias and feature importance after deployment
Amazon SageMaker Clarify provides machine learning (ML) developers with purpose-built tools to gain greater insight into their ML training data and models. SageMaker Clarify detects and measures potential bias using a variety of metrics so that developers can address potential bias and explain model predictions.
Amazon SageMaker Clarify can detect potential bias during data preparation, after model training, and in your deployed model. For instance, you can check for bias related to age in your dataset or in your trained model and receive a detailed report that quantifies different types of potential bias. SageMaker Clarify also includes feature importance scores that help you explain how your model makes predictions, and it produces explainability reports in bulk or in real time via online explainability. You can use these reports to support customer or internal presentations, or to identify potential issues with your model.
Detect bias in your data and model predictions
Identify imbalances in data
SageMaker Clarify is integrated with SageMaker Data Wrangler, so you can identify potential bias during data preparation without writing your own code. You specify input features, such as gender or age, and SageMaker Clarify runs an analysis job to detect potential bias in those features. SageMaker Clarify then provides a visual report with a description of the metrics and measurements of potential bias so that you can identify steps to remediate the bias. For example, in a financial dataset that contains only a few examples of business loans to one age group as compared to others, the bias metrics will indicate the imbalance so that you can address it in the dataset and potentially reduce the risk of having a model that is disproportionately inaccurate for a specific age group.
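As a rough illustration of what an imbalance measurement looks like, the sketch below computes a class imbalance score of the form (n_a - n_d) / (n_a + n_d), where n_d counts the rows in the underrepresented group and n_a counts all other rows. The function name, the group labels, and the dataset are hypothetical, and this is a hand-written sketch rather than the code SageMaker Clarify runs.

```python
from collections import Counter


def class_imbalance(facet_values, disadvantaged):
    """Return (n_a - n_d) / (n_a + n_d) for one facet column.

    n_d is the number of rows in the `disadvantaged` group and n_a is the
    number of rows in every other group. The result lies in [-1, 1]; values
    near +1 mean the disadvantaged group is heavily underrepresented.
    """
    if not facet_values:
        raise ValueError("facet_values must not be empty")
    counts = Counter(facet_values)
    n_d = counts[disadvantaged]
    n_a = len(facet_values) - n_d
    return (n_a - n_d) / (n_a + n_d)


# Hypothetical loan dataset: the age group recorded for each applicant.
age_groups = ["under_40"] * 90 + ["40_plus"] * 10
ci = class_imbalance(age_groups, disadvantaged="40_plus")
print(f"class imbalance: {ci:.2f}")
```

A score of 0 would mean both groups are equally represented; here the "40_plus" group makes up only 10 of 100 rows, so the score is strongly positive.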
If your data is imbalanced, you can use SageMaker Data Wrangler to rebalance it. SageMaker Data Wrangler offers three balancing operators: random undersampling, random oversampling, and SMOTE. Read our blog post here to learn more.
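To make the idea of random oversampling concrete, here is a minimal sketch in plain Python. It duplicates randomly chosen rows from each smaller class until every class matches the size of the largest one. The helper name `random_oversample`, the `label_of` callback, and the sample rows are assumptions made for illustration; this is not how the Data Wrangler operator is implemented.

```python
import random
from collections import Counter, defaultdict


def random_oversample(rows, label_of, seed=0):
    """Return a new list in which every class has as many rows as the largest class."""
    if not rows:
        return []
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Draw extra rows with replacement until the class reaches the target size.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


# Hypothetical rows of (age_group, loan_approved).
rows = [("under_40", 1)] * 8 + [("40_plus", 0)] * 2
balanced = random_oversample(rows, label_of=lambda row: row[0])
print(Counter(row[0] for row in balanced))
```

Random undersampling is the mirror image: it drops rows from the larger classes instead of duplicating rows from the smaller ones. SMOTE goes further by synthesizing new minority rows rather than copying existing ones.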
Check your trained model for bias
After you’ve trained your model, you can run a SageMaker Clarify bias analysis via SageMaker Experiments to check your model for potential bias, such as predictions that produce a negative result more frequently for one group than for another. You specify the input features, such as age, with respect to which you would like to measure bias in the model outcomes. SageMaker Clarify then runs an analysis and provides you with a visual report that identifies the different types of bias for each feature, such as whether older groups receive more positive predictions compared to younger groups.
The AWS open source library Fair Bayesian Optimization can help mitigate bias by tuning a model’s hyperparameters. Read the blog post here to learn how to apply Fair Bayesian Optimization to mitigate bias while optimizing the accuracy of a machine learning model.
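One simple way to quantify whether one group receives more positive predictions than another is the difference in positive prediction rates between the two groups. The sketch below assumes binary predictions (1 is the positive outcome); the function names and the prediction lists are made up for illustration and do not come from a real model or from the SageMaker Clarify codebase.

```python
def positive_rate(predictions):
    """Fraction of predictions equal to the positive label (1)."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    return sum(predictions) / len(predictions)


def positive_rate_gap(pred_group_a, pred_group_d):
    """Positive prediction rate of group a minus that of group d.

    A value near 0 means both groups receive positive predictions at a
    similar rate; a large positive value means group a is favored.
    """
    return positive_rate(pred_group_a) - positive_rate(pred_group_d)


# Hypothetical model outputs (1 = loan approved) for two age groups.
older = [1, 1, 1, 0, 1, 1, 0, 1]    # 6 of 8 approved
younger = [1, 0, 0, 1, 0, 0, 0, 1]  # 3 of 8 approved
gap = positive_rate_gap(older, younger)
print(f"difference in positive prediction rates: {gap:.3f}")
```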
Monitor your model for bias
Amazon SageMaker Clarify helps data scientists and ML engineers monitor predictions for bias on a regular basis. Bias can be introduced or exacerbated in deployed ML models when the training data differs from the data that the model sees during deployment (that is, the live data). For example, the outputs of a model for predicting home prices can become biased if the mortgage rates used to train the model differ from current mortgage rates. SageMaker Clarify bias detection capabilities are integrated into SageMaker Model Monitor, so that when SageMaker detects bias beyond a certain threshold, it automatically generates metrics that you can view in SageMaker Studio and through Amazon CloudWatch metrics and alarms.
Explain model predictions
Understand which features contributed the most to model prediction
SageMaker Clarify is integrated with SageMaker Experiments to provide scores detailing which features contributed the most to your model prediction on a particular input for tabular, NLP and computer vision models. For tabular datasets, SageMaker Clarify can also output an aggregated feature importance chart which provides insights into the overall prediction process of the model. These details can help determine if a particular model input has more influence than expected on overall model behavior. For tabular data, in addition to the feature importance scores, you can also use partial dependence plots (PDP) to show the dependence of the predicted target response on a set of input features of interest.
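Feature importance scores of this kind are commonly based on Shapley values (a customer quote later on this page refers to SHAP values). The sketch below computes exact Shapley values by brute force for a tiny hand-written model, averaging each feature's marginal contribution over every ordering of the features. The model, the feature names, and the all-zero baseline are assumptions chosen for illustration. Enumerating every ordering is only feasible for a handful of features, so practical tools rely on approximations instead.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at input `x` relative to `baseline`.

    For each ordering of the features, switch them one at a time from the
    baseline value to the value in `x` and record how much the model output
    changes. The Shapley value is the average change per feature.
    """
    n = len(x)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_output = model(current)
        for i in order:
            current[i] = x[i]
            new_output = model(current)
            totals[i] += new_output - previous_output
            previous_output = new_output
    return [total / len(orderings) for total in totals]


def toy_model(features):
    """Hypothetical scoring function with one interaction term."""
    income, debt, age = features
    return 2.0 * income - 3.0 * debt + 0.5 * income * age


x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
for name, value in zip(["income", "debt", "age"], phi):
    print(f"{name}: {value:+.2f}")
```

The attributions sum to the difference between the model output at `x` and at the baseline, and the interaction term between income and age is split evenly between those two features.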
Explain your computer vision and NLP models
SageMaker Clarify can also provide insights into computer vision and Natural Language Processing (NLP) models. For vision models, SageMaker Clarify enables customers to see which parts of the image the model found most important. For NLP models, SageMaker Clarify provides feature importance scores at the level of words, sentences or paragraphs.
Monitor your model for changes in behavior
Changes in live data can expose new behavior in your model. For example, a credit risk prediction model trained on data from one geographical region could change the importance it assigns to various features when applied to data from another region. Amazon SageMaker Clarify is integrated with SageMaker Model Monitor to notify you, using alerting systems such as Amazon CloudWatch, if the importance of input features shifts and causes model behavior to change.
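The general idea of watching for a shift in feature importance can be sketched as a comparison between a baseline set of importance scores and a set computed on live data. The example below normalizes each set to shares of the total and flags features whose share moves by more than a threshold. The function names, the threshold of 0.1, and the scores are hypothetical, and this simple comparison is not the drift measure that SageMaker Model Monitor itself uses.

```python
def importance_shares(importances):
    """Convert raw importance scores to shares that sum to 1."""
    total = sum(abs(value) for value in importances.values())
    if total == 0:
        raise ValueError("at least one importance score must be non-zero")
    return {name: abs(value) / total for name, value in importances.items()}


def shifted_features(baseline, live, threshold=0.1):
    """Names of features whose importance share changed by more than `threshold`."""
    base = importance_shares(baseline)
    current = importance_shares(live)
    return sorted(
        name
        for name in base
        if abs(current.get(name, 0.0) - base[name]) > threshold
    )


# Hypothetical importance scores from training data and from another region's live data.
baseline_scores = {"income": 50, "debt": 30, "region": 20}
live_scores = {"income": 30, "debt": 30, "region": 40}
alerts = shifted_features(baseline_scores, live_scores)
print("features with shifted importance:", alerts)
```

In a monitoring setup, a non-empty result would be the point at which an alert is raised.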
Explain individual model predictions in real time
SageMaker Clarify can provide scores detailing which features contributed the most to an individual prediction after the model has been run on new data. These details can help determine whether a particular input feature has more influence on the model's predictions than expected. You can view these details for each prediction in real time via online explainability, or get a bulk report that uses batch processing to cover all the individual predictions.
Data scientists and ML engineers need tools to generate the insights required to debug and improve ML models through better feature engineering, to determine whether a model is making inferences based on noisy or irrelevant features, and to understand the limitations of their models and failure modes their models may encounter.
The adoption of AI systems requires transparency. This is achieved through reliable explanations of the trained models and their predictions. Model explainability may be particularly important to certain industries with reliability, safety, and compliance requirements, such as financial services, human resources, healthcare, and automated transportation.
Companies may need to explain certain decisions and take steps around model risk management. Amazon SageMaker Clarify can help detect any potential bias present in the initial data or in the model after training and can also help explain which model features contributed the most to an ML model’s prediction.
Bundesliga Match Facts, powered by AWS, provides a more engaging fan experience during soccer matches for Bundesliga fans around the world. With Amazon SageMaker Clarify, the Bundesliga can now interactively explain the key underlying components that led the ML model to predict a certain xGoals value. Knowing the respective feature attributions and explaining outcomes helps with model debugging and increases confidence in ML algorithms, which results in higher-quality predictions.
“Amazon SageMaker Clarify seamlessly integrates with the rest of the Bundesliga Match Facts digital platform and is a key part of our long-term strategy of standardizing our ML workflows on Amazon SageMaker. By using AWS’s innovative technologies, such as machine learning, to deliver more in-depth insights and provide fans a better understanding of the split-second decisions made on the pitch, Bundesliga Match Facts enables viewers to gain deeper insights into the key decisions in each match.”
Andreas Heyden, Executive Vice President of Digital Innovations for the DFL Group
“The combination of AutoGluon and Amazon SageMaker Clarify enabled our customer churn model to predict customer churn with 94% accuracy. SageMaker Clarify helps us understand the model behavior by providing explainability through SHAP values. With SageMaker Clarify, we reduced the computation cost of SHAP values by up to 50% compared to a local calculation. The joint solution gives us the ability to better understand the model and improve customer satisfaction at a higher rate of accuracy with significant cost savings.”
Masahiro Takamoto, Head of Data Group, CAPCOM
“Domo offers a scalable suite of data science solutions that are easy for anyone in an organization to use and understand. With Clarify, our customers are enabled with important insights on how their AI models are making predictions. The combination of Clarify with Domo helps to increase AI speed and intelligence for our customers by putting the power of AI into the hands of everyone across their business and ecosystems.”
Ben Ainscough, Ph.D., Head of AI and Data Science, Domo
Varo Bank is a US-based digital bank that uses AI/ML to help make rapid, risk-based decisions to deliver its innovative products and services to customers.
“Varo has a strong commitment to the explainability and transparency of our ML models and we're excited to see the results from Amazon SageMaker Clarify in advancing these efforts.”
Sachin Shetty, Head of Data Science, Varo Money
Jiahang Zhong, Head of Data Science, Zopa