Amazon SageMaker Clarify

Evaluate models and explain model predictions

What is Amazon SageMaker Clarify?

Amazon SageMaker Clarify provides purpose-built tools to gain greater insight into your ML models and data, using metrics such as accuracy, robustness, toxicity, and bias to improve model quality and support responsible AI initiatives. With the rise of generative AI, data scientists and ML engineers can leverage publicly available foundation models (FMs) to accelerate speed to market. To remove the heavy lifting of evaluating and selecting the right FM for your use case, SageMaker Clarify supports FM evaluation, helping you quickly evaluate, compare, and select the best FM across a variety of criteria and tasks within minutes, so you can adopt FMs faster and with confidence. For tabular, computer vision, and time series models, SageMaker Clarify provides model explainability during model development or after model deployment. You can use the bias and explainability reports to identify potential issues and direct your efforts to improve accuracy, remove bias, and increase performance.
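For example, for a tabular dataset you can run a Clarify processing job from the SageMaker Python SDK to produce a bias report before training. The sketch below is a minimal illustration, not a production setup; the S3 paths, column names, and the "gender" facet are hypothetical placeholders.

```python
# Minimal sketch: pre-training bias analysis with SageMaker Clarify.
# All bucket paths, column names, and the facet below are hypothetical.
from sagemaker import Session, clarify, get_execution_role

session = Session()
role = get_execution_role()

clarify_processor = clarify.SageMakerClarifyProcessor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    sagemaker_session=session,
)

# Where the training data lives and where the bias report should be written.
data_config = clarify.DataConfig(
    s3_data_input_path="s3://my-bucket/train.csv",         # hypothetical path
    s3_output_path="s3://my-bucket/clarify-bias-report",   # hypothetical path
    label="approved",                                       # hypothetical label column
    headers=["age", "income", "gender", "approved"],        # hypothetical columns
    dataset_type="text/csv",
)

# Measure bias with respect to the "gender" facet, with label value 1 as the positive outcome.
bias_config = clarify.BiasConfig(
    label_values_or_threshold=[1],
    facet_name="gender",
    facet_values_or_threshold=[0],
)

# Launches a processing job that computes pre-training bias metrics
# (for example, class imbalance) and writes a report to the output path.
clarify_processor.run_pre_training_bias(
    data_config=data_config,
    data_bias_config=bias_config,
)
```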

Benefits of SageMaker Clarify

  • Automatically evaluate FMs for your generative AI use case with metrics such as accuracy, robustness, and toxicity to support your responsible AI initiatives. For criteria or nuanced content that requires sophisticated human judgment, you can use your own workforce or a managed workforce provided by AWS to review model responses.
  • Explain how input features contribute to your model predictions during model development and inference (a sketch follows this list), and evaluate your FM during customization using automatic and human-based evaluations.
  • Generate easy-to-understand metrics, reports, and examples throughout the FM customization and MLOps workflows.
  • Detect potential bias and other risks, as prescribed by guidelines such as ISO 42001, during data preparation, model customization, and in your deployed models.
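As a concrete illustration of the explainability benefit above, the following hedged sketch runs a SHAP explainability job with the SageMaker Python SDK against a SageMaker model that has already been created. The model name, data locations, columns, and baseline values are assumptions made for the example.

```python
# Minimal sketch: SHAP feature-attribution job with SageMaker Clarify.
# Model name, paths, columns, and baseline are hypothetical.
from sagemaker import Session, clarify, get_execution_role

session = Session()
role = get_execution_role()

clarify_processor = clarify.SageMakerClarifyProcessor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    sagemaker_session=session,
)

data_config = clarify.DataConfig(
    s3_data_input_path="s3://my-bucket/validation.csv",        # hypothetical path
    s3_output_path="s3://my-bucket/clarify-explainability",    # hypothetical path
    label="approved",                                           # hypothetical label column
    headers=["age", "income", "gender", "approved"],
    dataset_type="text/csv",
)

# Clarify stands up a temporary shadow endpoint for this model to score perturbed rows.
model_config = clarify.ModelConfig(
    model_name="my-tabular-model",     # hypothetical SageMaker model name
    instance_type="ml.m5.xlarge",
    instance_count=1,
    accept_type="text/csv",
)

# Kernel SHAP needs a baseline record (features only, no label);
# per-feature attributions are aggregated with mean absolute value.
shap_config = clarify.SHAPConfig(
    baseline=[[35, 50000, 0]],         # hypothetical baseline feature values
    num_samples=100,
    agg_method="mean_abs",
)

clarify_processor.run_explainability(
    data_config=data_config,
    model_config=model_config,
    explainability_config=shap_config,
)
```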

Evaluate foundation models (preview)

  • Evaluation wizard and reports

    Evaluation wizard and reports

    To launch an evaluation, select the model, task, and evaluation type (human-based or automatic). Use the evaluation results to select the best model for your use case and to quantify the impact of your model customization techniques, such as prompt engineering, reinforcement learning from human feedback (RLHF), retrieval-augmented generation (RAG), and supervised fine-tuning (SFT). Evaluation reports summarize scores across multiple dimensions, allowing quick comparisons and decisions. More detailed reports provide examples of the highest- and lowest-scoring model outputs, helping you decide where to optimize further. A hedged code sketch of the automatic evaluation path follows this list.

  • Customization
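
As a hedged illustration of the automatic path described under "Evaluation wizard and reports", the sketch below scores a single prompt/response pair with fmeval, the open-source library that underpins SageMaker Clarify FM evaluations. The accepted answers and model output are invented for the example; in a full run you would typically pass a ModelRunner and a dataset configuration to evaluate() rather than scoring one sample.

```python
# Minimal sketch: automatic factual-knowledge check with fmeval.
# The target answers and model output below are made up for illustration.
from fmeval.eval_algorithms.factual_knowledge import (
    FactualKnowledge,
    FactualKnowledgeConfig,
)

# The delimiter lets a single record list several acceptable target answers.
eval_algo = FactualKnowledge(FactualKnowledgeConfig(target_output_delimiter="<OR>"))

# Score one response: does the model output contain an accepted answer?
scores = eval_algo.evaluate_sample(
    target_output="Paris<OR>paris",                     # hypothetical accepted answers
    model_output="The capital of France is Paris.",     # hypothetical model response
)
print(scores)  # a list of EvalScore objects, e.g. factual_knowledge = 1.0
```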