Posted On: Apr 25, 2024

Foundation model evaluations with SageMaker Clarify are now generally available. This capability helps data scientists and machine learning engineers evaluate, compare, and select foundation models based on a variety of criteria across different tasks within minutes.

SageMaker customers select from hundreds of foundation models to power their generative AI applications. They evaluate and compare these models during model selection and model customization to determine the optimal fit for their use case. This process can take days: identifying relevant benchmarks, configuring evaluation tools, and running evaluations on each model. Even then, the results are often hard to apply to a specific use case.

SageMaker Clarify offers automated and human evaluations with interpretable results. Customers can use this new capability in Amazon SageMaker Studio to evaluate SageMaker-hosted LLMs, or use the open-source fmeval library to evaluate any LLM. Get started by using curated prompt datasets tailored for tasks like text generation, summarization, question answering, and classification. Customize inference parameters and prompt templates, and compare results across different model settings. Extend evaluations with custom prompt datasets and metrics. Human evaluations enable customers to assess more subjective aspects like creativity and style. Following each evaluation, customers receive a comprehensive report, complete with visualizations and examples, and can integrate the results into their SageMaker ML workflows.
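To illustrate the programmatic path, here is a minimal sketch using the open-source fmeval library, adapted from its documented factual-knowledge evaluation flow. The dataset name, file path, field names, and model ID below are illustrative assumptions, not values from this announcement; fmeval also provides model runners for SageMaker endpoints and JumpStart models, and other evaluation algorithms such as summarization accuracy and toxicity.

```python
# Sketch of an fmeval evaluation run, assuming a JSON Lines prompt dataset
# with "question" and "answers" fields (hypothetical file and field names).
from fmeval.constants import MIME_TYPE_JSONLINES
from fmeval.data_loaders.data_config import DataConfig
from fmeval.model_runners.bedrock_model_runner import BedrockModelRunner
from fmeval.eval_algorithms.factual_knowledge import (
    FactualKnowledge,
    FactualKnowledgeConfig,
)

# Point the evaluator at a custom prompt dataset.
config = DataConfig(
    dataset_name="my_qa_sample",                # placeholder name
    dataset_uri="data/my_qa_sample.jsonl",      # placeholder path
    dataset_mime_type=MIME_TYPE_JSONLINES,
    model_input_location="question",
    target_output_location="answers",
)

# A model runner wraps the LLM under evaluation; this one targets a
# Bedrock-hosted model to show that fmeval can evaluate any LLM.
model_runner = BedrockModelRunner(
    model_id="anthropic.claude-v2",             # placeholder model ID
    output="completion",
    content_template='{"prompt": $prompt, "max_tokens_to_sample": 500}',
)

# Run the factual-knowledge evaluation with a customizable prompt template;
# save=True writes per-record outputs alongside the aggregate scores.
eval_algo = FactualKnowledge(FactualKnowledgeConfig(target_output_delimiter="<OR>"))
eval_output = eval_algo.evaluate(
    model=model_runner,
    dataset_config=config,
    prompt_template="$model_input",
    save=True,
)
print(eval_output)
```

Swapping the model runner or the evaluation algorithm is how you would compare different models or model settings on the same prompt dataset.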

This capability is available in all AWS Regions except the AWS GovCloud (US) Regions, the China Regions, Asia Pacific (Hyderabad), Asia Pacific (Melbourne), Canada West (Calgary), Europe (Zurich), Europe (Stockholm), Europe (Spain), Israel (Tel Aviv), and Middle East (UAE).

For additional details, see our product page, documentation, and pricing page.