When you create a solution version, Amazon Personalize generates metrics that you can use to evaluate the performance of the model before you create a campaign and provide recommendations. Metrics allow you to view the effects of modifying a solution's hyperparameters. You can also use metrics to compare the results between solutions that use the same training data but were created with different recipes.
Amazon Personalize provides the following metrics. For each metric, higher numbers are better than lower numbers.
- coverage - the proportion of unique recommended items from all queries out of the total number of unique items in the interactions and items datasets.
- mean_reciprocal_rank_at_K - the mean of the reciprocal ranks of the first relevant recommendation out of the top K recommendations over all queries. This metric is appropriate if you're interested in the single highest ranked recommendation.
- normalized_discounted_cumulative_gain_at_K - discounted cumulative gain assumes that recommendations lower on the list are less relevant than recommendations higher on the list, so each recommendation is discounted (given a lower weight) by a factor that depends on its position. The discounted cumulative gain (DCG) at K is the sum of the discounted weights of the relevant recommendations in the top K. The normalized discounted cumulative gain (NDCG) is the DCG divided by the ideal DCG, so NDCG is between 0 and 1. (The ideal DCG is the DCG when the top K recommendations are sorted by relevance.) Amazon Personalize uses a weighting factor of 1/log(1 + position), where the top of the list is position 1. This metric rewards relevant items that appear near the top of the list, because the top of a list usually draws more attention.
- precision_at_K - the number of relevant recommendations out of the top K recommendations, divided by K. This metric rewards precise recommendation of relevant items.
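To make these definitions concrete, the following self-contained sketch computes each metric for a single hypothetical query. The item IDs, relevant set, and catalog size are made up for illustration; Amazon Personalize averages these values over many queries during evaluation.

```python
import math

# Hypothetical top-5 recommendations for one query, and the set of items
# the user actually interacted with (the "relevant" items).
recommended = ["item_2", "item_9", "item_5", "item_1", "item_7"]
relevant = {"item_9", "item_1"}
k = 5

# precision_at_K: relevant recommendations in the top K, divided by K.
precision_at_k = sum(item in relevant for item in recommended[:k]) / k

# Reciprocal rank at K for this query: 1 / position of the first relevant
# recommendation (0 if none appears). mean_reciprocal_rank_at_K averages
# this value over all queries.
rr = 0.0
for position, item in enumerate(recommended[:k], start=1):
    if item in relevant:
        rr = 1.0 / position
        break

# DCG at K with the weighting factor 1/log(1 + position), position 1 on top.
def dcg(items):
    return sum(
        1.0 / math.log(1 + position)
        for position, item in enumerate(items, start=1)
        if item in relevant
    )

# The ideal ordering puts the relevant items first; NDCG = DCG / ideal DCG.
ideal_order = sorted(recommended[:k], key=lambda item: item not in relevant)
ndcg_at_k = dcg(recommended[:k]) / dcg(ideal_order)

# coverage: unique recommended items divided by unique items in the catalog
# (here a made-up catalog of 50 items).
coverage = len(set(recommended)) / 50

print(precision_at_k)       # 0.4 (2 relevant items in the top 5)
print(rr)                   # 0.5 (first relevant item is at position 2)
print(round(ndcg_at_k, 4))  # 0.6509
print(coverage)             # 0.1
```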
For more information on these metrics, see Evaluating a Solution Version.
Run the following code to evaluate this solution version:
import json  # for pretty-printing the response

get_solution_metrics_response = personalize.get_solution_metrics(
    solutionVersionArn=solution_version_arn
)

print(json.dumps(get_solution_metrics_response, indent=2))
Consider the following example output and take a closer look at what these metrics mean for the model:
"metrics": {
  "coverage": 0.0762,
  "mean_reciprocal_rank_at_25": 0.2545,
  "normalized_discounted_cumulative_gain_at_10": 0.2257,
  "normalized_discounted_cumulative_gain_at_25": 0.2929,
  "normalized_discounted_cumulative_gain_at_5": 0.2112,
  "precision_at_10": 0.0339,
  "precision_at_25": 0.03,
  "precision_at_5": 0.0536
}
The normalized discounted cumulative gain at 5 indicates that there is roughly a 21% chance of a recommended item being part of a user's interactions. The coverage of about 7.6% means the model's recommendations span that share of the unique items in the catalog, and the precision of about 5.4% means that, on average, that fraction of the top 5 recommended items are relevant.
Keep in mind that this model uses rating data for interactions, because MovieLens is an explicit dataset based on ratings. The timestamps also come from when a movie was rated, not watched, so the ordering is not the same as the order in which a viewer would watch the movies.
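Since every solution version reports the same metric names, you can line up recipes trained on the same data side by side. The helper below is a hypothetical sketch: the recipe names and numbers are illustrative samples, and in practice each inner dict would come from a get_solution_metrics call like the one above.

```python
# Illustrative sample metrics for two recipes trained on the same data;
# in practice, populate each inner dict from
# personalize.get_solution_metrics(solutionVersionArn=...)["metrics"].
metrics_by_recipe = {
    "aws-user-personalization": {"coverage": 0.0762, "precision_at_5": 0.0536},
    "aws-popularity-count": {"coverage": 0.0021, "precision_at_5": 0.0410},
}

def metrics_table(metrics_by_recipe):
    """Pivot {recipe: {metric: value}} into (metric, {recipe: value}) rows."""
    names = sorted({name for d in metrics_by_recipe.values() for name in d})
    return [
        (name, {recipe: d.get(name) for recipe, d in metrics_by_recipe.items()})
        for name in names
    ]

for name, values in metrics_table(metrics_by_recipe):
    print(f"{name:15s}", "  ".join(f"{r}={v}" for r, v in values.items()))
```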