Amazon Forecast now supports accuracy measurements for individual items

Posted on: Nov 27, 2020

We’re excited to announce that you can now measure the accuracy of forecasts for individual items in Amazon Forecast, allowing you to better understand your forecasting model's performance for the items that most impact your business. Improving forecast accuracy for specific items, such as those with higher prices or higher costs, is often more important than optimizing accuracy averaged across all items. With this launch, you can now view accuracy for individual items and export the forecasts generated during training. This information allows you to better interpret results by comparing performance against observed historical demand, aggregating accuracy metrics across custom sets of SKUs or time periods, or visualizing results, all without needing to hold out a separate validation dataset. From there, you can tailor your experiments to further optimize accuracy for the items that matter most to you.

For retailers specifically, not all SKUs are treated equally. Usually 80% of revenue is driven by 20% of SKUs, and retailers look to optimize forecasting accuracy for that top 20% of SKUs. Although you can create a separate forecasting model for the top 20% of SKUs, the model’s ability to learn from relevant items outside of the top 20% is limited and accuracy may suffer. Further, retailers may want to over-stock certain SKUs and under-stock others, and would rather train a single model while assessing forecasting accuracy for different SKUs at different stocking levels.

To evaluate forecasting accuracy at an item level or department level, you usually hold out a validation dataset outside of Forecast and feed only your training dataset to Forecast to create an optimized model. After the model is trained, you generate multiple forecasts and compare them to the validation dataset, incurring costs during this experimentation phase and reducing the amount of data that Forecast has to learn from.

With today’s launch, you can now access the forecasted values from Forecast’s internal backtesting, along with item-level accuracy metrics, and compare those forecasts against observed demand. This eliminates the need to maintain a holdout test dataset outside of Forecast. When training a model, Forecast automatically splits your historical demand data into training and backtesting dataset groups. Forecast trains a model on the training data, generates forecasts at the stocking levels you specify for the backtesting period, and compares them to the observed values in the backtesting data. You can now export both the backtest forecasts and the accuracy metrics for each item.
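As a rough illustration, the sketch below shows how an export might be started with the AWS SDK for Python (boto3) using the CreatePredictorBacktestExportJob API. The Region, predictor ARN, S3 path, and IAM role ARN are placeholders you would replace with your own resources.

```python
import boto3

# Amazon Forecast client (the service that trains predictors and runs backtests).
forecast = boto3.client("forecast", region_name="us-west-2")

# Export the per-item backtest forecasts and accuracy metrics to S3.
# PredictorArn, the S3 path, and RoleArn are placeholders for your own
# predictor, bucket, and an IAM role that lets Forecast write to that bucket.
response = forecast.create_predictor_backtest_export_job(
    PredictorBacktestExportJobName="my_backtest_export",
    PredictorArn="arn:aws:forecast:us-west-2:123456789012:predictor/my_predictor",
    Destination={
        "S3Config": {
            "Path": "s3://my-bucket/backtest-exports/",
            "RoleArn": "arn:aws:iam::123456789012:role/MyForecastRole",
        }
    },
)

# The export runs asynchronously; check its status until it becomes ACTIVE.
status = forecast.describe_predictor_backtest_export_job(
    PredictorBacktestExportJobArn=response["PredictorBacktestExportJobArn"]
)["Status"]
print(status)
```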

To evaluate the strength of your forecasting model for specific items, or for a custom set of items based on category, you can calculate accuracy metrics by aggregating the backtest forecast results for those items. If you have specified multiple stocking levels, you can assess some items at one stocking level while measuring others at a different one. Lastly, you can easily visualize the backtest forecasts alongside your historical demand by exporting them to Amazon QuickSight or any other visualization tool of your choice.
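For example, once the exported backtest forecasts are downloaded from S3, you could load them into pandas and compute an aggregate metric such as weighted absolute percentage error (WAPE) over just your top SKUs. This is a minimal sketch, not the official notebook: the file name, the column names (item_id, target_value, p50), and the SKU list are assumptions you would adapt to your actual export.

```python
import pandas as pd

# Load the exported backtest forecasts (file and column names are illustrative;
# adapt them to the CSV produced by your export job).
backtest = pd.read_csv("forecasted_values.csv")

# Custom set of items to evaluate, e.g. the top 20% of SKUs by revenue.
top_skus = ["sku_001", "sku_042", "sku_137"]
subset = backtest[backtest["item_id"].isin(top_skus)]

# Weighted absolute percentage error (WAPE) at the median (p50) stocking level:
# total absolute error divided by total observed demand for the item set.
wape = (subset["target_value"] - subset["p50"]).abs().sum() / subset["target_value"].abs().sum()
print(f"WAPE for custom item set: {wape:.3f}")
```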

To get started with this capability, read our blog post to learn how to export the backtest results for each item, and see the CreatePredictorBacktestExportJob API documentation. We also have a notebook in our GitHub repo that walks you through using the Forecast APIs to export accuracy measurements for each item and calculate accuracy metrics for a custom set of items.

You can use this capability in all Regions where Forecast is publicly available. For more information about Region availability, see the AWS Region Table.