Demand intelligence made simple with PredictHQ event data through AWS Data Exchange and Amazon SageMaker
No matter your business, events likely have a major impact on your demand. Using various solutions from Amazon Web Services (AWS), companies can gain access to high-quality and enriched event data that helps them analyze what’s happening in the real world at a massive scale. This in turn can help them make decisions around staffing, inventory, pricing, site selection, on-time delivery, and more.
Event data can be consumed in a variety of ways, including through PredictHQ’s various APIs, as well as through AWS Data Exchange, where you can find, subscribe to, and use third-party data in the cloud. This synergy between PredictHQ and AWS helps companies access intelligent event data instantaneously so that the data is always up to date, which is crucial given the dynamic nature of events.
That being said, third-party data can be complex to bring into your data warehouse or existing models. The goal of this blog is to show you, through an example, how to retrieve and integrate event features into a forecasting model running on Amazon SageMaker, an AWS cloud machine learning (ML) platform.
Overview
We’ll use real-world demand data from a restaurant customer to show you how integrating event features into an existing forecasting model can improve forecasting accuracy, measured by root mean square error (RMSE), by 20 percent or more. Improved demand forecasts for restaurants have downstream impact on labor optimization, ordering, and more.
In addition to Amazon SageMaker, we’ll use an Extreme Gradient Boosting (XGBoost) model, a supervised learning algorithm used for regression and classification on large datasets.
Although we’re using Amazon SageMaker and XGBoost here, the model architecture (depicted below) used in this demo is agnostic to ML platforms and forecasting models.
Set up
Follow this guide to launch Amazon SageMaker Studio from the console.
Once Amazon SageMaker Studio is ready, you can clone the GitHub repository containing the Jupyter notebook used in this blog.
After cloning the repository, open the notebook rundemo_rd_sdk.ipynb to get started.
Get started
Once you’re ready to start, the first step is to install the Python packages required to run this demo.
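The authoritative package list lives in the cloned repository, but a typical install cell for this demo might look like the following (the list below is an assumption, not the notebook’s exact requirements):

```python
# Install core dependencies for the demo. This package list is illustrative;
# defer to the requirements defined in the cloned repository.
%pip install pandas numpy requests xgboost scikit-learn matplotlib
```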
In this example, we will predict the order count for a fast-casual restaurant located in Iowa City, Iowa. We have roughly a year of historical data, spanning 2021-06-01 to 2022-07-04. We will use 2021-06-01 to 2022-06-19 as training data and predict demand for the dates from 2022-06-20 to 2022-07-04.
We’ll use a radius of 1.76 km (1.1 miles) to search for events near this store. This radius comes from the PredictHQ suggested radius API, and there are 23 venues within 1.76 km of the store.
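As a rough sketch, a call to the suggested radius API could look like the following. The store coordinates and parameter values here are illustrative assumptions; check PredictHQ’s API reference for the exact request shape.

```python
import requests

ACCESS_TOKEN = "<your PredictHQ access token>"
LAT, LON = 41.6611, -91.5302  # hypothetical coordinates for the Iowa City store

response = requests.get(
    "https://api.predicthq.com/v1/suggested-radius/",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    params={
        "location.origin": f"{LAT},{LON}",
        "radius_unit": "km",
        "industry": "restaurants",  # tailors the suggestion to restaurant demand
    },
)
response.raise_for_status()
print(response.json())  # expect a suggested radius of roughly 1.76 km here
```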
We’ll use PredictHQ Beam to determine the event categories to focus on. Beam is PredictHQ’s automated correlation engine, which consists of two models: a decomposition model and a category importance model. Beam decomposes the demand data to identify statistical correlations between event categories and the demand we’re working with, which, in this case, is the number of orders.
By running these models, we learn there are eight event categories statistically correlated to the demand:
- Sports
- Public holidays
- School holidays
- Expos
- Observances
- Severe weather
- Concerts
- Performing arts
The PredictHQ data science team can help you run your demand data through the category importance model, decompose your data, and get access to the suggested radius API.
Get and process relevant event features through the Features API
After we’ve determined our focus event categories, we will find relevant features to use through the PredictHQ Features API, which provides prebuilt, forecast-ready features.
An ACCESS_TOKEN is used to prepare event features from the Features API. The token provided with the demo is limited to this example; for event features in other locations or time periods, PredictHQ’s documentation walks you through creating an account and an access token.
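The snippets that follow assume the token is stored in an ACCESS_TOKEN variable and use a small helper (our own convenience function, not part of any SDK) that POSTs a query to the Features API and flattens the daily results into a pandas DataFrame:

```python
import pandas as pd
import requests

FEATURES_URL = "https://api.predicthq.com/v1/features/"

def get_features(payload: dict) -> pd.DataFrame:
    """POST a Features API query and return its results as a DataFrame."""
    response = requests.post(
        FEATURES_URL,
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
        json=payload,
    )
    response.raise_for_status()
    # Each result is one day of feature values; flatten any nested stats.
    return pd.json_normalize(response.json()["results"])
```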
First, we’ll prepare the features for attendance-based events and holiday events.
Next, we’ll prepare features for school holidays.
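A sketch of these two requests, using the helper above, might look like the following. The feature names follow the Features API naming convention, but treat them as assumptions to verify against the documentation.

```python
# The query window spans the training data plus the two-week forecast horizon.
base_query = {
    "active": {"gte": "2021-06-01", "lte": "2022-07-04"},
    "location": {"geo": {"lat": LAT, "lon": LON, "radius": "1.76km"}},
}

# Daily attendance sums for the correlated attendance-based categories.
attendance = get_features({
    **base_query,
    "phq_attendance_sports": {"stats": ["sum"]},
    "phq_attendance_expos": {"stats": ["sum"]},
    "phq_attendance_concerts": {"stats": ["sum"]},
    "phq_attendance_performing_arts": {"stats": ["sum"]},
})

# Rank-based features for holidays and observances, plus school holidays.
holidays = get_features({
    **base_query,
    "phq_rank_public_holidays": True,
    "phq_rank_observances": True,
    "phq_attendance_school_holidays": {"stats": ["sum"]},
})
```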
Finally, we’ll prepare features for severe weather events. Severe weather warnings and alerts might lead to disruption and can have a major influence on demand.
Capture severe weather events with Forecast-Ready Demand Impact Patterns and Polygons
These events impact demand before and after they occur. PredictHQ Forecast-Ready Demand Impact Patterns accurately capture the leading, lagging, and coincident effects of a severe weather event on demand.
Of course, severe weather events don’t happen at only a single location. PredictHQ Polygons help you see the full area impacted by an event represented as a shape—giving you a much more accurate picture of impact. Polygons automatically update as severe weather events change direction, severity, and area of impact. Polygons are driven by the most up-to-date, accurate weather data available—so restaurants can quickly take action.
By using the Features API, you can easily access severe weather event features for your forecasts. Because severe weather is one of the categories correlated with the number of orders for the restaurant we’re working with, we’ll use these features.
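A sketch of that request, again using the helper above, follows. The severe weather Demand Impact Pattern features are industry-specific, so the feature names below are illustrative placeholders; look up the exact names in the Features API documentation.

```python
# Severe weather impact features capture leading, lagging, and coincident
# effects on demand. The feature names here are illustrative examples only.
severe_weather = get_features({
    **base_query,
    "phq_impact_severe_weather_blizzard_retail": {"stats": ["max"]},
    "phq_impact_severe_weather_flood_retail": {"stats": ["max"]},
    "phq_impact_severe_weather_heat_wave_retail": {"stats": ["max"]},
})
```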
Now that you have the features you’ll be working with, the next step is to load the demand data from a comma-separated values (CSV) file and combine the event features with time-trend features.
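A minimal sketch of this step, assuming the demand CSV has date and order_count columns (the file name and column names are hypothetical):

```python
# Load historical order counts (file and column names are hypothetical).
demand = pd.read_csv("demand.csv", parse_dates=["date"])

# Join the three feature frames onto the demand series by date.
event_features = attendance.merge(holidays, on="date").merge(severe_weather, on="date")
event_features["date"] = pd.to_datetime(event_features["date"])
df = demand.merge(event_features, on="date", how="left").fillna(0)

# Add simple time-trend features alongside the event features.
df["day_of_week"] = df["date"].dt.dayofweek
df["month"] = df["date"].dt.month
df["week_of_year"] = df["date"].dt.isocalendar().week.astype(int)
```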
Build a forecasting model with XGBoost
Now you are ready to build a forecast using the XGBoost model based on all the features.
Forecast the next two weeks’ demand starting from 2022-06-20.
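A sketch of the training and prediction step, reusing the column names from the previous snippet (the hyperparameters below are illustrative, not the notebook’s tuned values):

```python
from xgboost import XGBRegressor

# Chronological split: train through 2022-06-19, forecast from 2022-06-20.
train = df[df["date"] <= "2022-06-19"]
test = df[df["date"] >= "2022-06-20"]

feature_cols = [c for c in df.columns if c not in ("date", "order_count")]

model = XGBRegressor(
    objective="reg:squarederror",
    n_estimators=200,   # illustrative hyperparameters; tune for your own data
    max_depth=4,
    learning_rate=0.1,
)
model.fit(train[feature_cols], train["order_count"])

predictions = model.predict(test[feature_cols])
```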
Compare forecasts with and without event features
Here we compare models trained with and without event features, based on mean absolute error (MAE) and RMSE. The results are as follows (a sketch of the metric computation appears after this list):
- Without event features in the model, MAE is 11.23.
- With event features in the model, MAE is 9.20 (that is, it improved by 18.13 percent).
- Without event features in the model, RMSE is 14.04.
- With event features in the model, RMSE is 10.32 (that is, it improved by 26.49 percent).
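To reproduce this comparison, train the model twice, once with the event feature columns dropped and once with them included, and score each run on the holdout window. A sketch using the test frame and predictions from the snippets above:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

mae = mean_absolute_error(test["order_count"], predictions)
rmse = np.sqrt(mean_squared_error(test["order_count"], predictions))
print(f"MAE: {mae:.2f}  RMSE: {rmse:.2f}")
```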
Summary
Factoring events into your demand forecasting improves accuracy and profitability. In this demo, a restaurant customer was able to improve forecasting accuracy, measured by RMSE, by more than 20 percent by integrating event features into its model. We have seen this type of RMSE improvement generate $50,000 to $100,000 in labor savings per restaurant each year, resulting in millions in savings across an entire network.
PredictHQ’s intelligent event data, available on AWS Data Exchange, helps your models and teams prepare for upcoming demand fluctuations. By coupling that data with Amazon SageMaker and its wide range of supported ML features and models, you can achieve your data-driven business goals quickly and easily.
To learn more about PredictHQ data and what’s possible, check out PredictHQ’s data offerings on AWS Data Exchange or reach out to PredictHQ directly.