AWS Machine Learning Blog

Predicting new and existing product sales in semiconductors using Amazon Forecast

This is a joint post by NXP SEMICONDUCTORS N.V. & AWS Machine Learning Solutions Lab (MLSL)

Machine learning (ML) is being used across a wide range of industries to extract actionable insights from data to streamline processes and improve revenue generation. In this post, we demonstrate how NXP, an industry leader in the semiconductor sector, collaborated with the AWS Machine Learning Solutions Lab (MLSL) to use ML techniques to optimize the allocation of the NXP research and development (R&D) budget to maximize their long-term return on investment (ROI).

NXP directs its R&D efforts largely to the development of new semiconductor solutions where they see significant opportunities for growth. To outpace market growth, NXP invests in research and development to extend or create leading market positions, with an emphasis on fast-growing, sizable market segments. For this engagement, they sought to generate monthly sales forecasts for new and existing products across different material groups and business lines. In this post, we demonstrate how the MLSL and NXP employed Amazon Forecast and other custom models for long-term sales predictions for various NXP products.

“We engaged with the team of scientists and experts at [the] Amazon Machine Learning Solutions Lab to build a solution for predicting new product sales and understand if and which additional features could help inform [the] decision-making process for optimizing R&D spending. Within just a few weeks, the team delivered multiple solutions and analyses across some of our business lines, material groups, and on [an] individual product level. MLSL delivered a sales forecast model, which complements our current way of manual forecasting, and helped us model the product lifecycle with novel machine learning approaches using Amazon Forecast and Amazon SageMaker. While keeping a constant collaborative workstream with our team, MLSL helped us with upskilling our professionals when it comes to scientific excellence and best practices on ML development using AWS infrastructure.”

– Bart Zeeman, Strategist and Analyst at the CTO office, NXP Semiconductors.

Goals and use case

The goal of the engagement between NXP and the MLSL team was to predict the overall sales of NXP in various end markets. In general, the NXP team is interested in macro-level sales that include the sales of various business lines (BLs), which contain multiple material groups (MAGs). Furthermore, the NXP team is also interested in predicting the product lifecycle of newly introduced products. The lifecycle of a product is divided into four phases (Introduction, Growth, Maturity, and Decline). The product lifecycle prediction enables the NXP team to identify the revenue generated by each product and to allocate R&D funding to the products generating the highest sales, or to the products with the highest potential, to maximize the ROI of R&D activity. Additionally, they can predict long-term sales at a micro level, which gives them a bottom-up look at how their revenue changes over time.

In the following sections, we present the key challenges associated with developing robust and efficient models for long-term sales forecasts. We further describe the intuition behind various modeling techniques employed to achieve the desired accuracy. We then present the evaluation of our final models, where we compare the performance of the proposed models in terms of sales prediction with the market experts at NXP. We also demonstrate the performance of our state-of-the-art point cloud-based product lifecycle prediction algorithm.

Challenges

One of the challenges we faced when using fine-grained (micro-level) modeling, such as product-level models, for sales prediction was missing sales data, which arises whenever a product records no sales in a given month. Similarly, for macro-level sales prediction, the length of the historical sales data was limited. Both the missing sales data and the limited history pose significant challenges to model accuracy for long-term sales prediction into 2026. We observed during exploratory data analysis (EDA) that as we move from micro-level sales (product level) to macro-level sales (BL level), missing values become less significant. However, the limited length of historical sales data (a maximum of 140 months) still posed significant challenges to model accuracy.

Modeling techniques

After EDA, we focused on forecasting at the BL and MAG levels and at the product level for one of the largest end markets (the automobile end market) for NXP. However, the solutions we developed can be extended to other end markets. Modeling at the BL, MAG, or product level has its own pros and cons in terms of model performance and data availability. The following table summarizes such pros and cons for each level. For macro-level sales prediction, we employed the Amazon Forecast AutoPredictor for our final solution. Similarly, for micro-level sales prediction, we developed a novel point cloud-based approach.

Macro sales prediction (top-down)

To predict the long-term sales values (through 2026) at the macro level, we tested various methods, including Amazon Forecast, GluonTS, and N-BEATS (implemented in GluonTS and PyTorch). Overall, Forecast outperformed all other methods for macro-level sales prediction, based on a backtesting approach (described in the Evaluation metrics section later in this post). We also compared the accuracy of AutoPredictor against human predictions.

We also proposed using N-BEATS due to its interpretable properties. N-BEATS is based on a simple but powerful architecture of feedforward networks organized into stacks of residual blocks, where each block removes the part of the signal it explains (the backcast) and contributes its own partial forecast. This design encodes an inductive bias that makes the model capable of extracting trend and seasonality from a time series (see the following figure). These interpretations were generated using PyTorch Forecasting.
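The doubly residual idea can be illustrated with a minimal sketch (randomly initialized, untrained layers stand in for the learned feedforward stacks; the names and sizes here are our own illustration, not the NXP solution):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_linear(in_dim, out_dim):
    # Stand-in for a trained feedforward stack
    W = rng.normal(scale=0.1, size=(in_dim, out_dim))
    return lambda x: x @ W

def nbeats_like_forecast(history, n_blocks=3, horizon=6):
    """Each block emits a backcast (the part of the input it explains)
    and a partial forecast; the residual is passed to the next block,
    and the partial forecasts are summed into the final forecast."""
    lookback = history.shape[-1]
    residual = history
    forecast = np.zeros(horizon)
    for _ in range(n_blocks):
        backcast = random_linear(lookback, lookback)(residual)
        forecast = forecast + random_linear(lookback, horizon)(residual)
        residual = residual - backcast
    return forecast

history = rng.normal(size=12)  # 12 months of normalized sales
print(nbeats_like_forecast(history).shape)  # (6,)
```

In the real architecture, each block's output is constrained to a basis expansion (polynomial for trend, harmonic for seasonality), which is what makes the per-stack outputs interpretable.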

Micro sales prediction (bottom-up)

In this section, we discuss a novel method developed to predict the product lifecycle, shown in the following figure, while taking cold start products into consideration. We implemented this method using PyTorch on Amazon SageMaker Studio. First, we introduced a point cloud-based method. This method converts sales data into a point cloud, where each point represents sales data at a certain age of the product. A point cloud-based neural network model is then trained on this data to learn the parameters of the product lifecycle curve (see the following figure). In this approach, we also incorporated additional features, including the product description as a bag of words, to tackle the cold start problem when predicting the product lifecycle curve.

Time series as point cloud-based product lifecycle prediction

We developed a novel point cloud-based approach for product lifecycle and micro-level sales predictions. We also incorporated additional features to further improve model accuracy for cold start product lifecycle predictions. These features include product fabrication techniques and other categorical information related to the products. Such additional data can help the model predict the sales of a new product even before the product is released on the market (cold start). The following figure demonstrates the point cloud-based approach. The model takes the normalized sales and the age of the product (the number of months since the product was launched) as input. Based on these inputs, the model learns parameters during training using gradient descent. During the forecast phase, the learned parameters, along with the features of a cold start product, are used to predict its lifecycle. The large number of missing values in the data at the product level negatively impacts nearly all existing time series models. This novel solution is based on the ideas of lifecycle modeling and treating time series data as point clouds to mitigate the missing values.
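The post does not specify the exact curve family or training code; as a minimal sketch, assume a gamma-shaped lifecycle curve s(t) = a·t^b·e^(−c·t) fitted to a handful of (age, sales) points by gradient descent (central finite-difference gradients here for brevity, rather than PyTorch autograd):

```python
import numpy as np

def lifecycle(t, a, b, c):
    # Gamma-shaped curve: rises, peaks, then declines, like a product lifecycle
    return a * t**b * np.exp(-c * t)

def fit_lifecycle(ages, sales, lr=1e-3, steps=5000):
    """Fit (a, b, c) to sparse (age, sales) points by gradient descent,
    using central finite-difference gradients for simplicity."""
    params = np.array([1.0, 1.0, 0.1])  # initial guess
    eps = 1e-5

    def loss(p):
        return np.mean((lifecycle(ages, *p) - sales) ** 2)

    for _ in range(steps):
        grad = np.array([
            (loss(params + eps * np.eye(3)[i]) - loss(params - eps * np.eye(3)[i])) / (2 * eps)
            for i in range(3)
        ])
        params -= lr * np.clip(grad, -10, 10)  # clip for stability
    return params

# A few observations sampled from a known curve (normalized sales vs. age in months)
ages = np.array([2.0, 5.0, 9.0, 14.0])
observed = lifecycle(ages, 0.5, 1.2, 0.15)
params = fit_lifecycle(ages, observed)
print(np.round(params, 3))
```

The benefit of the point cloud framing is that months with no sales simply contribute no points, so missing values never enter the loss.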

The following figure demonstrates how our point cloud-based lifecycle method addresses the missing data values and is capable of predicting the product lifecycle with very few training samples. The X-axis represents the age in time, and the Y-axis represents the sales of a product. Orange dots represent the training samples, green dots represent the testing samples, and the blue line demonstrates the predicted lifecycle of a product by the model.

Methodology

To predict macro-level sales, we employed Amazon Forecast among other techniques. Similarly, for micro sales, we developed a state-of-the-art point cloud-based custom model. Forecast outperformed all other methods in terms of model performance. We used Amazon SageMaker notebook instances to create a data processing pipeline that extracted training examples from Amazon Simple Storage Service (Amazon S3). The training data was further used as input for Forecast to train a model and predict long-term sales.

Training a time series model using Amazon Forecast consists of three main steps. In the first step, we imported the historical data into Amazon S3. Second, a predictor was trained using the historical data. Finally, we deployed the trained predictor to generate the forecast. In this section, we provide a detailed explanation along with code snippets of each step.

We started by extracting the latest sales data. This step included uploading the dataset to Amazon S3 in the correct format. Amazon Forecast takes three columns as inputs: timestamp, item_id, and target_value (sales data). The timestamp column contains the time of sales, which can be at hourly, daily, or other granularity. The item_id column contains the names of the sold items, and the target_value column contains the sales values. Next, we used the path of the training data located in Amazon S3, defined the time series dataset frequency (H, D, W, M, or Y), defined a dataset name, and identified the attributes of the dataset (mapped the respective columns in the dataset and their data types). Next, we called the create_dataset function from the Boto3 API to create a dataset with attributes such as Domain, DatasetType, DatasetName, DataFrequency, and Schema. This function returns a JSON object that contains the Amazon Resource Name (ARN) of the dataset, which is used in the following steps. See the following code:

import boto3

# Create the Amazon Forecast service client
forecast = boto3.client("forecast")

dataset_path = "PATH_OF_DATASET_IN_S3"
DATASET_FREQUENCY = "M" # Frequency of dataset (H, D, W, M, Y) 
TS_DATASET_NAME = "NAME_OF_THE_DATASET"
TS_SCHEMA = {
   "Attributes":[
      {
         "AttributeName":"item_id",
         "AttributeType":"string"
      },
       {
         "AttributeName":"timestamp",
         "AttributeType":"timestamp"
      },
      {
         "AttributeName":"target_value",
         "AttributeType":"float"
      }
   ]
}

create_dataset_response = forecast.create_dataset(Domain="CUSTOM",
                                                  DatasetType='TARGET_TIME_SERIES',
                                                  DatasetName=TS_DATASET_NAME,
                                                  DataFrequency=DATASET_FREQUENCY,
                                                  Schema=TS_SCHEMA)

ts_dataset_arn = create_dataset_response['DatasetArn']

After the dataset was created, it was imported into Amazon Forecast using the Boto3 create_dataset_import_job function. The create_dataset_import_job function takes the job name (a string value), the ARN of the dataset from the previous step, the location of the training data in Amazon S3 from the previous step, and the time stamp format as arguments. It returns a JSON object containing the import job ARN. See the following code:

TIMESTAMP_FORMAT = "yyyy-MM-dd"
TS_IMPORT_JOB_NAME = "SALES_DATA_IMPORT_JOB_NAME"
TIMEZONE = "UTC"  # time zone of the timestamps in the dataset
ts_s3_path = dataset_path  # S3 location of the training data
role_arn = "ARN_OF_IAM_ROLE_WITH_S3_ACCESS"  # IAM role Amazon Forecast assumes to read the data

ts_dataset_import_job_response = \
    forecast.create_dataset_import_job(DatasetImportJobName=TS_IMPORT_JOB_NAME,
                                       DatasetArn=ts_dataset_arn,
                                       DataSource= {
                                         "S3Config" : {
                                             "Path": ts_s3_path,
                                             "RoleArn": role_arn
                                         } 
                                       },
                                       TimestampFormat=TIMESTAMP_FORMAT,
                                       TimeZone = TIMEZONE)

ts_dataset_import_job_arn = ts_dataset_import_job_response['DatasetImportJobArn']

The imported dataset was then used to create a dataset group using the create_dataset_group function. This function takes the domain (string values defining the domain of the forecast), dataset group name, and the dataset ARN as inputs:

DATASET_GROUP_NAME = "SALES_DATA_GROUP_NAME"
DATASET_ARNS = [ts_dataset_arn]

create_dataset_group_response = \
    forecast.create_dataset_group(Domain="CUSTOM",
                                  DatasetGroupName=DATASET_GROUP_NAME,
                                  DatasetArns=DATASET_ARNS)

dataset_group_arn = create_dataset_group_response['DatasetGroupArn']

Next, we used the dataset group to train forecasting models. Amazon Forecast offers various state-of-the-art models; any of these models can be used for training. We used AutoPredictor as our default model. The main advantage of AutoPredictor is that it automatically generates item-level forecasts using the optimal model from an ensemble of six state-of-the-art models, based on the input dataset. The Boto3 API provides the create_auto_predictor function for training an auto prediction model. Its input parameters are PredictorName, ForecastHorizon, and ForecastFrequency. The forecast horizon represents the window size of the future prediction, which can be expressed in hours, days, weeks, months, and so on. Similarly, the forecast frequency represents the granularity of the forecast values, such as hourly, daily, weekly, monthly, or yearly. We mainly focused on predicting the monthly sales of NXP across various BLs. See the following code:

PREDICTOR_NAME = "SALES_PREDICTOR"
FORECAST_HORIZON = 24
FORECAST_FREQUENCY = "M"

create_auto_predictor_response = \
    forecast.create_auto_predictor(PredictorName = PREDICTOR_NAME,
                                   ForecastHorizon = FORECAST_HORIZON,
                                   ForecastFrequency = FORECAST_FREQUENCY,
                                   DataConfig = {
                                       'DatasetGroupArn': dataset_group_arn
                                    })

predictor_arn = create_auto_predictor_response['PredictorArn']

The trained predictor was then used to generate forecast values. Forecasts were generated using the create_forecast function from the previously trained predictor. This function takes the name of the forecast and the ARN of the predictor as inputs and generates the forecast values for the horizon and frequency defined in the predictor:

FORECAST_NAME = "SALES_FORECAST"

create_forecast_response = \
    forecast.create_forecast(ForecastName=FORECAST_NAME,
                             PredictorArn=predictor_arn)

Amazon Forecast is a fully managed service that automatically generates training and test datasets and provides various accuracy metrics to evaluate the reliability of the model-generated forecast. However, to build consensus on the predicted data and to compare the predicted values with human predictions, we divided our historical data into training and validation data manually. We trained the model using the training data without exposing it to the validation data, and generated predictions for the length of the validation data. The validation data was then compared with the predicted values to evaluate model performance. Validation metrics may include mean absolute percent error (MAPE) and weighted absolute percent error (WAPE), among others. We used WAPE as our accuracy metric, as discussed in the next section.
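The manual split described above can be sketched in a few lines (plain Python with a synthetic series, not the actual NXP data pipeline):

```python
def backtest_split(monthly_sales, horizon=12):
    """Hold out the final `horizon` months for validation;
    everything before them is used for training."""
    if len(monthly_sales) <= horizon:
        raise ValueError("series is shorter than the forecast horizon")
    return monthly_sales[:-horizon], monthly_sales[-horizon:]

# Example: 36 months of synthetic sales
series = [100 + 5 * m for m in range(36)]
train, validation = backtest_split(series)
print(len(train), len(validation))  # 24 12
```

The model is trained on `train` only; its predictions over the validation window are then scored against `validation`.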

Evaluation metrics

We first verified the model performance using backtesting to validate the predictions of our forecast model for long-term sales forecasting (2026 sales). We evaluated the model performance using the WAPE; the lower the WAPE value, the better the model. The key advantage of using WAPE over other error metrics like MAPE is that WAPE weighs the individual impact of each item’s sales. Therefore, it accounts for each product’s contribution to total sales when calculating the overall error. For example, if you make an error of 2% on a product that generates $30 million and an error of 10% on a product that generates $50,000, your MAPE will not tell the entire story. The 2% error is actually costlier than the 10% error, something you can’t tell by using MAPE. Comparatively, WAPE accounts for these differences. We also predicted various percentile values for the sales to demonstrate the upper and lower bounds of the model forecast.
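The difference can be made concrete with a small sketch in plain Python, using the figures from the example above:

```python
def mape(actual, predicted):
    # Mean absolute percentage error: every item weighted equally
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

def wape(actual, predicted):
    # Weighted absolute percentage error: errors weighted by sales volume
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / sum(abs(a) for a in actual)

# 2% error on a $30M product, 10% error on a $50K product
actual = [30_000_000, 50_000]
predicted = [30_000_000 * 0.98, 50_000 * 0.90]

print(round(mape(actual, predicted), 4))  # 0.06   (both errors count equally)
print(round(wape(actual, predicted), 4))  # 0.0201 (dominated by the $30M product)
```

MAPE averages the two percentage errors to 6%, while WAPE correctly reflects that almost all of the dollar error comes from the small miss on the large product.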

Macro-level sales prediction model validation

Next, we validated the model performance in terms of WAPE values. We calculated the WAPE value of a model by splitting the data into training and validation sets. For example, for the 2019 WAPE value, we trained our model using sales data from 2011–2018 and predicted sales values for the next 12 months (2019 sales). We then calculated the WAPE value using the following formula:
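WAPE is conventionally defined as the total absolute error normalized by the total actual sales over the validation window:

$$\mathrm{WAPE} = \frac{\sum_{t=1}^{T} \left| \mathrm{actual}_t - \mathrm{forecast}_t \right|}{\sum_{t=1}^{T} \left| \mathrm{actual}_t \right|}$$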

We repeated the same procedure to calculate the WAPE values for 2020 and 2021. We evaluated the WAPE for all BLs in the auto end market for 2019, 2020, and 2021. Overall, we observed that Amazon Forecast achieved a 0.33 WAPE value even for 2020 (during the COVID-19 pandemic). In 2019 and 2021, our model achieved WAPE values below 0.1, demonstrating high accuracy.

Macro-level sales prediction baseline comparison

We compared the performance of the macro sales prediction models developed using Amazon Forecast to three baseline models in terms of WAPE value for 2019, 2020, and 2021 (see the following figure). Amazon Forecast either significantly outperformed the other baseline models or performed on par for all three years. These results further validate the effectiveness of our final model predictions.

Macro-level sales prediction model vs. human predictions

To further validate the confidence of our macro-level model, we compared its performance with human-predicted sales values. At the beginning of the fourth quarter every year, market experts at NXP predict the sales value of each BL, taking into consideration global market trends as well as other global indicators that could potentially impact the sales of NXP products. We compared the percent error of the model predictions and of the human predictions against the actual sales values in 2019, 2020, and 2021. We trained three models using data from 2011–2018 and predicted the sales values through 2021, then calculated the MAPE against the actual sales values. We also used the human predictions made at the end of 2018 (testing the model’s 1-year-ahead to 3-year-ahead forecasts). We repeated this process with predictions made at the end of 2019 (1-year-ahead to 2-year-ahead forecasts) and at the end of 2020 (1-year-ahead forecast). Overall, the model performed on par with the human predictors, or better in some cases. These results demonstrate the effectiveness and reliability of our model.

Micro-level sales prediction and product lifecycle

The following figure depicts how the model behaves using product data while having access to very few observations for each product (namely one or two observations at the input for product lifecycle prediction). The orange dots represent the training data, the green dots represent the testing data, and the blue line represents the model predicted product lifecycle.

As new sales data becomes available, the model can be fed more observations for context without retraining. The following figure demonstrates how the model behaves when given more context. Ultimately, more context leads to lower WAPE values.

In addition, we incorporated additional features for each product, including fabrication techniques and other categorical information. These external features helped reduce the WAPE value in the low-context regime (see the following figure). There are two explanations for this behavior. First, in the high-context regime, we need to let the data speak for itself, and the additional features can interfere with that process. Second, we need better features: we used 1,000-dimensional one-hot-encoded features (a bag of words). The conjecture is that better feature engineering techniques could reduce WAPE even further.
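The bag-of-words encoding of product descriptions can be sketched as follows (hypothetical descriptions and a toy vocabulary, far smaller than the 1,000 dimensions used in the engagement):

```python
def build_vocab(descriptions):
    """Map each distinct token across the catalog to a fixed vector index."""
    tokens = sorted({tok for d in descriptions for tok in d.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}

def bag_of_words(description, vocab):
    """Count vector over the shared vocabulary; unseen tokens are dropped."""
    vec = [0] * len(vocab)
    for tok in description.lower().split():
        if tok in vocab:
            vec[vocab[tok]] += 1
    return vec

# Hypothetical product descriptions
catalog = [
    "automotive radar transceiver 77ghz",
    "automotive microcontroller 32bit",
]
vocab = build_vocab(catalog)
# A cold start product is encoded with the same vocabulary before any sales exist
print(bag_of_words("automotive radar processor", vocab))  # [0, 0, 1, 0, 1, 0]
```

Because the vector depends only on the description, it is available on day one, which is what lets the model score a product before its first sale.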

Such additional data can help the model predict the sales of new products even before they are released on the market. For example, in the following figure, we plot how much predictive power can be obtained from external features alone.

Conclusion

In this post, we demonstrated how the MLSL and NXP teams worked together to predict macro- and micro-level long-term sales for NXP. The NXP team will now explore how to use these sales predictions in their processes, for example as input for R&D funding decisions to enhance ROI. We used Amazon Forecast to predict the sales of business lines (macro sales), which we referred to as the top-down approach. We also proposed a novel approach that treats time series as point clouds to tackle the challenges of missing values and cold start at the product level (micro level). We referred to this approach as bottom-up, in which we predicted the monthly sales of each product. We further incorporated external features of each product to enhance the performance of the model for cold start.

Overall, the models developed during this engagement performed on par with human predictions, and in some cases performed better in the long term. These results demonstrate the effectiveness and reliability of our models.

This solution can be employed for any forecasting problem. For further assistance in designing and developing ML solutions, please feel free to get in touch with the MLSL team.


About the authors

Souad Boutane is a data scientist at NXP-CTO, where she transforms various data into meaningful insights to support business decisions using advanced tools and techniques.

Ben Fridolin is a data scientist at NXP-CTO, where he coordinates on accelerating AI and cloud adoption. He focuses on machine learning, deep learning and end-to-end ML solutions.

Cornee Geenen is a project lead in the Data Portfolio of NXP, supporting the organization in its digital transformation toward becoming data-centric.

Bart Zeeman is a strategist with a passion for data and analytics at NXP-CTO, where he is driving better data-driven decisions for more growth and innovation.

Ahsan Ali is an Applied Scientist at the Amazon Machine Learning Solutions Lab, where he works with customers from different domains to solve their urgent and expensive problems using state-of-the-art AI/ML techniques.

Yifu Hu is an Applied Scientist in the Amazon Machine Learning Solutions lab, where he helps design creative ML solutions to address customers’ business problems in various industries.

Mehdi Noori is an Applied Science Manager at Amazon ML Solutions Lab, where he helps develop ML solutions for large organizations across various industries and leads the Energy vertical. He is passionate about using AI/ML to help customers achieve their Sustainability goals.

Huzefa Rangwala is a Senior Applied Science Manager at AIRE, AWS. He leads a team of scientists and engineers to enable machine learning based discovery of data assets. His research interests are in responsible AI, federated learning and applications of ML in health care and life sciences.