AWS for Industries

Enhancing Life Sciences Operations with Amazon SageMaker Canvas

The Life Sciences industry is experiencing significant growth, with the global laboratory supplies market projected to grow at a 7.5 percent compound annual growth rate from 2023 to 2030, according to Grand View Research. As organizations face increasing pressure to optimize operations, Amazon SageMaker Canvas offers capabilities to meet these challenges head-on. We’ll dive into practical applications of SageMaker Canvas, showcasing how it can optimize inventory management for research labs and streamline defect prediction in pharmaceutical manufacturing.

Forecasting Lab Inventory with Time Series Analysis

Our first example focuses on time series forecasting for lab inventory management. The objective is to predict the future usage of lab inventory, such as reagents and other consumables, given the historical usage of that inventory. This can help labs plan their inventory spending efficiently.

Our analysis uses a synthetic dataset that we created to simulate lab inventory for reagents and consumables. The data incorporates realistic usage patterns, seasonality, trends, and product correlations. Labs can obtain similar data from their own systems, such as an electronic lab notebook (ELN), a lab information management system (LIMS), or a procurement system. Figure 1 shows a view of the synthetic dataset and its distribution.

Figure 1 – Table showing the data types and distribution of the lab inventory dataset

We imported the data and reviewed the Data Quality and Insights Report from Amazon SageMaker Data Wrangler. We then asked Chat for data prep to sort the data and create lagged features. Lagged features are copies of the target variable in which each value is shifted by a specific number of time periods into the past, so that each row carries recent history alongside the current value. We also asked Chat for data prep to handle missing values and clean up placeholder values.
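To make the idea of a lagged feature concrete, here is a minimal sketch in plain Python of the kind of transformation described above. It is not the code that Chat for data prep generates. The column names (item, date, usage) and the sample values are hypothetical, and the sketch assumes each item has exactly one row per time period.

```python
from collections import defaultdict
from datetime import date


def add_lagged_usage(rows, lags=(1, 2)):
    """Sort rows by (item, date) and add usage_lag_<k> columns per item.

    Assumes one row per item per time period; a lag of k then means
    "usage k periods earlier". Rows without enough history get None.
    """
    by_item = defaultdict(list)
    for row in sorted(rows, key=lambda r: (r["item"], r["date"])):
        by_item[row["item"]].append(row)

    result = []
    for item_rows in by_item.values():
        for i, row in enumerate(item_rows):
            new_row = dict(row)  # copy so the input rows are not modified
            for k in lags:
                new_row[f"usage_lag_{k}"] = item_rows[i - k]["usage"] if i >= k else None
            result.append(new_row)
    return result


# Hypothetical, unsorted sample data
sample_rows = [
    {"item": "Reagent A", "date": date(2024, 1, 3), "usage": 12},
    {"item": "Reagent A", "date": date(2024, 1, 1), "usage": 10},
    {"item": "Pipette Tips", "date": date(2024, 1, 1), "usage": 50},
    {"item": "Reagent A", "date": date(2024, 1, 2), "usage": 11},
    {"item": "Pipette Tips", "date": date(2024, 1, 2), "usage": 55},
]

lagged = add_lagged_usage(sample_rows)
```

In pandas, the same effect is usually achieved by sorting and then calling shift on the usage column within each item group.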

These preparatory steps help in revealing underlying patterns and trends, leading to more reliable predictions and informed decision-making.

Figure 2 – Amazon SageMaker Data Wrangler dataflow showing transformations and Data Quality Insights Report

Analyzing the Inventory Forecasting Model
Following data preparation, we built a forecasting model to predict inventory usage. We selected a Standard Build configuration as it provides a good balance of accuracy and processing time for this type of forecasting problem.

The forecasting model demonstrated consistent performance with an Average Weighted Quantile Loss of 0.152. It showed strong accuracy with a Mean Absolute Percentage Error of 24.7 percent and a Weighted Absolute Percentage Error of 20.1 percent. In practical terms, if the model predicts a need for 100 units of a reagent, the actual usage typically falls between 75 and 125 units. That is sufficient for month-ahead planning, but it calls for safety stock.

The Root Mean Square Error of 12.374, which is expressed in the same units as usage, indicated a small deviation between forecasted and actual values. The Mean Absolute Scaled Error of 0.670 showed that this model’s errors are about 33 percent smaller than those of a naive forecast method (a value below 1 means the model outperforms the naive baseline). This improvement over basic forecasting approaches suggests the model has successfully captured meaningful patterns in inventory usage.
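For readers who want to see how these error metrics relate to one another, the following sketch implements common textbook definitions of MAPE, WAPE, RMSE, and MASE in plain Python. SageMaker Canvas may aggregate across items and time steps differently, so treat this as an illustration rather than a reproduction of its calculations. The sample numbers are made up and are not taken from the model above.

```python
import math


def mape(actual, forecast):
    """Mean Absolute Percentage Error (actual values must be non-zero)."""
    return sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def wape(actual, forecast):
    """Weighted Absolute Percentage Error: total absolute error over total actuals."""
    total_error = sum(abs(a - f) for a, f in zip(actual, forecast))
    return total_error / sum(abs(a) for a in actual)


def rmse(actual, forecast):
    """Root Mean Square Error, in the same units as the target."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mase(actual, forecast, history, season=1):
    """Mean Absolute Scaled Error.

    The model's mean absolute error is divided by the in-sample mean absolute
    error of a naive forecast that repeats the value from `season` periods ago.
    """
    naive_errors = [abs(history[i] - history[i - season]) for i in range(season, len(history))]
    naive_mae = sum(naive_errors) / len(naive_errors)
    model_mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    return model_mae / naive_mae


# Made-up illustration data
history = [100, 110, 95, 105, 120]   # past usage, used only to scale MASE
actual = [100, 120, 80]              # observed usage in the forecast window
forecast = [110, 110, 90]            # model predictions for the same window

print(round(mape(actual, forecast), 3))   # about 0.103
print(wape(actual, forecast))             # 0.1
print(rmse(actual, forecast))             # 10.0
print(mase(actual, forecast, history))    # 0.8, errors 20 percent smaller than naive
```

By the same reading, a MASE of 0.670 corresponds to errors that are 1 − 0.670 = 0.33, or about 33 percent, smaller than the naive baseline.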

For lab managers, these results indicate the model can reliably support inventory planning decisions, though maintaining a 25 percent safety stock buffer would be prudent to account for prediction uncertainty. For detailed explanations of these metrics, explore the metrics reference.

When using a model, consider how consistent its performance is across quantiles, the improvement over simple forecasting methods, and overall accuracy. Consider how these characteristics align with your specific forecasting needs and tolerance for variance. To learn more, read “Is your model good? A deep dive into Amazon SageMaker Canvas advanced metrics.” You can find details about the current metrics by reading about objective metrics.

Figure 3 – The model status provides model metrics

Making Predictions and Deploying the Inventory Model
To make single predictions, follow the Make single predictions with time series forecasting models steps. A single prediction generates a forecast for one item, and you can change individual values to see how they affect the outcome. In Figure 4, the example item is a DNA Extraction Kit for the date range of December 31, 2024 to January 30, 2025. Four different predictions are shown. P50 represents the median (50th percentile) and serves as the best single estimate. P25 is the 25th percentile (lower estimate), and P75 and P90 are the 75th and 90th percentile estimates (upper estimates).
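One way to put these percentile forecasts to work is to map a target service level to an order quantity. The sketch below is an assumption about how a lab might use the numbers, not a SageMaker Canvas feature. The function name and the forecast values are hypothetical and are not read from Figure 4. It also assumes the quantiles are well calibrated, so that actual usage stays at or below the P90 value about 90 percent of the time.

```python
def order_quantity(quantile_forecast, service_level):
    """Return the smallest forecast quantile that meets the service level.

    quantile_forecast maps a quantile (0.25 for P25, 0.90 for P90, ...) to
    forecast usage. service_level is the desired probability of not running
    out during the forecast window.
    """
    eligible = [q for q in quantile_forecast if q >= service_level]
    if not eligible:
        raise ValueError("no forecast quantile reaches the requested service level")
    return quantile_forecast[min(eligible)]


# Hypothetical forecast for one item over one forecast window
forecast = {0.25: 82.0, 0.50: 100.0, 0.75: 121.0, 0.90: 140.0}

print(order_quantity(forecast, 0.75))  # 121.0
print(order_quantity(forecast, 0.80))  # 140.0, because P90 is the next quantile up
```

Ordering at P50 keeps inventory lean but risks a shortfall roughly half of the time, while ordering at P90 trades extra stock for a lower risk of running out.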

Figure 4 – Single prediction type for a specific item and range

When you are ready to use the model outside of Amazon SageMaker Canvas, for example to make real-time predictions from your existing applications, deploy it to an endpoint by following the Deploy your models to an endpoint documentation.

Manufacturing Defect Prediction

In our next example we demonstrate how to build a predictive model for identifying defects in pharmaceutical manufacturing processes. This capability is essential for optimizing production efficiency, reducing costly rework, and improving overall product quality.

Our example dataset includes comprehensive metrics influencing defect rates, such as production volumes, supply chain quality, quality control, maintenance, inventory, workforce productivity, energy use, and additive manufacturing details.

Data Preparation and Feature Analysis

We used Amazon SageMaker Data Wrangler to import our data, create a data flow, and generate a Data Quality and Insights Report. Figure 5 shows the feature summary, which provides detailed information about the data, including validity, missing values, high and medium severity warnings, and prediction power. The summary states that the dataset is 100 percent valid, with zero percent missing values and zero high or medium severity warnings. This means the data is clean and complete, requiring no additional feature engineering or data preprocessing.

Figure 5 – Feature summary showing the prediction power, type, high and medium warnings

We selected DefectStatus as the target column, and SageMaker Canvas automatically detected the model type as 2 category prediction. You can explore additional details about how custom models work. For this initial stage, we chose a Quick build, which prioritizes speed over accuracy. With our Quick build complete, let’s now examine the model’s performance and insights in the Analyze section.

Model Performance and Insights

In the model Analyze overview, SageMaker Canvas lists the columns in order from most impactful to least impactful. Figure 6 depicts the model’s analysis, showing the accuracy, F1 score, column impact, and a scatter plot of MaintenanceHours against the predicted defect status. In this example, MaintenanceHours is the column with the most impact on DefectStatus.

Figure 6 – Model analysis

In the model for predicting machine defects, we achieved promising results:

  • Accuracy: 95.686%
  • F1 Score (Optimization metric): 0.849

The high accuracy score indicates that our model correctly classifies machine status (defective or non-defective) in over 95 percent of cases. The F1 score of 0.849, our optimization metric, is the harmonic mean of precision and recall, and it suggests a strong balance between the two in identifying defects. You can learn more by reading our Metrics for categorical prediction documentation.
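Accuracy and F1 can diverge when one class is much rarer than the other, which is why it helps to look at both. The sketch below computes both metrics from confusion-matrix counts, treating defective as the positive class. The counts are invented for illustration and are not the confusion matrix of the model above.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts.

    tp: defects correctly flagged        fp: good units wrongly flagged
    fn: defects the model missed         tn: good units correctly passed
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented counts: 100 defective units out of 1,000
metrics = classification_metrics(tp=80, fp=10, fn=20, tn=890)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these invented counts, accuracy is 0.970 while F1 is about 0.842. The large number of correctly passed good units lifts accuracy, whereas F1 reflects only how well the defective class is handled.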

Figure 7 – Comparing predictions with actual measures

We can test different values to see how they affect the predicted outcome. We can also test our model by following the steps to use manual batch predictions, or set up automatic batch predictions that run whenever a dataset is updated.

Figure 8 – Predict target value using Single prediction

Once you are ready to deploy the model outside of Amazon SageMaker Canvas, you can create an endpoint. Then set up Amazon SageMaker Model Monitor for continuous monitoring. Learn how SageMaker Model Monitor works.
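After the endpoint exists, an application can call it through the SageMaker Runtime API. The following is a minimal sketch, assuming the endpoint accepts CSV rows whose values appear in the same column order as the training data, with the target column omitted. The endpoint name and feature values are placeholders, and the exact request and response formats should be confirmed against the deployment documentation for your model type.

```python
import csv
import io


def build_csv_payload(rows):
    """Serialize feature rows to UTF-8 CSV bytes with no header line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue().encode("utf-8")


def predict(runtime_client, endpoint_name, rows):
    """Send rows to a deployed endpoint and return the raw response text.

    runtime_client is expected to be a boto3 "sagemaker-runtime" client.
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=build_csv_payload(rows),
        Accept="application/json",
    )
    return response["Body"].read().decode("utf-8")


# Example usage (requires AWS credentials and a deployed endpoint):
#
#   import boto3
#   client = boto3.client("sagemaker-runtime")
#   # "my-canvas-defect-endpoint" and the feature values are placeholders
#   print(predict(client, "my-canvas-defect-endpoint", [[500, 92.5, 4]]))
```

Passing the client in as an argument keeps the function easy to test with a stub and lets the application reuse one client across calls.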

Conclusion

Teams in lab operations and pharma manufacturing are seeking simplicity, greater efficiency, less rework, and proactive discovery of issues in their work. However, the separation of skill sets can leave data scientists without the expertise to understand Life Sciences data, and Life Sciences experts without the software skills to build scalable machine learning applications.

Amazon SageMaker Canvas bridges this gap by giving Life Sciences teams a user-friendly visual interface and powerful model training capabilities to improve operational outcomes. With SageMaker Canvas, organizations can democratize machine learning and enable data-driven decision-making across their Life Sciences workflows.

Contact an AWS Representative to learn how we can help accelerate your business.

Get started with Amazon SageMaker Canvas for your organization

James Gaines

James Gaines is a Senior Solutions Architect for Healthcare and Life Sciences at AWS. He has a background in highly regulated environments, including the Department of Defense and pharmaceutical industry. James holds all active AWS Certifications and specializes in cloud migrations, application modernization, and advanced analytics to drive innovation in Healthcare and Life Sciences.