AWS for Industries

Driving Life Sciences Manufacturing “Industry 4.0” using Image Analytics


This blog summarizes the use of image-based computing patterns in life sciences manufacturing to accelerate the cloud journey and realize predictive plant capabilities as foundational steps toward an “Industry 4.0” vision. AWS computer vision services automate discrete process steps using image analytics. We explore core business challenges and opportunities, target use-cases and AWS architecture patterns needed to implement the use of image analytics to optimize key manufacturing process workloads. Additionally, automation of these types of workflows pulls cost out of the process and allows you to allocate scarce resources to other critical process areas in either QA or non-QA related functions. Finally, we describe how a digital strategy is critical to image processing workflows and provide a sample anomaly detection application to help accelerate your image analytics journey.

Digitally Connected Smart Manufacturing

Pharma and MedTech companies are experiencing a wave of innovations—from new treatment modalities, to smart machines, advanced analytics, and digital connectivity. Podular manufacturing and In-country/rapid response capabilities will require automation to scale.

Artificial intelligence is proving to be a fundamental change, with countless applications in nearly every domain. It is now making its way into production and manufacturing, where deep learning enables automation that is faster, more cost-effective, and of higher quality.

When and where is Visual Inspection needed?

According to research conducted by Drury and Fox, visual inspection errors typically range from 20% to 30%. Some imperfections are due to human error, while others are because of limitations of space.

Training and practice reduce certain errors but cannot guarantee that all of them are eliminated. Visual inspection errors in manufacturing take one of two forms: missing an existing defect (a miss) or flagging a defect that does not exist (a false positive, or false alarm). Misses tend to occur much more frequently than false alarms. Misses can lead to a loss in quality, while false positives cause unnecessary production costs and waste. Manual inspection also remains costly because it requires multiple trained individuals.
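To make the two error types concrete, the short sketch below computes a miss rate and a false-alarm rate from inspection counts. The function name and the sample counts are illustrative assumptions, not figures taken from the cited research.

```python
def inspection_error_rates(defects_caught, defects_missed,
                           good_units_passed, good_units_rejected):
    """Return (miss_rate, false_alarm_rate) for a batch of inspected units."""
    total_defective = defects_caught + defects_missed
    total_good = good_units_passed + good_units_rejected
    # Miss rate: share of truly defective units that inspection let through.
    miss_rate = defects_missed / total_defective if total_defective else 0.0
    # False-alarm rate: share of good units that were wrongly rejected.
    false_alarm_rate = good_units_rejected / total_good if total_good else 0.0
    return miss_rate, false_alarm_rate


# Hypothetical batch: 40 defective units (30 caught, 10 missed)
# and 960 good units (912 passed, 48 rejected).
miss, false_alarm = inspection_error_rates(30, 10, 912, 48)
print(f"miss rate: {miss:.1%}, false-alarm rate: {false_alarm:.1%}")
```

Tracking the two rates separately matters because they carry different costs: misses affect quality, while false alarms affect yield and waste.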

Computer vision

Automated visual inspection can overcome these problems by making the visual inspection procedure independent of human involvement. Automated systems typically surpass the standard of manual inspection.

With “Manufacturing Image Analytics”, process and quality engineers can use state-of-the-art computer vision to improve plant efficiency, yield, and processes through visual inspection and interpretation specific to downstream packaging and fill workflows. To improve batch yield, pharma manufacturers must understand the sources of variability across all quality process inputs, including measurements, personnel, operations, and equipment calibration. The goal is always to safely produce a maximum-yield batch that is within quality specifications while optimizing batch- and packaging-specific inputs.

Business Challenges 

Business challenges include the need to reduce labor costs during the QA/QC process, including the ability to reallocate scarce, costly resources to higher-value tasks or activities elsewhere in the QA/QC process. Another challenge is detecting, on an ongoing basis, when equipment is out of calibration; automating calibration programmatically can improve overall production yield. A further solvable challenge is improving the predictability of sample accept and reject rates, a step that should be automated to increase production efficiency on the shop floor.

Technical Challenges

Technical challenges that “Manufacturing Image Analytics” can address include the inability to capture real-time image data and the inability to use image data to identify anomalies or variability during ongoing production-level process activities. Because of these limitations, organizations cannot tune and optimize operational metrics such as throughput, yield, release time, and quality processes, which in turn leads to sub-optimal process flows. These sub-optimal workflows stand to benefit from image-driven inspection combined with image interpretation and machine learning techniques.

Manufacturing operations must thrive in a challenging business environment driven by regulations and competition. A lack of modernization and of real-time data and image processing limits productivity and increases operational burden and costs. It is imperative to increase the value of existing data infrastructure, reduce costs through operational efficiency, improve batch consistency, and identify deviation causality. The ideal quality management system comprises five elements: right data; right source; right time and place; right person; and right decisions. [McKinsey]

Areas to consider for automation

As the biopharma industry continues with the fourth industrial revolution, commonly referred to across multiple industries as “Industry 4.0”, the use of computer vision has taken center stage. Image detection workloads for both upstream and downstream unit operations are allowing customers to achieve quality-specific process improvements related to batch yield. However, image analytics is broadly applicable across the value chain. As a computing pattern, automated visual inspection enables a rich set of use cases. Some leading examples are:

  • Packaging Inspection: Core to Biopharma downstream manufacturing processes is the ability to ensure proper quality requirements specific to the raw packaging material inputs used for vaccines, insulin, tablets and other products manufactured within regulated environments. One such example is the inspection of raw packing materials such as glass vials which require verification of imperfections, including air bubbles, cracks or small fissures prior to moving to the final fill process step. Image analytics can automate and detect with higher accuracy these types of defects to ensure quality and compliance requirements are met.
  • Product Final Fill: Batch yield is an important element of the production process. The downstream product fill process is another example where image analytics can lead to measurable outcomes in the form of cost reduction. Image analytics workflows automate quality steps that were previously handled manually, which improves accept and reject rates and drives better fill accuracy.
  • Packaging Verification: Regulatory requirements stipulate that label placement must meet stringent orientation guidelines. Using image capture and analysis can automate label placement accuracy post final fill to ensure that label orientation and alignment fall within desired tolerances. Additionally, automating the packaging verification step for label placement can free up QA resources to focus on other manufacturing related tasks.
  • Gowning: Image analytics can be used to verify that employees properly follow the shop floor gowning SOPs to prevent contamination. Mapping correct gowning and suiting protocol patterns against incorrect or inconsistent procedures ensures that contaminants do not find their way into the lab and that proper operating protocols are built into manufacturing shop floor activity.
  • Pathing: The COVID-19 pandemic drove substantial change in how manufacturing operates. Social distancing and isolation mandates drove worker pathing protocols on the shop floor. To address these challenges and optimize worker pathing, the use of image detection and real-time analysis of worker movement enables worker monitoring and pathing activity to identify route and workflow efficiencies. Additional benefits include reductions in time-to-task as well as improving worker safety.

Solution Overview

Start by building the necessary resources in the AWS account that you intend to use. The architecture below depicts the AWS services we leverage for this solution. Please note that it is a reference architecture, and the final solution components might vary based on the customer’s requirements. Please engage with your AWS account team/resources to determine the final architecture that best meets these requirements.


Below is the suggested workflow for this solution:

  • Connect the customer site(s) to the Amazon VPC that hosts the solution.
  • Identify the appropriate service/solution to upload the images to S3.
  • Define the workflow to process the ingested images and capture the metadata.
  • Leverage the anomaly detection model to detect normal vs. anomalous images.
  • Create the training/testing datasets.
  • Create and train the ML Model.
  • Host the model.

Establishing connectivity between the customer site(s) and the customer Amazon Virtual Private Cloud (Amazon VPC) in the desired AWS Region is the first step as we work through the solution. Once the connectivity is in place, there are several methods to ingest data from the various data sources as depicted below. In this blog, we leverage AWS DataSync to bulk upload images from the customer site to the AWS account, and store the data in an S3 bucket. AWS DataSync is an online data transfer service that simplifies, automates, and accelerates moving data between on-premises storage systems and AWS Storage services. The data is transferred from a Network File System (NFS) share or Server Message Block (SMB) to the AWS Cloud using AWS DataSync.
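As a minimal sketch of triggering the bulk upload programmatically, the helpers below start an existing DataSync task and check its outcome. They assume that a DataSync task (source NFS/SMB location, destination S3 location) has already been created, and that `datasync_client` is a Boto3 DataSync client, for example `boto3.client('datasync')`. The function names and the task ARN are hypothetical.

```python
def start_image_transfer(datasync_client, task_arn):
    """Start an existing DataSync task and return the task execution ARN."""
    response = datasync_client.start_task_execution(TaskArn=task_arn)
    return response["TaskExecutionArn"]


def transfer_finished(datasync_client, execution_arn):
    """Return True once the execution succeeded, False while it is running.

    Raises RuntimeError if DataSync reports the execution as failed.
    """
    status = datasync_client.describe_task_execution(
        TaskExecutionArn=execution_arn
    )["Status"]
    if status == "ERROR":
        raise RuntimeError("DataSync task execution failed")
    return status == "SUCCESS"


# Usage (requires AWS credentials and an existing task; ARN is a placeholder):
# import boto3
# client = boto3.client("datasync")
# execution = start_image_transfer(client, "arn:aws:datasync:...:task/task-id")
# print(transfer_finished(client, execution))
```

Passing the client in as a parameter keeps the helpers easy to exercise with a stub before pointing them at a real account.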

We leverage Amazon Simple Notification Service (Amazon SNS) and an AWS Lambda function to work with the dataset in the S3 buckets that host both normal and anomalous images. We need to track the images’ locations so that we can provide them as input to Amazon Lookout for Vision.
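One way to track image locations is sketched below. It assumes that S3 event notifications are published to an SNS topic that invokes the Lambda function; the post does not spell out this wiring, so treat it as an illustration. The helper name `extract_image_locations` is hypothetical.

```python
import json
from urllib.parse import unquote_plus


def extract_image_locations(event):
    """Return s3:// URIs for every object referenced in an SNS-wrapped S3 event."""
    locations = []
    for sns_record in event.get("Records", []):
        # SNS delivers the original S3 notification as a JSON string.
        s3_event = json.loads(sns_record["Sns"]["Message"])
        for s3_record in s3_event.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            # Object keys in S3 notifications are URL-encoded.
            key = unquote_plus(s3_record["s3"]["object"]["key"])
            locations.append(f"s3://{bucket}/{key}")
    return locations


def lambda_handler(event, context):
    """Lambda entry point: log and return the locations of newly arrived images."""
    locations = extract_image_locations(event)
    for location in locations:
        print("New image:", location)
    return {"image_count": len(locations), "locations": locations}
```

In a fuller pipeline the handler would persist these locations (for example, alongside other image metadata) rather than only printing them.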

We create an anomaly detection model to detect normal vs. anomalous images using Amazon Lookout for Vision. Amazon Lookout for Vision is a machine learning (ML) service that spots defects and anomalies in visual representations using computer vision (CV). With Amazon Lookout for Vision, manufacturing companies can increase quality and reduce operational costs by quickly identifying differences in images of objects at scale.

To create a proof of concept, we will build an application which detects anomalies in images of manufacturing casts. View the full details of the dataset and download it from the following link. The dataset is already labeled (normal vs defective) and is already split into test and train. Download the data and unzip into a local directory. Then, upload the images into an S3 bucket in the following structure:

S3 Bucket


Make sure that:

  • The training normal images are located at s3://s3_bucket/Train/Normal/
  • The training anomalous images are located at s3://s3_bucket/Train/Anomaly/
  • The test normal images are located at s3://s3_bucket/Test/Normal/
  • The test anomalous images are located at s3://s3_bucket/Test/Anomaly/

Amazon Lookout for Vision requires a manifest file that contains the S3 location of the images and their labels in JSON Lines format. The following Python code creates the manifest files for the train and test images.

from datetime import datetime
import json
import os

# Timestamp recorded as the creation date of every manifest entry
now = datetime.now()
dttm = now.strftime("%Y-%m-%dT%H:%M:%S.%f")

datasets = ["train", "test"]
directory = "data"  # replace with the name of the local folder where the data is
bucket = "bucket_name"  # replace with the name of the S3 bucket

for ds in datasets:

    folders = os.listdir("./{}/casting_data/{}".format(directory, ds))

    # One manifest file per dataset, one JSON object per line
    with open("{}.manifest".format(ds), "w") as f:
        for folder in folders:

            files = os.listdir("./{}/casting_data/{}/{}".format(directory, ds, folder))
            label = 1
            if folder == "anomaly":
                label = 0

            for file in files:
                manifest = {
                  # adjust the prefix so that it matches where the images were uploaded
                  "source-ref": "s3://{}/lookoutforvision/data/{}/{}/{}".format(bucket, ds, folder, file),
                  "auto-label": label,
                  "auto-label-metadata": {
                    "confidence": 1,
                    "job-name": "labeling-job/auto-label",
                    "class-name": folder,
                    "human-annotated": "yes",
                    "creation-date": dttm,
                    "type": "groundtruth/image-classification"
                  }
                }
                # Write the entry as a single JSON line
                f.write(json.dumps(manifest) + "\n")

Upload the manifest files into the same S3 bucket. Make a note of the location of the manifest files, because you will provide them as inputs to Amazon Lookout for Vision.
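Before uploading, it can help to sanity-check each manifest. The sketch below parses every line as JSON, confirms that `source-ref` is an S3 URI, and counts images per class name. It assumes the attribute names used by the script above (`auto-label-metadata`, `class-name`); the function name `summarize_manifest` is hypothetical.

```python
import json
from collections import Counter


def summarize_manifest(lines):
    """Validate JSON Lines manifest entries and count images per class name."""
    counts = Counter()
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        entry = json.loads(line)  # raises ValueError on malformed JSON
        source = entry.get("source-ref", "")
        if not source.startswith("s3://"):
            raise ValueError(f"line {number}: source-ref is not an S3 URI")
        metadata = entry.get("auto-label-metadata", {})
        counts[metadata.get("class-name", "unknown")] += 1
    return dict(counts)


# Usage with a manifest written by the script above:
# with open("train.manifest") as f:
#     print(summarize_manifest(f))
```

A heavily skewed class count or an unexpected class name is easier to catch here than after a dataset has been created.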

Now that we have our images and manifest files on S3, we can create a project in Amazon Lookout for Vision. You can perform the following steps via the AWS Management Console, the AWS CLI, or the APIs. The code snippets below use the AWS SDK for Python (Boto3).

Creating the project

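import boto3
project = "project_name"  # replace with name of your project
client = boto3.client('lookoutvision')
print('Creating project: ' + project)
# Create the Lookout for Vision project that will hold the datasets and models
response = client.create_project(ProjectName=project)
print('project ARN: ' + response['ProjectMetadata']['ProjectArn'])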

Creating the training/testing datasets

import boto3
dataset_type = 'train'  # replace with 'train' or 'test'
manifest_file = 'prefix and file name for the manifest'  # replace with the path to manifest file for train or test on S3
bucket = 'bucket_name'  # replace with S3 bucket name
project = "project_name"  # replace with name of your project
client = boto3.client('lookoutvision')

print('Creating dataset...')
# Point the dataset at the manifest file uploaded to S3
dataset = {"GroundTruthManifest": {"S3Object": {"Bucket": bucket, "Key": manifest_file}}}

response = client.create_dataset(ProjectName=project, DatasetType=dataset_type, DatasetSource=dataset)
print('Dataset Status: ' + response['DatasetMetadata']['Status'])
print('Dataset Status Message: ' + response['DatasetMetadata']['StatusMessage'])
print('Dataset Type: ' + response['DatasetMetadata']['DatasetType'])

Creating and training the model

import boto3
client = boto3.client('lookoutvision')

project = "project_name"  # replace with name of your project
output_bucket = 'output_bucket'  # replace with the output bucket name
output_folder = 'output_folder'  # replace with output prefix

print('Creating model...')
# S3 location where the training results are stored
output_config = {"S3Location": {"Bucket": output_bucket, "Prefix": output_folder}}

response = client.create_model(ProjectName=project, OutputConfig=output_config)
print('ARN: ' + response['ModelMetadata']['ModelArn'])
print('Version: ' + response['ModelMetadata']['ModelVersion'])
print('Status: ' + response['ModelMetadata']['Status'])
print('Message: ' + response['ModelMetadata']['StatusMessage'])
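Training runs asynchronously, so the status printed above is only the initial one. The sketch below polls `describe_model` until the model reaches a target status. The function name, polling interval, and retry limit are illustrative assumptions; `client` is expected to be a Boto3 `lookoutvision` client.

```python
import time


def wait_for_model_status(client, project, model_version, target_status,
                          poll_seconds=60, max_polls=120, sleep=time.sleep):
    """Poll describe_model until the model reaches target_status.

    Raises RuntimeError if the model enters a *_FAILED state and
    TimeoutError if target_status is not reached within max_polls.
    """
    for _ in range(max_polls):
        description = client.describe_model(
            ProjectName=project, ModelVersion=model_version
        )["ModelDescription"]
        status = description["Status"]
        if status == target_status:
            return description
        if status.endswith("_FAILED"):
            message = description.get("StatusMessage", "")
            raise RuntimeError(f"Model {model_version} failed: {message}")
        sleep(poll_seconds)
    raise TimeoutError(f"Model {model_version} did not reach {target_status}")


# Usage (requires AWS credentials):
# import boto3
# client = boto3.client("lookoutvision")
# wait_for_model_status(client, "project_name", "1", "TRAINED")
```

The same helper can wait for the `HOSTED` status after the model has been started.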

Hosting the model

Once the model has been trained, you can go ahead and host it for predictions.

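import boto3
client = boto3.client('lookoutvision')
project = "project_name"  # replace with name of your project
model_version = '1'  # replace with the version of the trained model
min_inference_units = 1  # number of inference units to provision

print('Starting model version ' + model_version + ' for project ' + project)
# Start hosting the trained model
response = client.start_model(ProjectName=project, ModelVersion=model_version, MinInferenceUnits=min_inference_units)
print('Status: ' + response['Status'])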

Wait for the model to be hosted. Once hosting is complete, you can create a front-end application to generate predictions from the model. The details of how to create the application are listed in the following repository:
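Independently of a front end, a single prediction against the hosted model can be sketched as follows. The helper calls `detect_anomalies` and maps the result to a verdict. The function name, the `min_confidence` threshold, and the "review" verdict are illustrative assumptions rather than part of the service; `client` is expected to be a Boto3 `lookoutvision` client.

```python
def classify_image(client, project, model_version, image_bytes,
                   content_type="image/jpeg", min_confidence=0.5):
    """Call detect_anomalies and return a small verdict dictionary."""
    response = client.detect_anomalies(
        ProjectName=project,
        ModelVersion=model_version,
        Body=image_bytes,
        ContentType=content_type,
    )
    result = response["DetectAnomalyResult"]
    if result["Confidence"] < min_confidence:
        # Low-confidence predictions are routed to a person (assumed policy).
        verdict = "review"
    elif result["IsAnomalous"]:
        verdict = "anomaly"
    else:
        verdict = "normal"
    return {"verdict": verdict, "confidence": result["Confidence"]}


# Usage (requires AWS credentials and a hosted model):
# import boto3
# client = boto3.client("lookoutvision")
# with open("cast.jpeg", "rb") as image:
#     print(classify_image(client, "project_name", "1", image.read()))
```

Routing low-confidence results to manual review is one way to keep a person in the loop while the model is being validated on the shop floor.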

The final step is to consider which business intelligence tools to use for the data consumption layer. Because we collect and store metadata for the images throughout the data ingestion and processing stages, customers can bring in their preferred visualization and reporting tools to empower their end users to generate insights and dashboards.


AWS offers various methods to connect customers’ manufacturing facilities and data centers to their AWS accounts. Customers can leverage AWS Direct Connect to establish a dedicated network connection between their network and one of the AWS Direct Connect locations. There are two types of connections that customers can consider with AWS Direct Connect: a Dedicated Connection, which is a physical Ethernet connection associated with a single customer that the customer requests directly, or a Hosted Connection, which is a physical Ethernet connection that an AWS Direct Connect partner provisions on behalf of the customer. Please visit AWS Direct Connect Delivery Partners for more information about our partners.

AWS Site-to-Site Virtual Private Network (VPN) is another method to establish secure connections between on-premises networks and the AWS global network. AWS Site-to-Site VPN creates encrypted tunnels between your network and your Amazon Virtual Private Clouds.

So, which method should customers consider? VPN connections take only a few minutes to set up and are a good solution if you have an immediate need, have low to modest bandwidth requirements, and can tolerate the inherent variability in internet-based connectivity. AWS Direct Connect, on the other hand, does not involve the internet; it uses dedicated, private network connections between your intranet and Amazon VPC.

With AWS Transit Gateway, customers can easily connect VPCs, AWS accounts, and on-premises networks through a central hub. This approach simplifies the network and eliminates the complex peering relationships that were needed in many cases in the past. AWS Transit Gateway acts as a cloud router, where each new connection is made only once, to the AWS Transit Gateway. Visit AWS Transit Gateway to learn more.

For the purpose of this blog post, the solution leverages an existing Direct Connect connection between the customer site and the customer Amazon VPC. Understanding the connectivity options and the data volume to transfer to the AWS Cloud is important. Work with your AWS team/resources to determine the architecture that best meets your requirements.
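To relate data volume to connectivity, a rough back-of-the-envelope estimate is often enough. The sketch below estimates transfer time from image volume and link speed. The function name, the 80% utilization figure, and the example numbers are assumptions for illustration only; real throughput depends on many other factors.

```python
def transfer_time_hours(data_gb, link_mbps, utilization=0.8):
    """Estimate the hours needed to move data_gb gigabytes over a link_mbps link.

    utilization is the assumed fraction of nominal bandwidth actually achieved.
    """
    if link_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("link_mbps must be positive and utilization in (0, 1]")
    megabits = data_gb * 8 * 1000  # 1 GB = 8,000 megabits (decimal units)
    seconds = megabits / (link_mbps * utilization)
    return seconds / 3600


# Hypothetical example: 500 GB of images over a 1 Gbps connection.
print(round(transfer_time_hours(500, 1000), 2))
```

Running the same estimate with the bandwidth of a VPN connection over the internet and with that of a Direct Connect link gives a quick sense of which option fits the ingestion window.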

Getting started with AWS

AWS biopharma customers are already using image analytics to move toward their “Industry 4.0” vision. Moreover, as the bar for manufacturing innovation rises, advanced imaging services will transition from being innovative to being a baseline expectation. It can be challenging for life sciences customers to know where to start, so AWS offers workshops designed specifically around manufacturing needs. These workshops provide an objective assessment across several critical aspects (people, platform, operations, compliance, governance, security, and business outcomes) that is fully aligned with your plant or site. AWS offers a complete solution package in the area of image analytics to help accelerate your modernization journey.

Getting started is easy.  If you are familiar with AWS services and tools you can immediately leverage the project in this blog using a training data set or a relevant data set from your shop-floor operations.   From there you can build and train your model and then host the model in your AWS account leveraging the sample code provided.  Alternatively, AWS professional services can help accelerate your use of image analytics by performing a manufacturing maturity assessment.  We can assess shop-floor readiness for developing and deploying a fully operational image analytics pipeline at your facility and then work backwards to envision, define and complete your project goals based on your critical business needs.

Reach out to your AWS account executive to understand how you can get started with AWS to initiate or accelerate your manufacturing and quality digital transformation.

Asha D’Souza


Asha D’Souza, PhD, is a Principal Healthcare and Life Sciences Industry Advisor at AWS, where she leads Business Development, Strategy and Advisory in the US-WEST. Asha has over 20 years of experience in digital transformations across the value chain in leading life sciences companies. She is passionate about improving quality of care for patients, accelerating drug discovery, development and time to value by using technology as the enabler.

Karim Afifi


Karim Afifi is a Solutions Architect Leader with Amazon Web Services. He is part of the Global Life Sciences Solution Architecture team. He is based out of New York, and enjoys helping customers throughout their journey to innovation.

Misha St. Lorant


Misha St. Lorant has over 15 years in the Life Science and Healthcare industry with specific focus in the biopharmaceutical, immunotherapy and clinical trials domains. Prior to joining AWS, Misha was an engineering solutions director focused on Biopharma and Cell Therapy manufacturing, pharmaceutical clinical trials and imaging patient care pathway workflow optimization at GE. Misha also spent over 11 years at Microsoft running program, product and engineering teams focused on healthcare IT, platforming and patient facing applications. Misha received a BA from Oregon State University in economics and finance, an MBA in technology management from UOP and certificates in statistical analysis from Northwestern University – Kellogg School of Management.

Ujjwal Ratan


Ujjwal Ratan is a Principal Machine Learning Specialist in the Global Healthcare and Lifesciences team at Amazon Web Services. He works on the application of machine learning and deep learning to real world industry problems like medical imaging, unstructured clinical text, genomics, precision medicine, clinical trials and quality of care improvement. He has expertise in scaling machine learning/deep learning algorithms on the AWS cloud for accelerated training and inference. In his free time, he enjoys listening to (and playing) music and taking unplanned road trips with his family.