AWS Partner Network (APN) Blog

Accelerating Time-to-Compliance in HCLS Through Automated FDA Forms Processing with AI on AWS

By Rinat Akhmetov, Sr. Solutions Architect – Provectus
By Nirav Shah, Principal Solutions Architect – AWS


In the healthcare and life sciences (HCLS) sector, companies are required to strictly comply with regulatory statutes. It’s crucial to ensure products and services are not just developed but also manufactured and distributed in accordance with the most stringent standards.

For over two decades, PSC Biotech, a global life sciences consultancy, has been a trusted partner for businesses in the HCLS field, assisting clients across 52 countries to keep up with current regulations and technologies. However, with mounting pressure to expedite and improve the accuracy of FDA Form 483 observation processing, PSC Biotech found itself facing a significant challenge.

To optimize and streamline its document processing operations, PSC Biotech was looking for a novel solution powered by artificial intelligence (AI) and machine learning (ML). The company turned to Provectus, a provider of AI adoption and ML development services with expertise in intelligent document processing (IDP) solutions. The companies joined forces to automate PSC Biotech’s document pipelines on Amazon Web Services (AWS), dramatically improving how FDA Form 483 observations were processed.

Provectus is an AWS Premier Tier Services Partner with Competencies in Machine Learning, Data and Analytics, and more. Provectus is an AI-first technology consultancy and solutions provider helping design, architect, migrate, and build cloud-native applications on AWS.

In this post, we explore the challenges encountered, cutting-edge solutions employed, and results achieved on AWS. We examine the key factors that contributed to the project’s success, and provide insights for organizations seeking to automate their own document processing operations.

We will delve into the following technical aspects of the project:

  • Building a highly accurate ML model for observation classification.
  • Establishing a secure and reproducible, end-to-end infrastructure for machine learning.
  • Implementing CI/CD pipelines that adhere to industry and cloud best practices.
  • Delivering a user-friendly management system for documents within PSC Biotech’s existing document pipeline.

We will also share lessons learned from the PSC Biotech and Provectus collaboration, and discuss how this IDP project can potentially reshape document processing in the HCLS sector.

The Challenge

For over two decades, PSC Biotech has helped companies in the HCLS sector adhere to all relevant regulatory requirements. Many of its services entail substantial document management, encompassing the collection, processing, and analysis of document data flowing through the client network.

As a well-established company, PSC Biotech has relied on document processing operations that have been in place for years. However, the majority of processes within its pipelines were manual, creating the following challenges:

  • Increased time and resource allocation for document processing.
  • High processing costs that continued to rise over time.
  • Persistent risk of human errors.
  • Throughput rate wholly dependent on the number of employees.
  • Stagnant processing accuracy, with considerable fluctuations over time.

Considering the sensitive nature of HCLS business operations, any slow, inefficient, or error-prone process can present significant risks. The health and well-being of millions of individuals, along with a company’s financial stability, could be jeopardized by a single error committed by a document reviewer.

Take FDA Form 483 observations, for instance. An FDA Form 483 observation is a notice issued by the FDA to flag potential regulatory violations (in areas such as processes, controls, products, or employee practices) discovered during a routine inspection.


Figure 1 – Examples of FDA Form 483 observations.

The cost of such observations can be substantial, but the financial consequences of failing to implement the necessary changes in a timely manner are even more severe. Indeed, the average cost of implementing compliance measures for medium to large-sized healthcare organizations can amount to $80,000, a significantly smaller figure compared to the staggering $180,000 to $8.3 million in potential fines.

PSC Biotech handles thousands of FDA Form 483 observations annually. While its document mappers and reviewers have been diligent, the necessity to automate document processing pipelines was apparent. According to an assessment by PSC Biotech’s leadership, employing an AI/ML-powered document processing system was the best solution to enhance document-centric operations.

By integrating AI/ML into document processing, PSC Biotech’s leaders expected to reduce time spent on manual observation reviews, cut form processing costs, mitigate risks associated with infractions made by mappers and reviewers, increase throughput rate, and enhance accuracy in document processing.

Designing, building, deploying, and integrating an IDP solution would require the expertise of a third party, and Provectus was ready to step into this role.

The Solution

Provectus proposed building an AI/ML-powered document processing solution for the automated classification of FDA Form 483 observations.

The IDP project involved a series of engagements that encompassed data and model work (training, deployment to production, retraining), infrastructure and pipeline development, implementation of logging and monitoring components, and a user interface (UI) for document management. It was set up in two phases.

Phase One

In the first phase, the data was thoroughly explored and prepared; all necessary environments for development, management, and experimentation were set up; a baseline for text classification was established; and a model for classification of FDA Form 483 observations was developed.

A secure and reproducible, end-to-end machine learning infrastructure for experimentation and model training was delivered as part of the initial engagement.


Figure 2 – High-level architecture delivered in Phase One.

Here’s a brief overview of the services used for building the solution during Phase One:

  • Amazon S3: Used as a data lake to store texts, document observations, and model predictions.
  • Amazon Comprehend: Helped to develop a validated baseline for text classification.
  • Amazon SageMaker Suite: Used as a foundation for the model training pipeline.
  • AWS Step Functions: Helped to orchestrate multiple AWS services to build and update applications quickly.
  • Amazon Textract: Used as an optical character recognition (OCR) engine for PDF documents.
  • AWS Lambda: Helped to check the model metrics and register the model if it surpassed previous metrics.
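To illustrate the OCR step in the list above, here is a minimal sketch of turning an Amazon Textract text-detection response into plain text for a downstream classifier. The response shape (a "Blocks" list whose entries carry "BlockType" and "Text") follows the Textract API; the function name `extract_lines` and the sample data are hypothetical and are not taken from the project.

```python
from typing import Any, Dict, List


def extract_lines(textract_response: Dict[str, Any]) -> List[str]:
    """Return the text of every LINE block, in the order Textract returned them."""
    lines = []
    for block in textract_response.get("Blocks", []):
        # Textract also returns PAGE and WORD blocks; only LINE blocks are kept here.
        if block.get("BlockType") == "LINE" and "Text" in block:
            lines.append(block["Text"])
    return lines


# Hand-written sample shaped like a Textract response (illustrative, not real data).
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE", "Id": "p1"},
        {"BlockType": "LINE", "Id": "l1", "Text": "OBSERVATION 1"},
        {"BlockType": "WORD", "Id": "w1", "Text": "OBSERVATION"},
        {"BlockType": "LINE", "Id": "l2", "Text": "Procedures are not in writing."},
    ]
}

# Join the recognized lines into a single string for the classification step.
observation_text = "\n".join(extract_lines(sample_response))
```

In practice, the response would come from a Textract call on a PDF stored in Amazon S3. The parsing is kept separate here so it can be exercised without AWS credentials.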

A variety of deep learning and natural language processing (NLP) algorithms were employed to extract and classify observations with greater accuracy. Additionally, frameworks such as PyTorch, Transformers, and NLTK were used.

The observation classifier was delivered as a multi-label classification model capable of categorizing FDA Form 483 observations across more than 100 labels, achieving precision and recall rates of 70% or higher. Observations were automatically labeled and could be sorted into various categories. Users could search for observations by selecting any of the model-generated labels.
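For readers less familiar with multi-label classification, the sketch below shows one common way such a model assigns labels and how precision and recall can be computed over the results. It assumes an independent sigmoid per label with a 0.5 threshold and micro-averaged metrics. The post does not state which thresholding or averaging scheme was used, and the label names are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple


def predict_labels(logits: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Apply a sigmoid to each label's logit and keep labels at or above the threshold."""
    return {
        label
        for label, z in logits.items()
        if 1.0 / (1.0 + math.exp(-z)) >= threshold
    }


def micro_precision_recall(
    true_sets: List[Set[str]], pred_sets: List[Set[str]]
) -> Tuple[float, float]:
    """Micro-average: pool true/false positives and false negatives over all observations."""
    tp = fp = fn = 0
    for truth, pred in zip(true_sets, pred_sets):
        tp += len(truth & pred)   # labels predicted and correct
        fp += len(pred - truth)   # labels predicted but wrong
        fn += len(truth - pred)   # labels missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Two illustrative observations with hypothetical labels and logits.
logits_per_observation = [
    {"documentation": 2.0, "training": -1.0, "equipment": 0.3},
    {"documentation": -2.0, "training": 1.5, "equipment": -0.5},
]
truth = [{"documentation"}, {"training", "equipment"}]

predictions = [predict_labels(logits) for logits in logits_per_observation]
precision, recall = micro_precision_recall(truth, predictions)
```

Because each label is decided independently, a single observation can receive several labels at once, which is what makes label-based search and sorting possible.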

Phase Two

In the second phase, the ML API service (based on OpenAPI specifications) was developed, and the ML release cycle was significantly enhanced. Emphasis was placed on delivering pipelines for CI/CD, logging and monitoring, and model retraining. An ML infrastructure for the production environment was established, building upon the existing foundation from Phase One.

Other objectives included developing a user-friendly UI for document processing to enable PSC Biotech employees to map and review forms more efficiently.

A special service to facilitate seamless integration of the ML/IDP component into the existing document processing pipeline was delivered. It helped establish a CI/CD pipeline capable of retraining the existing model and deploying it to production when higher precision and recall were achieved.
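The promotion rule described above can be sketched as a small gate function. This is an illustration under stated assumptions rather than the project's actual logic: the `Metrics` type and `should_promote` function are hypothetical, "higher precision and recall" is interpreted as strictly higher on both metrics, and the 0.70 floor is borrowed from the precision and recall figures reported in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    precision: float
    recall: float


def should_promote(candidate: Metrics, production: Metrics, floor: float = 0.70) -> bool:
    """Return True only if the retrained model beats production on both metrics
    and stays at or above the minimum acceptable floor (an assumed threshold)."""
    meets_floor = candidate.precision >= floor and candidate.recall >= floor
    improves = (
        candidate.precision > production.precision
        and candidate.recall > production.recall
    )
    return meets_floor and improves


# Illustrative numbers only; these are not results from the project.
current = Metrics(precision=0.72, recall=0.71)
retrained = Metrics(precision=0.75, recall=0.74)
decision = should_promote(retrained, current)
```

A gate like this could run as the final step of a retraining pipeline, so that a model which regresses on either metric never replaces the one in production.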


Figure 3 – High-level architecture delivered in Phase Two.

Here’s an overview of the services used to further improve the document processing solution during Phase Two:

  • Amazon S3: Employed as an intermediate storage service for document data between processing steps, and as a document storage service.
  • Amazon SageMaker Serverless Inference: Used as the main inference endpoint for the model, which needs to process documents during working hours only.
  • Amazon Textract: Utilized as an OCR engine for PDF documents.
  • AWS Lambda: Hosted the complete API for handling documents, including user interactions with documents and their processing by ML models.
  • Amazon Cognito: Enabled user sign-up and sign-in, and controlled access to resources.
  • Amazon CloudFront: Delivered UI to the end users.
  • Amazon RDS: Utilized to store document- and model-related information.
  • Amazon DynamoDB: Stored WebSocket IDs and user meta information to facilitate training.
  • Amazon SQS: Used as a central communication channel for asynchronous interactions.
  • Amazon API Gateway: Acted as a mediator for communication between the user and the serverless backend implemented on AWS Lambda.
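Serverless inference suits a workload that is active during working hours only, because capacity scales down when no requests arrive. The sketch below builds the request for the SageMaker `create_endpoint_config` API with a `ServerlessConfig` block. The endpoint and model names, the memory size, and the concurrency value are assumptions made for illustration; the post does not disclose the project's actual settings.

```python
from typing import Any, Dict

# Memory sizes documented for serverless endpoints at the time of writing;
# check the current SageMaker documentation before relying on this list.
ALLOWED_MEMORY_MB = {1024, 2048, 3072, 4096, 5120, 6144}


def serverless_endpoint_config(
    config_name: str,
    model_name: str,
    memory_mb: int = 2048,
    max_concurrency: int = 5,
) -> Dict[str, Any]:
    """Build keyword arguments for sagemaker.create_endpoint_config (serverless variant)."""
    if memory_mb not in ALLOWED_MEMORY_MB:
        raise ValueError(f"memory_mb must be one of {sorted(ALLOWED_MEMORY_MB)}")
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                # ServerlessConfig replaces the instance type and count of a
                # provisioned endpoint.
                "ServerlessConfig": {
                    "MemorySizeInMB": memory_mb,
                    "MaxConcurrency": max_concurrency,
                },
            }
        ],
    }


# Hypothetical names, used only for illustration.
request = serverless_endpoint_config("obs-classifier-serverless", "obs-classifier-v1")

# With boto3 installed and AWS credentials configured, this could be submitted as:
#   import boto3
#   boto3.client("sagemaker").create_endpoint_config(**request)
```

Keeping the request construction separate from the API call makes the configuration easy to review and test in a CI/CD pipeline before anything is deployed.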

Phase Two also included the development and delivery of a user-friendly UI for document processing. This was designed to help PSC Biotech employees map and review FDA Form 483 observations more quickly and efficiently, without the need to delve into technical details of the system.

Both phases were completed on schedule, in close collaboration between Provectus and PSC Biotech.

The Outcome

An automated, AI/ML-powered document processing solution for classification of FDA Form 483 observations was designed and built from scratch in two months.

The delivered model achieved a precision and recall of no less than 70%, assisting PSC Biotech in automating a significant portion of the labeling, mapping, and review of observation forms. Additionally, an entire ecosystem surrounding the model was delivered, including ML infrastructure, CI/CD pipelines, monitoring and UI components, and other features.

The newly implemented observation classification solution allows PSC Biotech to significantly reduce the time spent on manual review of observations and to lower processing costs. It also improves the accuracy and throughput of document processing, and mitigates the risk of infractions and errors made by mappers and reviewers.

Specifically, PSC Biotech was able to:

  • Accelerate document processing operations by 90%.
  • Decrease document processing costs by 44%.
  • Increase document throughput by 10X.
  • Achieve an estimated return on investment (ROI) of 93% over 12 months.

By empowering its workforce with AI/ML and intelligent document processing, PSC Biotech achieved enhanced operational excellence. With AI/ML-enabled automation of document pipelines, the company was able to save and repurpose over 5,000 man-hours.

PSC Biotech can now handle FDA Form 483 observations more rapidly, accurately, cost-effectively, and on a larger scale.


PSC Biotech is a trusted provider of services in the healthcare and life sciences sector. It’s crucial for PSC Biotech to process FDA Form 483 observations as swiftly and accurately as possible, ensuring its clients remain up to date in implementing necessary changes to their products, services, and operations.

Integrating AI/ML and automation into the document processing pipeline is regarded by PSC Biotech as a critical step towards achieving its objective of delivering the highest standard of regulatory compliance services. Provectus aided PSC Biotech in its pursuit of excellence by contributing its expertise in AI/ML and intelligent document processing (IDP).

To learn more about the Provectus IDP solution, visit the IDP webpage, and watch the webinar for more practical advice.

If you’re interested in implementing an IDP solution in your organization, apply for the Intelligent Document Processing Acceleration Program to start building your first pilot in just three weeks.


Provectus – AWS Partner Spotlight

Provectus is an AWS Premier Tier Services Partner and AI-first technology consultancy and solutions provider that helps design, architect, migrate, and build cloud-native applications on AWS.
