AWS for Industries

Conducting end-to-end pharmacovigilance workflows using AWS technologies

Introduction

Bio-pharmaceutical companies have always had a responsibility to report adverse events (AEs), whether in clinical development or post-market-approval scenarios, to the Food and Drug Administration (FDA). The COVID-19 pandemic and the race to find effective treatments for this disease have caused both the FDA and bio-pharmaceutical organizations around the world to re-imagine the typical timelines associated with bringing drugs to patients. However, even during these unprecedented times, adverse event reporting still acts as a key mechanism for maintaining the three-way trust between regulatory agencies, bio-pharmaceutical companies, and the public. In May of 2020, the FDA restated its 2012 guidance for adverse event reporting during influenza pandemics to include COVID-19. Sponsors are required to report any serious adverse events that they find to be unexpected and that could reasonably have been caused by their drugs. In an effort to focus limited resources on R&D and other activities, many life science companies are outsourcing their pharmacovigilance (PV) efforts, including adverse event detection, while others augment manual workflows with software. As evidence of this, the market for PV software is projected to reach $207.7M by 2024, according to a report by Grand View Research. But given ever-growing data volumes and the importance of adverse events for ensuring patient safety, is manual labor assisted by software, or simply outsourcing adverse event detection, the most effective and scalable approach?

In this blog post, we will demonstrate how customers can use AWS technology to collect AEs from different sources, then store, process, enrich, analyze, visualize, and predict key outcomes. We will also see how AWS technology can be used to submit the information to PV databases like the FDA Adverse Event Reporting System (FAERS).

Solution overview

The following diagram illustrates the solution architecture.

Conducting an end-to-end pharmacovigilance workflow using this architecture consists of the following steps:

Step 1: Data sources (third-party applications)

Many customers have in-house Clinical Trial Management Systems (CTMS) and several other third-party systems hosted and maintained by Clinical Research Organizations (CROs), partners, etc., used to effectively plan, manage, and track clinical study portfolios. In addition to pulling data from CTMS systems, pharmacovigilance can benefit from ingesting data from Electronic Health Record (EHR) systems, thus presenting a full picture of the patient’s medical history. These systems expose a variety of interfaces, ranging from direct database access to file transfer systems and API-driven access. The data volumes being ingested can range from several kilobytes per week to several terabytes per week, depending on the size of the clinical study. In addition, customers frequently encounter unstructured data sources like x-rays, images, discharge notes, or call center notes, which can greatly influence the outcome of an adverse drug event. Connecting these systems to a cloud-based pharmacovigilance solution provides the most actionable data with the least amount of transformation.

Step 2: Data sources (patient centric)

Smart sensors and wearables, social media, and voice-enabled interactions are fast becoming the norm for collecting data like activity tracking, patient engagement, and vital signs monitoring, and for providing meaningful feedback. Global pharmaceutical companies are increasingly leveraging these technologies to improve data collection frequency, improve data quality, and optimize costs, while providing seamless, rich, engaging, and accessible patient-centric user interfaces. This method of data collection also serves well during pandemics like COVID-19, when face-to-face data collection is not possible or practical.

Many pharmaceutical companies and Clinical Research Organizations (CROs) operate a call center/nurse hotline to collect data from patients. These call centers often see large call volumes, which typically peak around events like drug launches and pandemics, and are often backed by human teams, making it difficult to add or remove capacity in response to demand. Other ways of collecting data from patients include eConsent systems and Patient Reported Outcome (PRO) systems, which are routinely used by organizations to conduct surveys.

Lastly, with the ubiquitous proliferation of social media, many organizations are turning toward mining social media to derive insights into patient sentiment. Social media mining can help organizations stay in touch with realities on the ground and understand trends as they happen, uncovering a wealth of information previously unavailable.

Step 3: Batch data ingest

Third-party clinical and health applications use many different technologies to share and export data. Most CTMS and related systems offer an API to export data. AWS Batch offers a seamless and cost-effective way to consume these APIs without any need to install and manage server clusters. AWS Batch can run scheduled, event-driven, and on-demand jobs to consume modern REST APIs (or even legacy SOAP APIs), as shown in the blog post Creating a Simple “Fetch & Run” AWS Batch Job, and store the results on Amazon Simple Storage Service (Amazon S3).
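
To make this concrete, here is a minimal sketch of the kind of job body a “fetch & run” style AWS Batch container might execute. The CTMS endpoint URL, bucket name, and key layout are illustrative assumptions, not part of the referenced blog post; the script assumes the requests library and AWS credentials are available in the container.

```python
import json
from datetime import datetime, timezone

import boto3
import requests

# Hypothetical CTMS export endpoint and target bucket -- replace with your own.
CTMS_API_URL = "https://ctms.example.com/api/v1/adverse-events"
BUCKET = "pv-data-lake-raw"

def fetch_and_store():
    # Pull the latest adverse event records from the CTMS REST API.
    response = requests.get(CTMS_API_URL, timeout=30)
    response.raise_for_status()
    records = response.json()

    # Partition raw data by ingest date so downstream crawlers can find it.
    key = f"ctms/adverse-events/{datetime.now(timezone.utc):%Y/%m/%d}/export.json"
    boto3.client("s3").put_object(
        Bucket=BUCKET,
        Key=key,
        Body=json.dumps(records).encode("utf-8"),
    )
    print(f"Stored {len(records)} records at s3://{BUCKET}/{key}")

if __name__ == "__main__":
    fetch_and_store()
```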

To ingest flat files (audio, video, documents, semi-structured/unstructured data, etc.), you can use AWS DataSync to efficiently and quickly move large amounts of data online between on-premises storage and Amazon S3. DataSync eliminates the need to modify your applications since it can connect directly to your Network File System (NFS), Server Message Block (SMB), and self-managed object storage. This makes connecting to and mining legacy data sources seamless, allowing you to focus on data analysis and processing rather than the complexities of large-scale data transfer.
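
For illustration, the following sketch wires an on-premises NFS share to an S3 bucket using the DataSync API via boto3. The agent ARN, role ARN, hostname, and bucket names are hypothetical placeholders; a DataSync agent must already be deployed and activated on premises.

```python
import boto3

datasync = boto3.client("datasync")

# Placeholder ARNs -- a DataSync agent must already be running on premises.
AGENT_ARN = "arn:aws:datasync:us-east-1:111122223333:agent/agent-EXAMPLE"
S3_ROLE_ARN = "arn:aws:iam::111122223333:role/datasync-s3-access"

# Source: an on-premises NFS share holding flat files (audio, documents, etc.).
nfs = datasync.create_location_nfs(
    ServerHostname="fileserver.example.com",
    Subdirectory="/exports/pv-data",
    OnPremConfig={"AgentArns": [AGENT_ARN]},
)

# Destination: the raw zone of the S3 data lake.
s3 = datasync.create_location_s3(
    S3BucketArn="arn:aws:s3:::pv-data-lake-raw",
    Subdirectory="/flat-files",
    S3Config={"BucketAccessRoleArn": S3_ROLE_ARN},
)

# Create the transfer task and kick off an execution.
task = datasync.create_task(
    SourceLocationArn=nfs["LocationArn"],
    DestinationLocationArn=s3["LocationArn"],
    Name="pv-flat-file-ingest",
)
datasync.start_task_execution(TaskArn=task["TaskArn"])
```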

Some legacy applications might not offer APIs to export data. In these cases, database replication can be a solution. To replicate a database in a batch mode or in an ongoing fashion, you can use AWS Database Migration Service, allowing you to exchange data between homogeneous as well as heterogeneous databases. The AWS Schema Conversion Tool makes heterogeneous database migrations predictable by automatically converting the source database schema and a majority of the database code objects, including views, stored procedures, and functions, to a format compatible with the target database, allowing you to move away from legacy database technologies to modern, open-source databases like Amazon Aurora.

Lastly, you can use AWS Glue, a managed extract, transform, and load (ETL) service, to prepare and load your data for analytics from a variety of data sources, including Aurora, Amazon RDS for MySQL, Amazon RDS for Oracle, Amazon RDS for PostgreSQL, Amazon RDS for SQL Server, Amazon Redshift, Amazon DynamoDB, and Amazon S3. The serverless nature of AWS Glue allows you to focus on your business processes rather than worrying about the underlying infrastructure, provisioning, and configuration. AWS Glue crawls your data sources, identifying various data formats and suggesting schemas and transformations, and also generates code to execute your data transformations and loading processes.
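
As a sketch, the snippet below creates and starts a Glue crawler over a raw S3 prefix so that discovered schemas land in the Data Catalog. The role, database, path, and schedule are illustrative assumptions.

```python
import boto3

glue = boto3.client("glue")

# Placeholder role and paths -- adjust to your account and data lake layout.
CRAWLER_ROLE = "arn:aws:iam::111122223333:role/glue-crawler-role"

# Crawl the raw zone of the PV data lake and catalog whatever schemas it finds.
glue.create_crawler(
    Name="pv-raw-zone-crawler",
    Role=CRAWLER_ROLE,
    DatabaseName="pv_raw",
    Targets={"S3Targets": [{"Path": "s3://pv-data-lake-raw/ctms/"}]},
    Schedule="cron(0 2 * * ? *)",  # nightly at 02:00 UTC
)
glue.start_crawler(Name="pv-raw-zone-crawler")
```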

Step 4: Real-time data ingest

AWS offers many technologies to ingest patient-centric data in a real-time, cost-efficient, reliable, and secure manner. For example, you can use AWS Amplify and the AWS Mobile SDK to quickly create secure, scalable, full-stack mobile applications to suit different device form factors like smart wearables, phones, and tablets. Amplify also helps you create voice-enabled experiences, build AI-powered real-time feeds, and launch targeted campaigns. Amazon Pinpoint enables you to connect with patients over channels like email, SMS, push, or voice. You can easily segment your patient population and personalize the messages for each cohort, helping you establish a personalized engagement with your patients.

Amazon Connect provides you with an easy-to-use, omni-channel, intelligent cloud contact center, enabling you to quickly set up complex call-routing algorithms and integrate with voice/chat bots powered by Amazon Lex. Contact Lens for Amazon Connect enables you to use AWS machine learning speech-to-text and natural language processing (NLP) techniques to automatically transcribe contact center calls and surface valuable customer insights embedded in voice. This data can then be used in conjunction with other details to enhance PV analysis. Amazon Alexa, a HIPAA-eligible service, offers several healthcare skills, which can allow voice-enabled interaction with a patient, offering seamless survey integration and answering FAQs about drugs. The Alexa Voice Service (AVS) Device SDK is also available on Android, iOS, and several other platforms, enabling developers to easily and quickly build additional skills.

For mining social media, you can leverage the AWS AI-Driven Social Media Dashboard to ingest and monitor social media in near real time. Amazon Chime enables you to provide a HIPAA-eligible, robust telehealth platform to your users, greatly augmenting your virtual touch points with the patient population.

Lastly, the broad and deep family of AWS IoT services allows you to ingest data from a vast variety of sensors in a secure and scalable way. All these technologies enable you to collect data from sources like clinical trials (Phase I-IV), observational post-marketing surveillance studies, patient support programs, call centers, Patient Reported Outcomes (PROs), etc.
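
As a minimal sketch of sensor ingestion, the following publishes a wearable reading to an MQTT topic via the AWS IoT Data plane. The topic name, payload shape, and pseudonymized identifier are illustrative assumptions.

```python
import json

import boto3

# A device-side or gateway process can publish telemetry to an MQTT topic
# through the AWS IoT Data plane.
iot = boto3.client("iot-data")

reading = {
    "patient_id": "P-0042",  # pseudonymized identifier, not PHI
    "heart_rate_bpm": 88,
    "spo2_pct": 97,
    "recorded_at": "2020-06-01T14:03:00Z",
}

iot.publish(
    topic="pv/wearables/vitals",
    qos=1,
    payload=json.dumps(reading),
)
```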

Step 5: Data preparation

A data lake suits this use case well, as we expect data from many different sources, with different varieties, volumes, and velocities. AWS Lake Formation helps you build a secure data lake in days, allowing you to create a centralized, curated, and secured repository which is easily searchable, extendable, and auditable. This data lake becomes the basic building block where all the PV data can be stored, both in its original form and prepared for analysis, acting as a single source of truth. Lake Formation provides an authorization and governance layer on data stored in Amazon S3. You can use a hierarchy of permissions in Lake Formation to grant or revoke permissions to read data catalog objects such as databases, tables, and columns. Lake Formation simplifies the management of permissions and allows you to implement fine-grained access control (FGAC) for your data. Different business teams can then connect to this data lake and access the data, adhering to the fine-grained permissions and data access allocated to them through Lake Formation. Using AWS Glue, Lake Formation also supports crawling a data store, determining the schema for your data, and then creating metadata tables in your AWS Glue Data Catalog. For more information, check out the Getting started with AWS Lake Formation blog post.
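
To illustrate FGAC, the sketch below grants a hypothetical analyst role column-level SELECT on a cataloged table while omitting columns that carry direct patient identifiers. All names and ARNs are assumptions.

```python
import boto3

lf = boto3.client("lakeformation")

# Grant a PV analyst role column-level SELECT on a cataloged table while
# withholding direct patient identifiers; all names are illustrative.
lf.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/pv-analyst"
    },
    Resource={
        "TableWithColumns": {
            "DatabaseName": "pv_curated",
            "Name": "adverse_events",
            "ColumnNames": ["event_id", "drug_name", "meddra_pt", "severity"],
        }
    },
    Permissions=["SELECT"],
)
```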

Step 6: Storage and archival

Lake Formation leverages Amazon S3 to store the data. This provides you with a cost-effective yet scalable storage solution to meet your growing data needs. Amazon S3 redundantly stores copies of your data across multiple Availability Zones within an AWS Region, providing you with 11 9s of durability. You can leverage Amazon S3 features like encryption, audit logging, search, storage management, storage monitoring, query-in-place, etc. to build a rich and extensible solution. In addition, its native integration with other AWS compute, artificial intelligence (AI) and machine learning, analysis, data transfer, and networking services makes it very easy to move data in and out of Amazon S3.

Lastly, different Amazon S3 storage classes allow you to pick the right solution for your specific durability, availability, cost, and latency needs. For example, you can use Amazon S3 Glacier Deep Archive to store your archived data for audit purposes and still achieve 11 9s of durability for $0.00099 per GB-month.
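
For example, a lifecycle rule can transition closed cases to S3 Glacier Deep Archive automatically. The following sketch assumes an illustrative bucket, prefix, and 180-day retention threshold.

```python
import boto3

s3 = boto3.client("s3")

# Transition closed-case PV records to S3 Glacier Deep Archive after 180 days.
s3.put_bucket_lifecycle_configuration(
    Bucket="pv-data-lake-curated",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-closed-cases",
                "Filter": {"Prefix": "closed-cases/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 180, "StorageClass": "DEEP_ARCHIVE"}
                ],
            }
        ]
    },
)
```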

Step 7: Data transformation and processing

You can leverage the rich set of AWS data processing services, including AWS AI and machine learning services, to process a wide variety of data formats, including comma-separated and tab-delimited files, JSON/Parquet files, audio, video, images, structured/unstructured text, etc. For example, you can use Amazon Translate to translate all your incoming data from different languages (and geographies) into English as a common denominator for efficient processing. You can also improve the machine translation further by leveraging Amazon Augmented AI (Amazon A2I) to implement human post-translation reviews, as detailed in the Designing human review workflows with Amazon Translate and Amazon Augmented AI blog post.
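
A minimal sketch of this normalization step, using a made-up Spanish-language report (Translate auto-detects the source language when SourceLanguageCode is set to "auto"):

```python
import boto3

translate = boto3.client("translate")

# Normalize an incoming report to English for downstream processing.
result = translate.translate_text(
    Text="El paciente presentó una erupción cutánea tras la segunda dosis.",
    SourceLanguageCode="auto",
    TargetLanguageCode="en",
)
print(result["TranslatedText"])
# e.g. "The patient developed a skin rash after the second dose."
```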

If you are running a call center and have access to call recordings, Amazon Transcribe Medical can help efficiently and accurately convert speech to text, unlocking vast amounts of intelligence embedded in nurse-patient conversations. Transcribe Medical, currently supporting US English, can accurately transcribe medical terminology such as medicine names, procedures, and even conditions or diseases. You can then use Amazon Comprehend Medical to extract relevant medical information from unstructured text (doctors’ notes, clinical trial reports, call transcripts, patient health records, etc.), giving you valuable insights. Along with semantic analysis and topic modeling, Comprehend Medical also provides relationship extraction for medications, tests, treatments, procedures, and medical conditions. This allows you to easily classify incoming adverse events into high-level categories and drastically reduces the time spent by human teams. For more details, check out the blog post Performing medical transcription analysis with Amazon Transcribe Medical and Amazon Comprehend Medical.
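
As a sketch, the snippet below runs a short, made-up call transcript through Comprehend Medical and prints the medication and medical condition entities it detects:

```python
import boto3

cm = boto3.client("comprehendmedical")

transcript = (
    "Patient reports severe headache and nausea two days after starting "
    "Drug X 50 mg daily. Symptoms resolved after discontinuation."
)

# Extract medications, dosages, and medical conditions from the transcript.
entities = cm.detect_entities_v2(Text=transcript)["Entities"]
for e in entities:
    if e["Category"] in ("MEDICATION", "MEDICAL_CONDITION"):
        print(e["Category"], e["Text"], round(e["Score"], 2))
```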

To extract text embedded in images and scanned PDF documents, you can use Amazon Textract, which uses machine learning to instantly read and process virtually any type of document, accurately extracting text, forms, tables, and other data without the need for manual effort or custom code. This allows you to automate manual data entry processes, enabling you to process millions of document pages in hours. A healthcare-related example is highlighted in the blog post Automating claims adjudication workflows using Amazon Textract and Amazon Comprehend Medical.
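
A sketch of the asynchronous flow for a scanned PDF stored in S3 follows; the bucket and object names are illustrative, and a production system would subscribe to an Amazon SNS topic instead of polling and would page through the full result set.

```python
import time

import boto3

textract = boto3.client("textract")

# Start an asynchronous analysis of a scanned case report form stored in S3.
job = textract.start_document_analysis(
    DocumentLocation={
        "S3Object": {"Bucket": "pv-data-lake-raw", "Name": "scans/case-0042.pdf"}
    },
    FeatureTypes=["FORMS", "TABLES"],
)

# Poll until the job finishes (first page of results only, for brevity).
while True:
    result = textract.get_document_analysis(JobId=job["JobId"])
    if result["JobStatus"] in ("SUCCEEDED", "FAILED"):
        break
    time.sleep(5)

lines = [b["Text"] for b in result["Blocks"] if b["BlockType"] == "LINE"]
print("\n".join(lines[:10]))
```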

If you are looking to develop your own custom machine learning models, Amazon SageMaker provides you with the ability to build, train, and deploy machine learning models quickly, removing the traditional heavy lifting involved in developing high-quality machine learning models. Custom machine learning models are useful in image analysis use cases, like detecting a rash in a patient-provided picture, or signal detection use cases, like identifying a trend/signal in a clinical study based on data collected from all the different channels mentioned earlier in this blog post.

Lastly, pharma companies regularly need to map adverse events to clinically validated international medical terminologies used by regulatory authorities, such as MedDRA. AWS Batch helps simplify these coding practices: you can encapsulate business logic in Docker containers and let AWS Batch orchestrate them on demand. This allows you to focus on data transformation without worrying about managing batch computing software or server clusters. These systems can be designed to be elastic and scalable, allowing you to achieve a healthy match between demand and capacity. The blog post Building Amazon Neptune based MedDRA terminology mapping for Pharmacovigilance and Adverse event reporting details steps to map the identified adverse events to these ontologies and prepare the results before regulatory submissions. By leveraging AWS AI and machine learning services, you can also enrich the incoming data with data from ClinicalTrials.gov and openFDA, as detailed in the blog post Query drug adverse effects and recalls based on natural language using Amazon Comprehend Medical.

Step 8: Data analysis and reporting

With the advent of personalized medicine and therapeutics, there is a growing need for analysis systems to work with semi-structured and unstructured data. AWS offers many options in this category. You can use Amazon Athena to quickly set up an ad hoc query and exploration environment without managing any servers. Athena works seamlessly with Lake Formation, enforcing the permission sets defined there, and allows you to access a variety of standard data formats, including CSV, JSON, ORC, Apache Parquet, and Avro. If you are looking to create a petabyte-scale data warehouse to support a bigger group of concurrent users, Amazon Redshift provides an easy solution to query structured and unstructured data across data warehouses, operational databases, and data lakes using standard SQL. You can also use Amazon EMR to host and manage Apache Spark, Hive, Presto, and other big data frameworks in a secure, reliable, low-cost, and elastic manner. Amazon EMR can be used to analyze vast amounts of real-time data from sources like social media and wearables, as detailed in the AWS solution Real-Time Analytics with Spark Streaming.

You can use Amazon QuickSight, a BI service with pay-per-session pricing, to deliver insights to your organization. You can easily create pre-marketing reports like Clinical Study, NDA Annual, and IND Annual reports, and post-marketing reports like PSURs (Periodic Safety Update Reports), PADERs (Periodic Adverse Drug Experience Reports), DSURs (Development Safety Update Reports), and ASRs (Annual Safety Reports), and share them across users, who can then access them from any device. These reports can easily be embedded in your downstream applications, portals, and websites to provide a seamless experience. QuickSight seamlessly integrates with Amazon Redshift, Athena, and Lake Formation, as detailed in these AWS blogs about QuickSight. Lastly, SageMaker can be used to create, tune, and host custom machine learning models to drive predictive analytics, signal detection, and processing as detailed here.
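
As an illustration of the ad hoc query path, the following sketch issues an aggregation against a hypothetical adverse_events table; the database, table, columns, and output location are assumptions.

```python
import boto3

athena = boto3.client("athena")

# Ad hoc query against the cataloged adverse_events table (names illustrative).
query = """
    SELECT drug_name, COUNT(*) AS event_count
    FROM adverse_events
    WHERE severity = 'SERIOUS'
    GROUP BY drug_name
    ORDER BY event_count DESC
    LIMIT 20
"""

execution = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "pv_curated"},
    ResultConfiguration={"OutputLocation": "s3://pv-athena-results/"},
)
print("Query execution ID:", execution["QueryExecutionId"])
```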

Step 9: Event driven tasks/alerts

Once the data has been analyzed in the previous step, the next step is to make it actionable and establish a communication system with your patients. Traditionally, effective personal patient engagement has been a challenge for pharma companies, and the exposure to ever-increasing amounts of data from diverse sources of information makes it even more difficult. According to a report from Syneos Health Trend 2019, medical information has doubled every 73 days since 2010. To effectively counter this, you can use Amazon Pinpoint to create a scalable, flexible, and cost-effective personalized engagement solution to communicate with your patients. Amazon Pinpoint simplifies cohort identification and segmentation, message templatization and personalization, multichannel message delivery (email, SMS, and push notifications), and monitoring across your engagement solution. To further extend the reach, you can also add custom channels like WhatsApp, as detailed in the blog post Adding WhatsApp as an Amazon Pinpoint Channel. To enable real-time alerts, you can store key real-time data in DynamoDB and use a combination of DynamoDB Streams and AWS Lambda to drive real-time alerts to your population. For an example, see the blog post Send real-time alerts using Amazon Pinpoint. You can also track user engagement to understand high-performing channels and improve campaign performance, which can ultimately help you gather quality adverse event data, minimizing false positives and false negatives.
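
As a sketch of this pattern, the Lambda handler below consumes DynamoDB stream records and sends an SMS through Amazon Pinpoint when a newly inserted item is flagged as serious. The application ID, destination number, table attributes, and severity convention are all illustrative assumptions.

```python
import boto3

pinpoint = boto3.client("pinpoint")

# Placeholder Pinpoint project ID -- created when you set up the application.
APPLICATION_ID = "exampleapplicationid1234567890ab"

def handler(event, context):
    """Triggered by a DynamoDB stream; alerts the safety team by SMS when a
    newly inserted record is flagged as a serious adverse event."""
    for record in event["Records"]:
        if record["eventName"] != "INSERT":
            continue
        # NewImage is present when the stream view type includes new images.
        item = record["dynamodb"]["NewImage"]
        if item.get("severity", {}).get("S") != "SERIOUS":
            continue

        pinpoint.send_messages(
            ApplicationId=APPLICATION_ID,
            MessageRequest={
                "Addresses": {"+15555550100": {"ChannelType": "SMS"}},
                "MessageConfiguration": {
                    "SMSMessage": {
                        "Body": f"Serious AE reported for {item['drug_name']['S']}",
                        "MessageType": "TRANSACTIONAL",
                    }
                },
            },
        )
```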

Step 10: Regulatory reporting and submission

The final step is to submit regulatory information to PV databases like FAERS. Since June 10, 2015, the FDA has required applicants to electronically submit all individual case safety reports (ICSRs), ICSR attachments, and periodic safety reports. The Database-to-Database Transmission (“E2B”) option accepts electronic submissions in XML format via the FDA Electronic Submissions Gateway (FDA ESG). Using AWS Batch, you can easily create pre-marketing reports like Clinical Study, NDA Annual, and IND Annual reports, and post-marketing reports like PSURs (Periodic Safety Update Reports), PADERs (Periodic Adverse Drug Experience Reports), DSURs (Development Safety Update Reports), ASRs (Annual Safety Reports), etc.

FDA also requires you to digitally sign and encrypt document submissions using digital certificates for reasons like traceability, security, non-repudiation of origin, non-repudiation of receipt, etc. FDA allows the usage of private certificate authorities (CA) which makes it possible to achieve complete control of security policies and procedures, but a private CA also carries the burden of management and cost to set up and maintain the system.

AWS Certificate Manager (ACM) lets you easily provision, manage, and deploy public and private certificates, removing many of the time-consuming and error-prone steps involved in acquiring SSL/TLS certificates. ACM Private Certificate Authority provides you with a highly available private CA service without the upfront investment and ongoing maintenance costs of operating your own private CA. With ACM Private CA, you can create and manage private certificates for your connected resources in one place with a pay-as-you-go, managed, private CA service.

For submission, customers typically use a B2B client to implement the electronic submission process, which can easily be deployed on Amazon EC2. This allows you to use your existing investment in B2B software while still leveraging pay-as-you-go, highly available, secure, and cost-efficient EC2 instances as the underlying infrastructure layer.

Data security, data privacy, data integrity, and compliance considerations

At AWS, customer trust is our top priority. We deliver services to millions of active customers, including enterprises, educational institutions, and government agencies in over 190 countries. To facilitate this, along with the services mentioned earlier, you should also use the AWS Identity and Access Management (IAM) service. IAM enables you to maintain segregation of access, implement fine-grained access control, and secure end-user mobile and web applications. You can also use AWS Security Token Service (AWS STS) to provide secure, self-expiring, time-boxed, temporary security credentials to third-party administrators and service providers, greatly strengthening your security posture. You can use AWS CloudTrail to log IAM and STS API calls.
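
A minimal sketch of issuing time-boxed credentials to a third party with AWS STS follows; the role ARN is hypothetical and must already exist with an appropriate trust policy.

```python
import boto3

sts = boto3.client("sts")

# Issue self-expiring credentials for a third-party PV service provider.
creds = sts.assume_role(
    RoleArn="arn:aws:iam::111122223333:role/pv-vendor-readonly",
    RoleSessionName="vendor-audit-session",
    DurationSeconds=3600,  # credentials expire after one hour
)["Credentials"]

# The vendor uses the temporary credentials; no long-lived keys are shared.
vendor_s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)
```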

With AWS, you can add an additional layer of security to your data at rest in the cloud. AWS provides scalable and efficient encryption features for services like Amazon EBS, Amazon S3, Amazon Redshift, Amazon SNS, AWS Glue, and many more. Flexible key management options, including AWS Key Management Service (AWS KMS), enable you to choose whether to have AWS manage your encryption keys or to keep complete control over them. In addition, AWS provides APIs for you to integrate encryption and data protection with any of the services that you develop or deploy in an AWS environment.
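
For example, a submission artifact can be encrypted at rest with a customer-managed KMS key at upload time. In this sketch, the bucket, object key, local file, and KMS key ARN are illustrative assumptions.

```python
import boto3

s3 = boto3.client("s3")

# Encrypt a submission artifact at rest with a customer-managed KMS key.
with open("case-0042.xml", "rb") as body:
    s3.put_object(
        Bucket="pv-submissions",
        Key="icsr/2020/case-0042.xml",
        Body=body,
        ServerSideEncryption="aws:kms",
        SSEKMSKeyId="arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab",
    )
```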

As a customer, you maintain ownership of your data and select which AWS services can process, store, and host your content. AWS doesn’t access or use customers’ content for any purpose without their consent. AWS never uses customer data to derive information for marketing or advertising. When evaluating the security of a cloud solution, it’s important that you understand and distinguish between the security of the cloud and security in the cloud. The AWS Shared Responsibility Model details this relationship.

To assist you with your compliance efforts, AWS continues to add more services to the scope of various compliance regulations, attestations, certifications, and programs across the world. To decide which services are suitable for you, see the AWS services in scope page.

You can also use various services such as AWS CloudTrail, AWS Config, Amazon GuardDuty, and AWS Key Management Service (AWS KMS) to enhance your compliance and auditing efforts. Find more details in the AWS Compliance Solutions Guide.

Final thoughts

Life science customers have multi-faceted struggles with standard adverse event reporting workflows. There are more potential data sources for adverse events than ever before, including EHR data, insurance claims, and even social media. Life science companies must leverage this information by combining and searching it for event detection. Besides being costly and time-consuming, reliance on manual adverse event detection also introduces the opportunity for errors. One of the key technology advantages customers can harness to address these challenges is cloud technology and its ability to ingest, store, process, and manage information. Machine learning can not only reduce the potential for errors, but can also reduce costs and increase the speed at which events can be detected and reported. This all equates to a scalable platform for more effective adverse event detection in order to ensure patient safety. To understand more about how AWS can help you establish these capabilities within your organization, reach out to your AWS account team.

Patrick Buckner

Patrick has over 20 years of experience in the life science industry, working with biopharmaceutical and medical device companies across North and South America, Europe, and Asia through software organizations, as well as 9+ years with the engineering and consulting subsidiary of Novo Nordisk. Patrick has worked across the value chain, including R&D, clinical development, manufacturing, and supply chain, and has led sales and marketing teams in North America and Europe. Currently, he is the WW Business Development Manager, leading the Life Science industry solution program. Patrick received his B.A. from the University of North Carolina-Chapel Hill and a Machine Learning Professional Certification from the Massachusetts Institute of Technology (MIT).

Deven Atnoor, Ph.D

Deven Atnoor is an Industry Specialist in AWS’ Global Healthcare and Life-Sciences practice. Leveraging his domain knowledge, Deven is building digital transformation solutions for unlocking the power of data using AWS, enabling healthcare and life sciences customers to generate insights from their data assets to fuel innovation and deliver better outcomes for patients. Deven received a Ph.D. in Chemical Engineering from the University of Cincinnati and a B.S. from the Indian Institute of Technology, Bombay, India.

Mayank Thakkar

Mayank Thakkar is a Sr. Solutions Architect in the Global Healthcare and Life Sciences team at AWS. He has more than 18 years of experience in varied industries like healthcare, life sciences, insurance, and retail, specializing in building serverless, artificial intelligence, and machine learning-based solutions to solve real-world industry problems. At AWS, he works closely with big pharma companies around the world to build cutting-edge solutions and help them along their cloud journey.