CDPHP Modernizes Infrastructure and Improves Ability to Extract Valuable Medical Data on AWS


Capital District Physicians’ Health Plan Inc. (CDPHP), a not-for-profit, physician-founded and guided health plan serving 400,000 members in Upstate New York, strives to provide its customers with quality care that is both affordable and easily accessible. CDPHP ingests a large volume of electronic medical records every day. But historically, medical records and health data have largely been collected as unstructured data, which makes it difficult for healthcare payors to derive insights from that data and deliver better care to members.

To process medical data more efficiently and improve care, CDPHP turned to the artificial intelligence and machine learning (ML) capabilities of Amazon Web Services (AWS). Using a range of AWS services, including Amazon Comprehend Medical, a HIPAA-eligible natural language processing service that uses ML to extract health data from medical text, the organization has automated its data processing pipeline, improved efficiency, and made its infrastructure more agile so that it can better respond to its members’ needs.


“By using Amazon Comprehend Medical, we can normalize information from disparate sources and across different formats into a common format that we can analyze with our ML models.”

Matthew Pietrzykowski
Director of Data Science and Transformational Analytics, Capital District Physicians’ Health Plan Inc.

Modernizing Data Infrastructure and Strategy

Prior to using AWS, CDPHP needed to manually extract, process, and organize all medical records—a labor-intensive process. “We just didn’t have the capability to develop a homegrown solution within a reasonable time frame,” says Matthew Pietrzykowski, director of data science and transformational analytics at CDPHP. The organization wanted to extract insights more effectively to enhance member care. With artificial intelligence and ML technology, it could automate the process to achieve this goal.

As CDPHP looked to modernize its data processing stack in the cloud, it decided to use purpose-built health artificial intelligence services on AWS, such as Amazon Comprehend Medical. CDPHP had already been an AWS customer for over 8 years and was excited about the scalability it could unlock using deep learning on AWS, which would simplify automating much of the organization’s data processing work. In 2019, the internal architecture team at CDPHP began to connect each part of the data pipeline on AWS. Early in the design stage, the organization engaged AWS Professional Services, which supplements teams with specialized skills and experience, to create a more efficient architecture for its data processing solution. The goal was to increase processing capacity for medical data while remaining modular enough to scale to virtually any use case.

“This transformation was an opportunity to start pushing for more serverless, modular technologies that we could use to innovate faster,” says Christopher Barrantes, senior enterprise architect at CDPHP. The ability to analyze individual text files and audio transcriptions and rate their quality was central. By determining which pieces of information were high quality as they were analyzed, the solution could automatically filter out low-quality results and present the highest-quality data to use for medical care planning.

Improving Efficiency by Automating Data Processing on AWS

“AWS facilitates building a complete solution that’s both useful and easy to use while being modular,” says Pietrzykowski. On AWS, CDPHP has transitioned to an automated, ML-based solution with several key components. First, CDPHP uses Amazon Textract, which automatically extracts printed text, handwriting, and data from any document. CDPHP can even use Amazon Textract to parse handwritten documents for downstream analysis. “Extracting the relevant medical information helps us feed our ML models only what’s useful,” says Pietrzykowski.
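As a rough illustration of this extraction step, the sketch below pulls line-level text out of a response shaped like the output of the Amazon Textract `DetectDocumentText` API. It is not CDPHP's code, which the case study does not describe. The helper names `extract_lines` and `textract_page` and the `min_confidence` cutoff are hypothetical, and the boto3 call assumes a single-page image and configured AWS credentials.

```python
from typing import Any, Dict, List


def extract_lines(response: Dict[str, Any], min_confidence: float = 0.0) -> List[str]:
    """Return the text of LINE blocks from a Textract-style response, in order.

    Textract reports Confidence on a 0-100 scale; min_confidence is a
    hypothetical cutoff used only for illustration.
    """
    lines = []
    for block in response.get("Blocks", []):
        # PAGE and WORD blocks are skipped; LINE blocks carry the readable text.
        if block.get("BlockType") != "LINE":
            continue
        if block.get("Confidence", 0.0) >= min_confidence:
            lines.append(block["Text"])
    return lines


def textract_page(image_bytes: bytes) -> List[str]:
    """Send one page image to Textract (requires AWS credentials)."""
    import boto3  # imported lazily so extract_lines can be used offline

    client = boto3.client("textract")
    response = client.detect_document_text(Document={"Bytes": image_bytes})
    return extract_lines(response)
```

Keeping the parsing helper separate from the API call means the same function can be exercised on saved responses, for example `extract_lines(saved_response, min_confidence=90.0)`, without touching the network.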

CDPHP then uses Amazon Comprehend Medical to understand and extract medical information from the unstructured text in patient medical records, transcripts of audio files, and other sources. CDPHP can take the extracted health information and make it accessible to the entire organization by translating it into a queryable form to feed multiple analytics use cases, such as risk adjustment. “By using Amazon Comprehend Medical, we can normalize information from disparate sources and across different formats into a common format that we can analyze with our ML models,” says Pietrzykowski. “With this solution, we are able to more quickly and efficiently improve the care that our members receive.”

One of the most significant benefits of Amazon Comprehend Medical for CDPHP is that the service can assign each data entity an accuracy score and a probability that the data is of good quality. “It’s useful to set thresholds on what data the system should keep and when it should move forward along the pipeline,” says Pietrzykowski. “It’s a valuable first-step filter that helps us make sure we’re only dealing with the most reliable information.”
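The thresholding and normalization described here might look something like the following minimal sketch. It assumes the response shape of the Amazon Comprehend Medical `DetectEntitiesV2` API, in which each entity carries a `Score` between 0 and 1. The function names, the 0.8 threshold, and the row layout are hypothetical choices for illustration; the case study does not state the thresholds or schema CDPHP actually uses.

```python
from typing import Any, Dict, List


def filter_entities(response: Dict[str, Any], threshold: float = 0.8) -> List[Dict[str, Any]]:
    """Keep only entities whose confidence Score meets the threshold.

    The 0.8 default is an assumed value, not one taken from the case study.
    """
    return [e for e in response.get("Entities", []) if e.get("Score", 0.0) >= threshold]


def to_rows(record_id: str, entities: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Flatten entities into uniform rows (a hypothetical common format)."""
    rows = []
    for entity in entities:
        traits = entity.get("Traits", [])
        rows.append({
            "record_id": record_id,
            "category": entity["Category"],
            "type": entity["Type"],
            "text": entity["Text"],
            "score": entity["Score"],
            # A NEGATION trait marks findings the text says are absent.
            "negated": any(t.get("Name") == "NEGATION" for t in traits),
        })
    return rows


def detect_entities(text: str) -> Dict[str, Any]:
    """Call Comprehend Medical on one text (requires AWS credentials)."""
    import boto3  # imported lazily so the helpers above work offline

    client = boto3.client("comprehendmedical")
    return client.detect_entities_v2(Text=text)
```

Rows in a fixed layout like this can be loaded into a table and queried regardless of whether the source text came from a scanned record or an audio transcript, which is the point of normalizing to a common format.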

Next, CDPHP feeds the extracted health information into Amazon SageMaker, which helps users build, train, and deploy ML models, to inform future feature planning and engineering. Modernizing its data infrastructure on AWS has also increased CDPHP’s productivity and development speed. During the initial migration, the organization processed over seven million records to account for all the historical information until it reached a steady-state stream of new records. Using Amazon Comprehend Medical, CDPHP is now processing 3,000 electronic health records weekly, and it plans to double that in 2022.

The Healthcare Effectiveness Data and Information Set (HEDIS) is a performance measurement tool in the healthcare industry that provides consumers with the information to make reliable comparisons of health plan performance, such as experience, availability, and effectiveness of care. Previously, CDPHP’s manual process of generating HEDIS reports was slow and resource intensive. Three data scientists worked almost exclusively on reporting, and a single report took them 4–5 days to generate. Now, CDPHP is producing two reports daily using its automated system. The company has achieved a 60 percent improvement in overall efficiency using Amazon Comprehend Medical, Amazon Textract, and Amazon SageMaker. “Using AWS has been invaluable in time- and cost-efficiency gains,” says Pietrzykowski. “We can respond to stakeholder needs quickly because we’re more agile on AWS.” The increased efficiency frees up CDPHP to focus on designing reliable, innovative solutions for its members’ needs.

Building a Complete Data Strategy on AWS

CDPHP plans to continue improving its ML models on AWS and iterating on its complete data strategy to discover how data can be transformed for use across the organization to derive insights that improve care. It also expects to continue adding new streams of unstructured data to its processing system, expanding the scope of its available data resources. CDPHP knows its foundation on AWS can support its growth.

“We’ve stitched together multiple technologies and services to build this solution,” says Pietrzykowski. “Having everything under one roof on AWS and designed to work with each other makes our job significantly easier.”


CDPHP is a United States–based healthcare plan that strives to deliver quality healthcare at a reasonable cost for its members. Founded by physicians in 1984, CDPHP works with over 10,000 providers and practitioners throughout New York.

Benefits of AWS

  • Improved ability to extract value from health data to derive insights and enhance member care
  • Increased overall efficiency by 60% using Amazon Comprehend Medical, Amazon Textract, and Amazon SageMaker
  • Normalized information from disparate sources and formats into a common format
  • Automated HEDIS reporting so that two reports are produced daily instead of a single report taking 4–5 days
  • Assigned accuracy scores to incoming health data automatically

AWS Services Used

Amazon Comprehend Medical

Amazon Comprehend Medical is a fully managed HIPAA-eligible natural language processing (NLP) service that uses machine learning (ML) to extract medical data from text – no ML experience is required.


Amazon SageMaker

Amazon SageMaker helps data scientists and developers to prepare, build, train, and deploy high-quality machine learning (ML) models quickly by bringing together a broad set of capabilities purpose-built for ML.


Amazon Textract

Amazon Textract is a machine learning service that automatically extracts text, handwriting, and data from scanned documents. It goes beyond simple optical character recognition (OCR) to identify, understand, and extract data from forms and tables.


AWS Professional Services

The AWS Professional Services organization is a global team of experts that can help you realize your desired business outcomes when using the AWS Cloud.

