AWS Partner Network (APN) Blog

How to Use Amazon Rekognition and Amazon Comprehend Medical to Get the Most Out of Medical Imaging Data in Research

By Sarah Gabelman, Director of Product Management at Ambra Health

Medical imaging is a key part of patient health records and clinical trial workflows. These workflows are complex and involve hunting down imaging from an onsite clinical PACS (picture archiving and communication system), requesting that imaging be sent from an outside facility, or waiting for imaging to arrive on a compact disc (CD).

Many medical facilities still burn medical imaging to CDs, a time-consuming and error-prone process in which patient data must be matched. Hospital staff traditionally assigned to this task (often referred to as the film library) are frequently overwhelmed by enormous stacks of CDs.

This process can take anywhere from a few hours when imaging is onsite, to days or weeks if imaging is mailed or brought by courier service from an outside facility.

Additionally, imaging data on CDs and on-premises archives can create significant risks from lost studies, errors, and unscheduled PACS downtime. Even when an electronic workflow is implemented, there could still be complex challenges around matching data with parent studies, customizing case report forms, integrating with post processing systems, and intaking imaging from outside sites.

In this post, I will discuss key challenges faced by medical facilities and suggest approaches for using Amazon Web Services (AWS) tool sets, as enhanced by Ambra Health, to meet each challenge.

Ambra Health is an AWS Partner Network (APN) Advanced Technology Partner and a medical data and image management cloud software company. We are personally and professionally committed to the mission of delivering better care through better technology—right at the heart of the care network.

This is an article for developers who are working with diagnostic medical imaging or DICOM (digital imaging and communications in medicine) data, either in academic or commercial research settings relating to pharmaceutical development and/or algorithm development for artificial intelligence (AI) or machine learning (ML).


At Ambra Health, we understand these challenges from a unique standpoint. Our company developed a cloud-based image management solution that lets institutions of all sizes securely store, share, and view medical imaging.

Our focus has been DICOM imaging, including X-ray, CT, ultrasound, and MRI studies. Ambra currently manages more than six billion images, and the company is growing at 40 percent year over year. Our customers include six of the top 10 health systems and three of the four top children’s hospitals.

We turned to AWS to help us scale and improve our ever-growing workflow and use cases.

First, we found that customers had moved some of their storage infrastructure to AWS, so we needed to act as a flexible partner. These customers were based both in the United States and internationally, and we wanted to enable them to run our system in the architecture that was best suited to their cost and operational structure.

Second, our customers were rapidly realizing the imaging data they held could be useful and lead to new insights. We call this process transforming a liability (such as imaging data held for record keeping purposes) into an asset (like imaging that can provide new diagnostic and therapy insights).

To enable these insights, Ambra needed to provide customers with enhanced tool sets for searching for relevant data, and for anonymization and de-identification of both metadata and pixel-level data in the images themselves.

Searching for Relevant Data

Amazon Elasticsearch Service enables Ambra to quickly index and search through billions of images and studies. Ambra also uses Amazon Comprehend Medical and other natural language processing (NLP) tools to extract medical information from unstructured reports.

This allows us to accurately identify studies with specific characteristics, such as diagnosis records (positive or negative) and medical procedure records. As a result, we can help institutions maintain a record of the information they hold that is searchable by condition and other criteria, rather than by personally identifiable information (PII) alone.
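As a rough illustration of this kind of extraction (a sketch, not Ambra's actual implementation), the function below calls the Amazon Comprehend Medical `detect_entities_v2` operation and sorts the detected entities into positive findings, negated findings, and procedures. The function name `summarize_report`, the `min_score` threshold, and the decision to treat any `MEDICAL_CONDITION` entity carrying a `NEGATION` trait as a negative finding are assumptions made for the example. The `client` argument is expected to be a Comprehend Medical client, for instance one created with `boto3.client("comprehendmedical")`.

```python
def summarize_report(client, report_text, min_score=0.8):
    """Group entities found in a free-text report (illustrative sketch).

    `client` is assumed to expose detect_entities_v2, as the boto3
    Comprehend Medical client does. `min_score` is an arbitrary cutoff.
    """
    response = client.detect_entities_v2(Text=report_text)
    summary = {"positive": [], "negative": [], "procedures": []}

    for entity in response["Entities"]:
        # Skip low-confidence detections.
        if entity["Score"] < min_score:
            continue

        trait_names = {trait["Name"] for trait in entity.get("Traits", [])}

        if entity["Category"] == "MEDICAL_CONDITION":
            # A NEGATION trait marks a finding reported as absent.
            key = "negative" if "NEGATION" in trait_names else "positive"
            summary[key].append(entity["Text"])
        elif entity["Category"] == "TEST_TREATMENT_PROCEDURE":
            summary["procedures"].append(entity["Text"])

    return summary
```

A production workflow would probably also look at trait scores and entity attributes before trusting a positive or negative label; this sketch ignores both.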

With this approach, researchers are able to create cohorts of relevant research data based on, for example, lesion size and/or body part location. This can be invaluable as researchers try to find the needles in haystacks of data.
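To make the cohort idea concrete, here is a minimal sketch of an Elasticsearch query body that filters studies by body part and lesion size. The field names `body_part` and `lesion_size_mm` are hypothetical; the post does not describe Ambra's index schema, so treat this purely as an example of the query shape.

```python
import json


def build_cohort_query(body_part, min_lesion_mm, max_lesion_mm):
    """Build an Elasticsearch bool/filter query for a research cohort.

    Field names are hypothetical placeholders, not a real schema.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    # Exact match on an assumed keyword field.
                    {"term": {"body_part": body_part}},
                    # Inclusive numeric range on an assumed size field.
                    {"range": {"lesion_size_mm": {
                        "gte": min_lesion_mm,
                        "lte": max_lesion_mm,
                    }}},
                ]
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_cohort_query("CHEST", 5, 20), indent=2))
```

Filter clauses are used here rather than scored `must` clauses because cohort membership is a yes/no question and relevance ranking is not needed.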

Ambra also provides relevant reporting and summarization. This automated procedure replaced manual curation at many institutions that were previously unable to curate data rapidly enough and/or at scale.

Automated features analyze the HL7 message and return the diagnosis and medical procedure records they find in under a second. With a manual workflow, it would take a user many minutes per study to view the HL7 report and read through the text for the diagnosis and procedure details.
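Before any NLP can run, the report text has to be pulled out of the HL7 message. The sketch below assumes an HL7 v2 message in which the narrative report is carried in the observation value field (OBX-5) of the OBX segments, with the default `|` field separator. Real feeds vary, and this example does not handle HL7 escape sequences, repetitions, or components, so it is a starting point only.

```python
def extract_report_text(hl7_message):
    """Collect OBX-5 (observation value) fields from an HL7 v2 message.

    Assumes the default '|' field separator and that the report
    narrative lives in OBX-5; both are assumptions for this sketch.
    """
    report_lines = []
    # HL7 v2 separates segments with carriage returns; tolerate newlines too.
    for segment in hl7_message.replace("\n", "\r").split("\r"):
        fields = segment.split("|")
        # fields[0] is the segment name, so fields[5] is OBX-5.
        if fields[0] == "OBX" and len(fields) > 5 and fields[5]:
            report_lines.append(fields[5])
    return "\n".join(report_lines)
```

The returned string can then be passed to a text analysis step such as Amazon Comprehend Medical.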

In this video, hear from Morris Panner, CEO at Ambra Health, who shares his thoughts on the value of being an APN Partner and why the industry experience of the AWS teams he’s worked with has been so helpful.

Removing Protected Health Information (PHI)

De-identified DICOM images are an important component of clinical research workflows, but the process of manually de-identifying large amounts of pixel data is both time-consuming and labor intensive for customers.

Ambra Health’s automatic pixel de-identification feature uses Amazon Rekognition and Amazon Comprehend Medical APIs to allow customers to de-identify images more quickly and to reduce user error.

Ambra Health offers two anonymization options using Amazon Rekognition and Amazon Comprehend Medical. The first option masks all text located on the DICOM image. The second masks only text that is recognized as PHI (protected health information).

When the all-text option is enabled for a customer, Ambra converts the DICOM images to JPG format and sends them to Amazon Rekognition. The AWS service then identifies the text on each DICOM and returns the text strings and coordinates found on the images to Ambra Health. We use these coordinates to mask all text on the DICOM images.
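The all-text flow can be sketched as follows. Amazon Rekognition's `DetectText` operation returns each detection with a bounding box expressed as ratios of the image width and height, so the sketch converts those ratios to pixel rectangles and blanks them. The helper names, the plain list-of-lists pixel grid, the confidence cutoff, and the choice to mask `LINE` detections only are assumptions for illustration; the post does not describe how Ambra applies the mask internally. The `client` argument is expected to behave like `boto3.client("rekognition")`.

```python
import math


def text_boxes(response, width, height, min_confidence=50.0):
    """Convert LINE detections to pixel boxes (left, top, right, bottom)."""
    boxes = []
    for detection in response["TextDetections"]:
        # LINE boxes already cover their WORD children.
        if detection["Type"] != "LINE":
            continue
        if detection["Confidence"] < min_confidence:
            continue
        box = detection["Geometry"]["BoundingBox"]
        # Round outward so the mask never clips part of a character.
        left = max(0, math.floor(box["Left"] * width))
        top = max(0, math.floor(box["Top"] * height))
        right = min(width, math.ceil((box["Left"] + box["Width"]) * width))
        bottom = min(height, math.ceil((box["Top"] + box["Height"]) * height))
        boxes.append((left, top, right, bottom))
    return boxes


def mask_boxes(pixels, boxes, fill=0):
    """Return a copy of a row-major 2D pixel grid with each box blanked."""
    masked = [list(row) for row in pixels]
    for left, top, right, bottom in boxes:
        for y in range(top, bottom):
            for x in range(left, right):
                masked[y][x] = fill
    return masked


def mask_all_text(client, jpeg_bytes, pixels):
    """Detect text in the JPEG rendering and mask it in the pixel grid."""
    height, width = len(pixels), len(pixels[0])
    response = client.detect_text(Image={"Bytes": jpeg_bytes})
    return mask_boxes(pixels, text_boxes(response, width, height))
```

Axis-aligned bounding boxes are a coarse fit for rotated text; the `Polygon` returned alongside each bounding box would allow a tighter mask at the cost of more complex fill logic.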

When the PHI-text option is enabled, Ambra converts the DICOM images to JPG format and uses both Amazon Rekognition and Amazon Comprehend Medical. First, Amazon Rekognition identifies the text strings and their coordinates on each DICOM. The text strings are then sent to Amazon Comprehend Medical, and both the text strings and coordinates are passed back to Ambra Health.

Amazon Comprehend Medical processes the text strings provided by Amazon Rekognition, identifies the text strings that contain PHI, and then passes the PHI text strings back to Ambra. We use the coordinates of those PHI strings to mask the PHI on the DICOM images.

Ambra also de-identifies other known PHI strings in addition to those identified by Amazon Comprehend Medical.
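One way to connect the two services is sketched below. It joins the text lines found by Amazon Rekognition into a single string, sends that string to the Amazon Comprehend Medical `detect_phi` operation, and uses the character offsets of each returned entity to work out which original lines contain PHI. It then also flags lines that contain any caller-supplied known PHI string. The function name, the `known_phi` parameter, the score threshold, and the line-level granularity are all assumptions for this example and are not taken from Ambra's implementation. The `client` argument is expected to behave like `boto3.client("comprehendmedical")`.

```python
def phi_line_indexes(client, lines, known_phi=(), min_score=0.5):
    """Return sorted indexes of text lines that appear to contain PHI."""
    if not lines:
        return []

    # Record where each line starts and ends inside the joined text.
    spans, position = [], 0
    for line in lines:
        spans.append((position, position + len(line)))
        position += len(line) + 1  # +1 for the newline separator

    response = client.detect_phi(Text="\n".join(lines))

    flagged = set()
    for entity in response["Entities"]:
        if entity["Score"] < min_score:
            continue
        # Flag every line whose span overlaps the entity's offsets.
        for index, (start, end) in enumerate(spans):
            if entity["BeginOffset"] < end and entity["EndOffset"] > start:
                flagged.add(index)

    # Also flag lines containing known PHI strings (case-insensitive).
    for index, line in enumerate(lines):
        lowered = line.lower()
        if any(value and value.lower() in lowered for value in known_phi):
            flagged.add(index)

    return sorted(flagged)
```

The returned indexes identify which text detections to mask, so only their coordinates need to be blanked on the image while the remaining text stays visible.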

Figure 1 – Anonymization options using Amazon Rekognition and Amazon Comprehend Medical.

The diagram above highlights the two anonymization options offered by Ambra Health using Amazon Rekognition and Amazon Comprehend Medical. The first option masks all text located on the DICOM image, while the second masks only text that is recognized as PHI.

Customer Use Case

At one regional academic medical center, the film library staff found themselves bogged down by searching for and downloading imaging data from CDs. They also faced a unique challenge with regard to a trial in which patients were associated with a parent study in the region and subject IDs had to be conserved.

Ambra needed to create a custom workflow to conserve imaging IDs while anonymizing patient information to line up clinical data with patient data. Our engineering team, using AWS tool sets, was able to customize the output of data so it could be stored and reported under specifications set by the statisticians on the team.

The Ambra viewer was also customized to meet the stringent demands of the neuroradiologists reviewing the studies. Today, study upload and anonymization time has been reduced to just minutes. More than 4,000 images have been successfully uploaded into the system and matched.

The reduction in administration time for the team allows them to focus on the studies themselves, leading to greater insights that will enable better patient care across the board.


Medical imaging has traditionally been thought of as a burdensome liability. However, facilities today can use data for exciting new initiatives leading to unparalleled discoveries.

The challenge with imaging data is appropriate utilization and anonymization. Ambra Health sought to provide customers with an enhanced tool set to search for relevant data and anonymize and de-identify imaging.

Today, Ambra’s automatic pixel de-identification feature uses Amazon Rekognition and Amazon Comprehend Medical APIs to allow customers to de-identify images and reduce error. Now, it’s easier than ever to deploy an integrated application fabric that elevates healthcare efficiency and care.

To learn more about how Ambra Health can free data from silos at your organization, visit AWS Marketplace or contact

The content and opinions in this blog are those of the third party author and AWS is not responsible for the content or accuracy of this post.



Ambra Health – APN Partner Spotlight

Ambra Health is an APN Advanced Technology Partner. A leading medical data and image management cloud software company, Ambra is committed to the mission of delivering better care through better technology—right at the heart of the care network.
