AWS for Industries

How to build a global, scalable, low-latency, and secure machine learning medical imaging analysis platform on AWS

Introduction

It is hard to imagine the future of medical imaging without machine learning (ML) as its central innovation engine. Countless researchers, developers, start-ups, and larger enterprises are building, training, and deploying machine learning solutions for medical imaging that are poised to transform today’s medical workflows and the future value of imaging in diagnosis and treatment.

To reach scientific breakthroughs, researchers first need to overcome several obstacles when training and deploying machine learning models. First, they must access large volumes of data stored in disjointed registries located in different parts of the world. Second, they need to deploy standardized tools globally to generate ground truth on reference datasets. Finally, they need to configure a secure and cost-effective environment that allows research groups to collaborate.

That is why the Diagnostic Image Analysis Group (DIAG) at Radboud University Medical Center in Nijmegen, the Netherlands, turned to AWS to migrate its open-source grand-challenge.org platform from its on-premises data center to the cloud. Grand-challenge.org was established in 2012 to organize machine learning challenges in biomedical image analysis, and today it brings together more than 45,000 registered researchers and clinicians from all over the world to collaborate on creating novel ML solutions in the field.

In early March 2020, when it was hypothesized that CT imaging could play an important role in the diagnosis and assessment of COVID-19, the Dutch Radiological Society rapidly proposed CO-RADS, a standardized assessment scheme for CT scans of patients with suspected COVID-19. Radiologists turned to the grand-challenge.org platform to collect imaging data and used its browser-based CT viewing system to assess the performance of CO-RADS, which achieved high discriminatory power for diagnosing COVID-19 from a CT scan alone (area under the ROC curve 0.91, 95% CI: 0.85-0.97, for positive RT-PCR results).(1)

On the platform, DIAG has made the COVID-19 dataset, the training course that teaches radiologists how to assess a scan using CO-RADS, the exam, and the ML model available to all registered users. However, grand-challenge.org was running in an on-premises data center. Radiologists taking the course outside of Europe had a poor experience because of the latency of the server-side rendered viewing system, and the number of scans DIAG could process with its AI tools was limited by the hardware it had provisioned before the emergence of SARS-CoV-2.

In April 2020, DIAG and AWS began a collaboration to bring globally distributed browser-based viewing systems and elastic scaling to machine learning and clinical researchers worldwide. Thanks to close cooperation between DIAG’s Research Software Engineering team and AWS, the grand-challenge.org platform was migrated to the cloud in less than two weeks from the start of the project. Overcoming several technical hurdles along the way resulted in a more robust, performant, and scalable application that will continue to support the medical imaging community during this pandemic and beyond.

This post presents the architecture and services used for the global medical imaging analysis platform and explains the challenges, solutions, and results in four areas: 1) exchanging data with the global research community, 2) a low-latency and scalable web-based viewer, 3) secure and cost-effective deployment and distribution of machine learning models, and 4) rapid migration of data and compute to the cloud.

Exchange data with the global research community

Developing robust machine learning solutions to problems in biomedical imaging requires access to large amounts of annotated training data. The volume of data generated by medical instruments such as MRI and CT scanners, next-generation sequencers, and digital pathology machines steadily increases as sensors become more accurate and systems become more sophisticated at characterizing physiology. Much of this data is locked in siloed databases and proprietary formats, and exchanging data and collaborating on research projects beyond the boundaries of an institution remains a challenge from a technical, compliance, and security perspective.

On grand-challenge.org, DIAG has added functionality that lets researchers set up archives to share data with each other, apply algorithms to that data, and run their own reader studies in which invited experts annotate the data. In medical imaging, shipping hard drives between sites is the norm, but AWS enabled direct upload to Amazon Simple Storage Service (Amazon S3), with accelerated transfers, to gather data from sites around the globe. Users can upload data in a variety of medical imaging formats, including DICOM and several whole slide image formats. The uploads are automatically validated and converted to MetaImage or TIFF, because these formats are much easier for machine learning researchers to work with.
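The validation step can be pictured as a dispatch on each file's leading bytes. The sketch below is a hypothetical illustration, not grand-challenge.org's code: it assumes uploads are recognized by their signature (the "DICM" marker after the 128-byte DICOM preamble, or a TIFF/BigTIFF header, which also covers TIFF-based whole slide formats such as Aperio SVS). The function names and the routing table are invented for the example, and the actual pixel conversion is left out.

```python
from pathlib import Path
from typing import Optional

# Hypothetical routing table: detected upload format -> format stored for researchers.
TARGET_FORMAT = {"dicom": "MetaImage", "tiff": "TIFF"}


def sniff_format(header: bytes) -> Optional[str]:
    """Guess the imaging format from the first bytes of an uploaded file."""
    # DICOM Part 10 files start with a 128-byte preamble followed by b"DICM".
    if len(header) >= 132 and header[128:132] == b"DICM":
        return "dicom"
    # Classic TIFF and BigTIFF, little-endian ("II") or big-endian ("MM").
    if header[:4] in (b"II*\x00", b"MM\x00*", b"II+\x00", b"MM\x00+"):
        return "tiff"
    return None


def plan_conversion(path: Path) -> str:
    """Validate an upload and return the format it should be converted to."""
    with path.open("rb") as f:
        header = f.read(132)
    fmt = sniff_format(header)
    if fmt is None:
        raise ValueError(f"{path.name}: unsupported or corrupt image file")
    return TARGET_FORMAT[fmt]
```

Rejecting unknown files before conversion keeps corrupt uploads out of the archive; a production validator would also parse the headers rather than trust the signature alone.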

Amazon S3 stores all of the imaging data on the grand-challenge.org platform, so DIAG no longer needs to worry about scaling storage to cope with the influx of scans from patients with suspected COVID-19. For fast access to the data, the platform uses Amazon CloudFront, whose URL signing integrates easily with the Django backend, so that users can only download the image files they have permission to view.

Low-latency and scalable web-based viewer

Today, most viewing and processing of medical imaging data in both clinical and research environments happens on premises, on dedicated workstations capable of the server-side rendering needed for routine manipulations such as MIP (maximum intensity projection) viewing or 3D volumetric rendering. With increasing collaboration between radiologists at institutions across the world, and the rise in secondary use of medical imaging for research and ML development, there is a need for globally available solutions. Radboud University Medical Center in Nijmegen recently faced exactly this challenge when radiologists worldwide showed great interest in its CO-RADS Academy, which teaches physicians how to read COVID-19 CT images.

The Diagnostic Image Analysis Group (DIAG) developed a web-based medical imaging viewer called CIRRUS, built on MeVisLab from MeVis Medical Solutions. CIRRUS provides many of the tools radiologists need to interact with medical imaging data. Server-side rendering enables rapid loading of the imaging data and allows the use of powerful rendering hardware for 3D multiplanar reformation, pre-loading of series in memory, and GPU acceleration. The rendered scenes are streamed over a WebSocket connection to a VueJS single-page application, which retains the strengths of client-side interaction where necessary. The workstations are deployed as Docker containers: one container is launched per user, and Traefik routes each user to their own container instance.
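One way to wire up per-user routing is to attach Traefik v2 labels to each workstation container when it starts, so Traefik discovers a dedicated route for it. The sketch below is hypothetical: the path prefix, the port 8080, and the session-ID format are assumptions, not details of DIAG's deployment. Traefik proxies WebSocket upgrades on such routes without extra configuration.

```python
import re
from typing import Dict


def workstation_labels(session_id: str, port: int = 8080) -> Dict[str, str]:
    """Docker labels that let Traefik v2 route one user's session to its container."""
    # Restrict IDs so they are safe inside router names and path rules.
    if not re.fullmatch(r"[a-z0-9-]+", session_id):
        raise ValueError("session_id must contain only lowercase letters, digits and hyphens")
    router = f"workstation-{session_id}"
    prefix = f"/workstation-sessions/{session_id}"  # hypothetical URL layout
    return {
        "traefik.enable": "true",
        # Requests whose path starts with the session prefix match this router...
        f"traefik.http.routers.{router}.rule": f"PathPrefix(`{prefix}`)",
        f"traefik.http.routers.{router}.service": router,
        # ...and are forwarded to the port the rendering server listens on.
        f"traefik.http.services.{router}.loadbalancer.server.port": str(port),
    }
```

With the Docker SDK for Python, these labels would be passed as `labels=` to `containers.run(...)`. Traefik picks the container up automatically and drops the route when the container stops, so no central routing table has to be maintained.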

In this project, DIAG set up rendering servers on Amazon Elastic Compute Cloud (Amazon EC2) in AWS Regions in Europe, Japan, and North America. Starting a container on demand for a new user takes less than 30 seconds, and the compute pool scales horizontally by adding EC2 instances in each Region. The medical imaging data are stored in an Amazon S3 bucket in Europe; to ensure rapid loading times in the North America and Asia Pacific Regions, Amazon CloudFront caches the data on demand. Loading a typical CT study (300 MB, 500 slices) takes less than 10 seconds. With a latency of 20 ms, there is no observable delay when scrolling, which provides a great user experience.
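As a rough sanity check on these figures, assume decimal megabytes and that the whole study arrives within the 10-second window. Under those assumptions, the rendering server needs a sustained transfer rate of at least 240 Mbit/s from CloudFront, and each slice averages about 0.6 MB. The last term assumes a 512 × 512 matrix at 16 bits per voxel. This is a common CT format, but the source does not state it. If the assumption holds, the per-slice size is roughly consistent with uncompressed slices plus per-file header overhead.

```latex
\frac{300\,\text{MB}}{10\,\text{s}} = 30\,\text{MB/s} = 240\,\text{Mbit/s},
\qquad
\frac{300\,\text{MB}}{500\ \text{slices}} = 0.6\,\text{MB per slice},
\qquad
512 \times 512 \times 2\,\text{B} = 524{,}288\,\text{B} \approx 0.52\,\text{MB}
```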

Secure and cost-effective deployment & distribution of machine learning models

Researchers need the freedom to use whatever tool or library is most appropriate for their use case, and they often find it difficult to distribute their models to the rest of the research community. Grand-challenge.org bridges this gap by allowing researchers to upload their model and pre-processing pipeline as a Docker container image and to manage which users can access the algorithm. Researchers can thus share their algorithms with the community, while the platform handles authentication, authorization, data access, validation and conversion of DICOM to MetaImage, and execution of the containers on the data with GPU acceleration.
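Running an uploaded container safely largely comes down to the flags it is started with. The sketch below assembles a locked-down `docker run` invocation. The /input and /output mount points, the 16 GB memory cap, and the function name are assumptions for illustration, not the platform's documented contract. The flags themselves are standard Docker options, and `--gpus all` requires the NVIDIA Container Toolkit on the host.

```python
from typing import List


def algorithm_run_command(image: str, input_dir: str, output_dir: str,
                          use_gpu: bool = True) -> List[str]:
    """Build a `docker run` command that executes an uploaded algorithm in isolation."""
    cmd = [
        "docker", "run", "--rm",
        "--network", "none",                     # no network: results leave only via /output
        "--read-only",                           # immutable root filesystem
        "--tmpfs", "/tmp",                       # scratch space many pipelines expect
        "--memory", "16g",                       # hypothetical resource cap
        "--volume", f"{input_dir}:/input:ro",    # images to process, read-only
        "--volume", f"{output_dir}:/output:rw",  # predictions written here
    ]
    if use_gpu:
        cmd += ["--gpus", "all"]                 # expose the host GPUs to the container
    cmd.append(image)
    return cmd
```

A worker would execute the result with `subprocess.run(cmd, check=True)` and then validate whatever appeared in the output directory before showing it to the user.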

The grand-challenge.org platform uses Celery to schedule the jobs that run these containers on medical images, with GPU acceleration provided where needed by NVIDIA T4 cards. By using Amazon Simple Queue Service (Amazon SQS) as the message broker, DIAG reduced the number of services it manages and can now scale horizontally by adding workers that listen to each queue. The platform can also run on its existing provisioned hardware and start extra g4dn EC2 instances during periods of increased demand.
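Switching Celery to SQS is mostly a matter of settings. The sketch below returns Django-style settings in the `CELERY_` namespace that `app.config_from_object("django.conf:settings", namespace="CELERY")` reads. The region default, task names, and queue names are hypothetical. It routes GPU work to its own queue so that only GPU workers consume it.

```python
from typing import Any, Dict, Mapping


def celery_settings(env: Mapping[str, str]) -> Dict[str, Any]:
    """Django-style Celery settings that use Amazon SQS as the message broker."""
    return {
        # "sqs://" without credentials makes kombu fall back to the standard AWS
        # credential chain, for example an EC2 instance role.
        "CELERY_BROKER_URL": "sqs://",
        "CELERY_BROKER_TRANSPORT_OPTIONS": {
            "region": env.get("AWS_DEFAULT_REGION", "eu-central-1"),  # hypothetical default
            # Must exceed the longest task runtime, or SQS redelivers the message.
            "visibility_timeout": int(env.get("CELERY_VISIBILITY_TIMEOUT", "3600")),
            "polling_interval": 1,
        },
        # Hypothetical task and queue names: GPU jobs get a dedicated queue.
        "CELERY_TASK_ROUTES": {
            "algorithms.tasks.execute_job": {"queue": "gpu-jobs"},
            "cases.tasks.convert_images": {"queue": "cpu-jobs"},
        },
        # Acknowledge only after completion so a lost worker's job is retried.
        "CELERY_TASK_ACKS_LATE": True,
    }
```

Extra g4dn instances then only need to start a worker bound to the GPU queue, for example `celery -A config worker -Q gpu-jobs` (the app name `config` is an assumption). Because SQS is managed, there is no broker to scale alongside the workers.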

This enabled researchers to rapidly deploy a model for automated CO-RADS scoring of CT scans, which is available at https://grand-challenge.org/algorithms/corads-ai/. Users can upload their own data, receive a prediction within 2 minutes, and then inspect the results in the globally available browser-based workstations.

Image courtesy of Radboud University Medical Center in Nijmegen

Rapid migration of data and compute to the cloud

DIAG had previously been running grand-challenge.org in its on-premises data center and did not foresee the scale at which it would need to operate during this pandemic. However, DIAG had developed the application with the cloud in mind and tried to ensure that its workloads were portable. The team follows the Twelve-Factor App methodology, maintains robust CI/CD pipelines that distribute the application as a set of Docker images, provisions bare-metal and VM instances with Ansible, and used MinIO to expose the on-site storage through the Amazon S3 API.

Using AWS services, DIAG was able to move this workload to the cloud rapidly. Several terabytes of imaging data were synced in place to Amazon S3 with the AWS CLI s3 sync command, so switching over the storage backend was a matter of changing environment variables. The team also moved the database to a managed Amazon RDS for PostgreSQL instance and the Celery broker from Redis to Amazon SQS to reduce the operational burden. Running on AWS lets the platform scale on demand to meet the unpredictable load during this pandemic.
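The "change environment variables, not code" switch works because every storage setting is read from the environment. The sketch below is minimal and assumes the django-storages S3 backend, whose settings include AWS_STORAGE_BUCKET_NAME, AWS_S3_ENDPOINT_URL, and AWS_S3_REGION_NAME. The function itself and the example MinIO address are hypothetical.

```python
from typing import Dict, Mapping, Optional


def storage_settings(env: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """Read object-storage settings from the environment (Twelve-Factor style).

    With AWS_S3_ENDPOINT_URL pointing at MinIO, the app uses on-premises storage.
    Leave it unset and the S3 client falls back to the default Amazon S3 endpoint.
    """
    return {
        "AWS_STORAGE_BUCKET_NAME": env["AWS_STORAGE_BUCKET_NAME"],  # required in both setups
        "AWS_S3_ENDPOINT_URL": env.get("AWS_S3_ENDPOINT_URL") or None,
        "AWS_S3_REGION_NAME": env.get("AWS_S3_REGION_NAME") or None,
    }
```

On premises, the deployment might set `AWS_S3_ENDPOINT_URL=http://minio:9000`. After the data has been synced, the same container images run against Amazon S3 simply by dropping that variable and setting the region.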

Research results and future work

Recently, the group at Radboud University Medical Center and numerous collaborators published results for CORADS-AI, a system of three deep learning algorithms that automatically segment the five pulmonary lobes, assign a CO-RADS score for the suspicion of COVID-19, and assign a CT severity score for the degree of parenchymal involvement per lobe.(2) The system was tested on an internal cohort of 105 patients (62 ± 16 years, 61 men) and an external cohort of 262 patients (64 ± 16 years, 154 men). It discriminated between COVID-19 positive and negative patients with areas under the ROC curve of 0.95 (95% CI: 0.91-0.98) and 0.88 (95% CI: 0.84-0.93) in the internal and external cohorts, respectively. CORADS-AI has been deployed on the AWS platform and is now available to other researchers.
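The area under the ROC curve reported above has a simple interpretation: it is the probability that a randomly chosen positive patient receives a higher score than a randomly chosen negative one. The sketch below computes the AUC from that pairwise (Mann-Whitney) definition. It is an illustration of the metric only, not the evaluation code used in the study, and it omits confidence intervals.

```python
from typing import Sequence


def roc_auc(positive_scores: Sequence[float], negative_scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative).

    Ties count as one half. Runs in O(n * m), which is fine for a sketch.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))
```

For example, `roc_auc([0.9, 0.8, 0.4], [0.3, 0.5])` counts 5 winning pairs out of 6. An AUC of 0.88 therefore means that in 88% of positive/negative pairs, the positive patient was ranked higher.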

One of the future goals of grand-challenge.org is to let users submit their custom algorithms to run on GPU-accelerated hardware in the cloud. This code needs to run in an isolated, secure environment, and users should preferably be able to use either Docker images provided by grand-challenge.org or their own images. A solution currently under development uses AWS Batch as the job scheduler. A web application will let users submit jobs and manage their results through a fully serverless backend built on AWS Lambda, Amazon DynamoDB, and Amazon API Gateway.
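The planned flow might look like the sketch below: an API Gateway request reaches a Lambda function, which queues a GPU job on AWS Batch. Everything here is an assumption about a design the post describes only at a high level. The sketch assumes a REST API with Lambda proxy integration, an Amazon Cognito user pool authorizer, and a per-user Batch job definition that already points at the user's image. The queue name is hypothetical, and the DynamoDB bookkeeping is omitted. The Batch client is injectable so the handler can be tested without AWS.

```python
import json
import re
from typing import Any, Dict, Optional

JOB_QUEUE = "gpu-job-queue"  # hypothetical AWS Batch job queue


def build_submit_request(user_id: str, job_definition: str, input_key: str) -> Dict[str, Any]:
    """Keyword arguments for the AWS Batch SubmitJob API: one algorithm run, one GPU."""
    # Job names allow letters, digits, hyphens and underscores (max 128 characters).
    job_name = re.sub(r"[^A-Za-z0-9_-]", "-", f"job-{user_id}")[:128]
    return {
        "jobName": job_name,
        "jobQueue": JOB_QUEUE,
        # The image is fixed by the job definition; SubmitJob cannot override it.
        "jobDefinition": job_definition,
        "containerOverrides": {
            "environment": [{"name": "INPUT_KEY", "value": input_key}],
            "resourceRequirements": [{"type": "GPU", "value": "1"}],
        },
    }


def handler(event: Dict[str, Any], context: Any,
            batch_client: Optional[Any] = None) -> Dict[str, Any]:
    """Lambda entry point behind API Gateway that queues a job on AWS Batch."""
    body = json.loads(event.get("body") or "{}")
    if "job_definition" not in body or "input_key" not in body:
        return {"statusCode": 400,
                "body": json.dumps({"error": "job_definition and input_key are required"})}
    # Cognito authorizer claims identify the caller. A real system must also check
    # that this user owns the requested job definition.
    user_id = event["requestContext"]["authorizer"]["claims"]["sub"]
    request = build_submit_request(user_id, body["job_definition"], body["input_key"])
    if batch_client is None:
        import boto3  # bundled with the AWS Lambda Python runtime
        batch_client = boto3.client("batch")
    response = batch_client.submit_job(**request)
    return {"statusCode": 202, "body": json.dumps({"jobId": response["jobId"]})}
```

Returning 202 with the job ID lets the web application poll for status later, instead of holding the request open while a GPU instance starts.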

Amazon Elastic Container Registry (Amazon ECR) will store the container images. A CI/CD pipeline built on AWS CodePipeline will follow, letting users submit a Dockerfile that is automatically built and stored in Amazon ECR.

Bram van Ginneken


Bram van Ginneken, PhD, is Professor of Medical Image Analysis at Radboud University Medical Center and chairs the Diagnostic Image Analysis Group. He also works for Fraunhofer MEVIS in Bremen, Germany, and is a founder of Thirona, a company that develops software and provides services for medical image analysis. He studied Physics at Eindhoven University of Technology and Utrecht University. In 2001, he obtained his Ph.D. at the Image Sciences Institute on Computer-Aided Diagnosis in Chest Radiography. He has (co-)authored over 200 publications in international journals. He is a member of the Fleischner Society and of the Editorial Board of Medical Image Analysis. He pioneered the concept of challenges in medical image analysis.

Razvan Ionasec


Razvan Ionasec, PhD, MBA, is the technical leader for healthcare at Amazon Web Services in Europe, Middle East, and Africa. His work focuses on helping healthcare customers solve business problems by leveraging technology. Previously, Razvan was the global head of artificial intelligence products at Siemens Healthineers in charge of AI-Rad Companion, the family of AI-powered and cloud-based digital health solutions for imaging. He holds 30+ patents in AI/ML for medical imaging and has published 70+ international peer-reviewed technical and clinical publications on computer vision, computational modelling, and medical image analysis. Razvan received his PhD in Computer Science from the Technical University Munich and MBA from University of Cambridge, Judge Business School.

James Meakin


James Meakin, PhD, leads the Research Software Engineering team at the RadboudUMC Technology Center for Deep Learning, where the team develops software to accelerate the translation of imaging research to the clinic. James is also a core team member of NL-RSE, which promotes RSE careers and practices in the Netherlands. Before joining RadboudUMC, James obtained a DPhil (PhD) in Medical Imaging from the University of Oxford and worked at Philips Healthcare.

Pablo Nuñez Pölcher


Pablo Nuñez Pölcher, MSc, is a Solutions Architect working for the Public Sector team at Amazon Web Services, based in Madrid. Pablo focuses on helping public sector customers build new, innovative products on AWS in accordance with best practices. Pablo received his M.Sc. in Biological Sciences from Universidad de Buenos Aires.