AWS for Industries

Invicro Improves Medical Image Quality Prediction with SageMaker HPO Jobs

Blog guest authored by Brian Avants and Jacob Hesterman of Invicro, a REALM IDx company

High-quality medical imaging data is a clinical necessity. Yet, there are few medical datasets with quality annotations that can be used to train models that automate an objective quality prediction process. Invicro, a REALM IDx company, worked with the Amazon Machine Learning (ML) Solutions Lab to leverage a natural image dataset with quality scores to improve the performance of an objective quality prediction model on a small, high-resolution chest X-ray dataset.

Invicro developed and validated quality-prediction models using convolutional neural networks (CNNs) trained on the Koniq10K natural image dataset using ANTsR and ANTsRNet, a deep learning package for R. The ML Solutions Lab explored the impact of transfer learning for quality prediction from the natural image dataset to medical image data using Amazon SageMaker (SageMaker) Hyperparameter Optimization (HPO) jobs, and the experiments detailed below show that pretraining with natural image data does indeed improve performance.

In this blog post, we will demonstrate how AWS tools allowed for an efficient experimental process to find the best network for quality prediction of X-ray images.

Image Quality Prediction

Due to the expertise required to evaluate medical imaging for clinical utility, there are few large datasets with quality ratings. Fortunately, there is a long history of research on the quality of natural images. Natural image quality datasets (e.g., TID2013, Koniq10K, LIVE) usually consist of an image and a corresponding set of quality ratings from human observers, based on how good the image looks in a general, subjective sense. These datasets have been used to train models that successfully predict quality ratings by learning the statistical features of images that correspond with the range of ratings.

Since medical imaging datasets with quality ratings are much less common, Invicro investigated whether CNNs trained on one of the many natural image quality datasets could be leveraged for transfer learning on a medical imaging dataset.

To evaluate this approach, Invicro started with the Koniq10K natural image quality dataset, which has over 10,000 images at a resolution of 512*384 pixels. Each image has at least nine quality ratings from human viewers. Invicro also developed an in-house chest X-ray dataset with both radiologist-assigned quality scores for clinical utility and anatomical region segmentations.

Fig. 1: An example of the Invicro chest X-ray intensity image and corresponding quality map.

Invicro’s medical imaging codebase, ANTsR and ANTsRNet, includes a ResNet50 architecture that Invicro used successfully for natural image quality prediction. The ML Solutions Lab used Invicro’s X-ray dataset and imaging codebase to explore how natural image quality CNNs generalize, and to maximize quality prediction performance on the chest X-ray dataset.

Hyperparameter Optimization with SageMaker HPO Jobs

To evaluate the impact of transfer learning as well as several model design choices, the ML Solutions Lab team used a ResNet50 architecture from ANTsR and trained models across a range of hyperparameters to find the best-performing configuration.

The following is a list of the hyperparameters used:

  • Input patch size:
    • The image is subdivided into smaller segments (patches) that act as input to the network. The following patch sizes were tested: 32*32, 64*64, 96*96, 128*128.
  • Intensity normalization:
    • Different strategies for intensity normalization were tested. The focus was on patch level vs. whole image normalization.
  • Input channels:
    • Additional input data could be included as complements to the X-ray image (e.g., anatomical segmentation, x- and y-coordinates of the patch within the whole image).
  • Loss function:
    • This was the learning objective for the network. Mean squared error vs. mean absolute error was tested.
  • Learning rate:
    • The scaling factor by which the gradients are updated. A range from 1e-5 to 1e-2 was tested.
  • Residual block channel width:
    • Networks with 32, 64, 96, and 128 convolutional channels were evaluated.
  • Initialization:
    • Initializations with random vs. pretrained weights from the Koniq10K dataset were tested.

Given the large hyperparameter space, the ML Solutions Lab team implemented a SageMaker HPO job to find the best-performing network. SageMaker HPO jobs offer an efficient method for finding the best set of hyperparameters through either a grid search or Bayesian optimization. ANTsRNet is written in R and calls TensorFlow for training the ResNet50 architecture. We built a custom R/TensorFlow GPU Docker container, which allowed us to run SageMaker HPO jobs.
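As a rough illustration of the scale of this search, the discrete hyperparameters listed above can be enumerated with a simple Cartesian product (the continuous learning-rate range is sampled separately; the parameter names below are illustrative, not the actual training-script arguments):

```python
from itertools import product

# Illustrative discrete search space drawn from the hyperparameter list above;
# the continuous learning-rate range (1e-5 to 1e-2) is handled separately.
search_space = {
    "patch_size": [32, 64, 96, 128],
    "normalization": ["patch", "whole_image"],
    "loss": ["mse", "mae"],
    "channel_width": [32, 64, 96, 128],
    "initialization": ["random", "koniq10k_pretrained"],
}

# Every combination a grid search would have to cover.
grid = [dict(zip(search_space, combo)) for combo in product(*search_space.values())]
print(len(grid))  # prints 128 (4 * 2 * 2 * 4 * 2)
```

Even before sampling learning rates, the grid holds 128 configurations, each requiring hours of GPU training, which is why running many tuning jobs in parallel was essential.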

Fig. 2: SageMaker HPO takes the parameters, model data, and training data as input to automatically tune the parameters and start training jobs, allowing efficient discovery of the best model.

A grid search of the hyperparameter space was carried out across several instances in parallel, because training one architecture to convergence took many hours and a serial Bayesian approach would have resulted in a longer search. Pretraining with the Koniq10K network provided a benefit to quality prediction on the X-ray data, with a Spearman correlation of 0.57 for random initialization and a correlation of 0.62 for Koniq10K pretraining (where 1.0 is the best possible correlation).
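As a minimal sketch of the evaluation metric (in practice a library routine such as scipy.stats.spearmanr would be used), Spearman correlation is simply the Pearson correlation of the ranks of the predicted and true scores:

```python
def rankdata(values):
    """Average ranks (1-based); ties share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    rx, ry = rankdata(x), rankdata(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Because it operates on ranks, the metric rewards any monotonic relationship between predicted and true quality, not just a linear one.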

Fig. 3: Example performance for the best network from random initialization (left) and Koniq10K pretraining (right). Note that the networks were trained and tested on 128*128 patches taken from 512*512 images. Many of these patches contained pixels with the same rating, which leads to the density of samples at particular points (which originally corresponded to 0–5 but are shifted here due to normalization).

Fig. 4: Image quality prediction performance as a function of patch size and initialization. Note that all of the networks pretrained on Koniq10K outperform the randomly initialized networks.

Figure 4 shows that patch size during training had a large effect on performance. For smaller patch sizes, the pretraining was more important (correlation = 0.2 with random initialization, correlation = 0.55 with pretraining for 32*32-pixel patches), although pretraining still showed an advantage at the largest patch size available (correlation = 0.56 with random initialization, correlation = 0.62 with pretraining for 128*128-pixel patches).

Additionally, several factors had little effect on the quality prediction model, as shown in Figs. 5a and 5b. For intensity normalization, models trained with patch-level and patient-level normalization performed comparably, with patch normalization slightly better (Fig. 5b). Varying the learning rate or the channel width of the residual blocks also had little effect on model accuracy. The choice of input channels did matter: removing the x-y location and anatomical segmentation channels improved performance (Fig. 5a).
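The two intensity-normalization strategies compared here can be sketched as follows; this is a minimal NumPy illustration with hypothetical function names, not Invicro's implementation:

```python
import numpy as np

def normalize_whole_image(image):
    """Z-score the full image once; each patch keeps its relative brightness."""
    return (image - image.mean()) / (image.std() + 1e-8)

def normalize_per_patch(image, patch=128):
    """Z-score each non-overlapping patch independently, discarding global
    brightness differences between regions of the image."""
    out = image.astype(float)
    h, w = image.shape
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            block = out[y:y + patch, x:x + patch]
            out[y:y + patch, x:x + patch] = (block - block.mean()) / (block.std() + 1e-8)
    return out

# A toy 256x256 "X-ray" with a bright top half and a dark bottom half:
img = np.vstack([np.random.rand(128, 256) + 5.0, np.random.rand(128, 256)])
```

After per-patch normalization, every patch has zero mean and unit variance regardless of where it sits in the image; after whole-image normalization, bright regions still produce bright patches.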

Fig. 5a: HPO performance as a function of including x-y location input channels. Removal of the x-y location channels as well as the anatomical segmentation input channels greatly improves performance.

Fig. 5b: HPO performance as a function of patient vs. patch normalization. Patch normalization with no segmentation or x-y input channels is slightly better than the patient normalization.

There is an interesting interaction between the “global” and “local” quality scores. The Koniq10K dataset consists of large images (768*512 pixels) with a single global quality score, while the chest X-ray dataset has a quality map with a score assigned to each pixel. Networks are trained in a patch-wise fashion on small segments of images, usually 128*128 pixels in this case (the maximum size available for the chest X-ray data). For training on the Koniq10K images, patches are assigned the global score, which may not always match the quality of a smaller region. The pixel-wise quality maps from the chest X-ray dataset allow an average patch score to be extracted during training. A different type of training could be carried out with the pixel-wise quality ratings, although generally there is not much variation over a single patch.
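The two labeling schemes described above (a single global score per image versus an average over a pixel-wise quality map) can be sketched in a few lines; this is a minimal NumPy illustration, and the function names are ours rather than ANTsR's:

```python
import numpy as np

def patch_labels_global(n_patches, global_score):
    """Koniq10K-style: every patch inherits the image's single quality score."""
    return np.full(n_patches, float(global_score))

def patch_labels_local(quality_map, patch=128):
    """Chest X-ray-style: each non-overlapping patch is labeled with the
    mean of its region of the pixel-wise quality map."""
    h, w = quality_map.shape
    labels = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            labels.append(quality_map[y:y + patch, x:x + patch].mean())
    return np.array(labels)

# Toy 512x512 quality map: low-quality (score 1) top half,
# high-quality (score 4) bottom half.
qmap = np.vstack([np.ones((256, 512)), 4 * np.ones((256, 512))])
```

With a global label, a patch from a pristine corner of a mostly degraded image still inherits the low image-wide score; the local scheme avoids exactly that mismatch.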

Conclusion

Due to the relative lack of large medical image datasets with quality ratings, there is a valid concern that a model trained on only one of these datasets may not perform well enough. The experiments described above demonstrate that pretraining on a natural image quality dataset like Koniq10K can improve performance for quality prediction on a medical image dataset.

With SageMaker hyperparameter optimization, it was possible to efficiently explore the high-dimensional space of possible network designs in order to achieve the best performance. With that set of optimal hyperparameters, there is a clear difference between network performance for random initialization and for pretraining with Koniq10K, showing that pretraining does indeed improve performance. For more detail on hyperparameter optimization with SageMaker, see the blog post Bring your own hyperparameter optimization algorithm on Amazon SageMaker. For more information on training with imaging data, read Training Machine Learning Models on Multimodal Health Data with Amazon SageMaker.

References
Avants, B. B., Tustison, N., & Song, G. (2009). Advanced normalization tools (ANTS). Insight Journal, 2(365), 1–35.

Hosu, V., Lin, H., Sziranyi, T., & Saupe, D. (2020). KonIQ-10k: An ecologically valid database for deep learning of blind image quality assessment. IEEE Transactions on Image Processing, 29, 4041-4056.

 

Invicro, a REALM IDx Company

Headquartered in Boston, MA, Invicro was founded in 2008 with the mission of improving the role and function of imaging in translational drug discovery and development across all therapeutic areas. Today, Invicro’s multi-disciplinary team provides solutions to pharmaceutical and biotech companies across all stages of the drug development pipeline (Phase 0-IV), all imaging modalities and all therapeutic areas, including neurology, oncology, and systemic and rare diseases. Invicro’s quantitative biomarker services, advanced analytics and AI tools, and clinical operational services are backed by Invicro’s industry-leading software informatics platforms, VivoQuant® and iPACS®, as well as their pioneering IQ-Analytics Platform, which includes AmyloidIQ, TauIQ and DaTIQ.

Invicro is part of REALM IDx, Inc., a healthcare company that is pioneering the field of integrated diagnostics (IDx), a new frontier of advanced clinical science that brings together laboratory medicine, radiology, pathology, and sophisticated artificial intelligence to derive actionable insights that can lead to better medical solutions for patient care.

 

Jacob Hesterman

Dr. Hesterman is Invicro’s Chief Technology Officer and a founding member of Invicro. He has over 15 years of experience in instrumentation development, image processing, and quantification. As CTO, Dr. Hesterman manages a large technical team of engineers, scientists, and mathematicians, comprising Invicro’s software and analytics teams. He has overseen the development of Invicro’s image processing platform supporting translational and clinical analysis in PET, MR, CT, and histology across multiple disease areas. Dr. Hesterman also oversees Invicro’s software platform, including the iPACS data management and VivoQuant image processing applications.

 

Brian Avants

Dr. Avants lives in New England and enjoys applying the latest machine learning techniques to solve practical problems in biomedical science. His favorite research topics include patient-specific prediction studies, the quantification of neurodegenerative disease, and the development of reliable tools for studying multiple modalities in multi-site longitudinal data (e.g., clinical and observational trials). Dr. Avants contributes to many open-source projects. Super-resolution and studies of the heart, lungs, and other organs are also occasional topics of research.

James Golden

James Golden, PhD, is an Applied Scientist on the Bedrock team.

Erika Pelaez Coyotl

Erika Pelaez Coyotl is a Data Scientist at the Machine Learning Solutions Lab with a broad background in biomedical applications. She helps AWS customers build machine learning-powered solutions for their most pressing business challenges across diverse industry verticals.