AWS Machine Learning Blog

Classifying high-resolution chest x-ray medical images with Amazon SageMaker

Medical image processing is one of the key areas where deep learning is applied to great effect. Typical processing involves classification, detection, and segmentation using various medical image modalities. In this blog post, we outline a method to use the HIPAA Eligible service Amazon SageMaker to train a deep learning model for chest x-ray image classification using the Amazon SageMaker image classification algorithm. We demonstrate how this image classification algorithm can be an effective tool for analyzing high resolution medical images. We’ll use new features of the algorithm, such as multi-label support and mixed-precision training, to show how a chest x-ray image classification model can be trained 33 percent faster using mixed-precision mode compared to using float32 data type on Amazon EC2 P3 instances. We also show how chest x-ray images can be trained on high-resolution images, thus improving performance over low-resolution models.

Classifying high-resolution chest x-ray medical images

Deep-neural-network-based approaches typically work with low resolution images due to memory constraints. A typical deep network used for image classification (ResNet-152) requires significant memory to work even with 256×256 size images. The memory requirement is also dependent on the batch size used for training. However, some diseases only present themselves in small regions in the chest x-ray images and hence are likely to benefit from high resolution image classification.

The chest x-ray image dataset is publicly released by the National Institutes of Health (NIH) [1] and is available at The dataset comprises 112,120 frontal-view x-ray images of 30,805 unique patients. These images can include up to fourteen text-mined disease image labels that are mined from the associated radiological reports using natural language processing. Any of these fourteen labels can be associated with each x-ray, making it a multi-label image classification problem. The labels for the diseases are Atelectasis, Consolidation, Infiltration, Pneumothorax, Edema, Emphysema, Fibrosis, Effusion, Pneumonia, Pleural_thickening, Cardiomegaly, Nodule, Mass, and Hernia.

Deep learning algorithms have recently been applied to chest x-ray image classification [1]. A 50 layer ResNet pre-trained on the ImageNet dataset was used to train a disease classifier using the chest x-ray images. The images were resized to 224×224 resolution and trained using weighted cross-entropy loss in a multi-label setting. Recently, performance improvement was shown, particularly classes such as mass and nodule, when training was done with 448×448 resolution [2].

There are a couple of ways that we can make image classification work at higher resolutions. We can either reduce the batch size to stay within the memory requirements, or we can distribute training across GPU / machines. Reducing batch size can have an impact on accuracy because smaller batch size can lead to noisy gradients leading to local convergence. We use a similar setup to recreate the results first using the new features on the Amazon SageMaker image classification algorithm, and we show how training with full resolution helps to improve performance. We’ll demonstrate how the enhancements to the Amazon SageMaker image classification algorithm can help you train high-resolution images in a multi-label setup faster and without reducing batch size by using multiple machines.

Image Classification with Amazon SageMaker

Amazon SageMaker Image Classification is one of the built-in algorithms in Amazon SageMaker. It implements the ResNet architecture and supports both full training as well as transfer learning. The pre-trained models used for transfer learning are trained using the ImageNet dataset. The algorithm supports multiple depths of the ResNet architecture thereby catering to different datasets. In transfer learning mode, the algorithm replaces the final fully-connected layer with a new fully-connected layer that has the specified number of outputs, initializes the weights, and then fine-tunes the entire network with the new dataset. The algorithm supports single- and multi-GPU setup as well as single- and multi-machine training. In multi-machine training, the algorithm supports both synchronous and asynchronous updates of weights.

Recently, we added several new features to the Amazon SageMaker image classification algorithm. First, we enabled training in mixed-precision mode where the network computation is done in float16 mode for faster training with less memory while maintaining similar accuracy. We also enabled multi-label input support for training multi-label datasets where each image can be classified into multiple classes. In addition, we enable the use of weighted loss update to handle class imbalance.

Preparing data for multi-label image classification

Each original chest x-ray image is 1024×1024 in size. We used 224×224 input resolution to obtain initial results, and then we obtained results on the full resolution. The input dataset is split into training/validation and testing without patient overlap between these sets and the 14 diseases are modeled as 14 binary outputs of a multi-label classifier. Each disease output indicates the presence/absence of the disease and “no disease” is represented as zeros for all labels. We first split the data into three sets: training, validation, and testing, based on patient IDs.

import panda as pd
import numpy as np

trainper = 0.7
valper = 0.1
file_name = 'Data_Entry_2017.csv'

a = pd.read_csv(file_name)
patient_ids = a['Patient ID']
uniq_pids = np.unique(patient_ids)
total_ids = len(uniq_pids)

trainset = int(trainper*total_ids)
valset = trainset+int(valper*total_ids)
testset = trainset+valset

train = uniq_pids[:trainset]
val = uniq_pids[trainset+1:valset]
test = uniq_pids[valset+1:]
print('Number of patient ids: training: %d, validation: %d, testing: %d'%(len(train), len(val), len(test)))

traindata = a.loc[a['Patient ID'].isin(train)]
valdata = a.loc[a['Patient ID'].isin(val)]
testdata = a.loc[a['Patient ID'].isin(test)]

traindata.to_csv('traindata.csv', sep=',', header=False, index=False)
valdata.to_csv('valdata.csv', sep=',', header=False, index=False)
testdata.to_csv('testdata.csv', sep=',', header=False, index=False)

Then, we create the ‘lst’ files for multi-label classification where each disease is mapped to a set of binary labels.

import csv

def gen_set(csvfile, outputfile):
    disease_list = ['Atelectasis', 'Consolidation', 'Infiltration', 'Pneumothorax', 'Edema', 'Emphysema', \
                   'Fibrosis', 'Effusion', 'Pneumonia', 'Pleural_Thickening', 'Cardiomegaly', 'Nodule', 'Mass', \
    alldiseases = {disease:i for i,disease in enumerate(disease_list)}
    with open(outputfile, 'w') as fp:
        with open(csvfile, 'r') as cfile:
            line = csv.reader(cfile, delimiter=',')
            index = 0
            for element in line:
                # the first column is the image filename, while the second
                # column has the list of diseases separated by |
                diseases = element[1].split('|')
                for d in alldiseases:
                    if d in diseases:
                fp.write('images/%s\n' % element[0])
                index += 1
gen_set('traindata.csv', 'chestxraytrain.lst')
gen_set('valdata.csv', 'chestxrayval.lst')
gen_set('testdata.csv', 'chestxraytest.lst')      

Then we create recordio files using for training and validation. We pass the option pack_label so that the recordio file is created as a multi-label file. For more details, please refer to the Amazon SageMaker multi-label image classification notebooks.

python --pack-label chestxraytrain.lst .
python --pack-label chestxrayval.lst .

Image classification results on the chest x-ray dataset

We used the ResNet-50 model and first trained the network with 224×224 input image size. We used data augmentation techniques such as random cropping, and image transformations. Even though a chest x-ray image is different from ImageNet images, using a pre-trained model trained on the ImageNet dataset helps in achieving better classification accuracy. Hence, we used the use_pretrained_model hyperparameter in the Amazon SageMaker image classification algorithm to train the network. Since this is a multi-label classification, we set the multi_label parameter to 1. We resized the chest x-ray images to 256 before training so that the network can crop 224×224 regions from the input image.

The following code snippet shows how it can be done using the Amazon SageMaker Estimator interface and the image classification algorithm.

import sagemaker
from sagemaker import get_execution_role
from import get_image_uri

role = get_execution_role()
sess = sagemaker.Session()
prefix = 'ic-chestxray'

training_image = get_image_uri(sess.boto_region_name, 'image-classification', repo_version="latest")
s3train = 's3://{}/train/'.format(bucket)
s3validation = 's3://{}/validation/'.format(bucket)
s3_output_location = 's3://{}/{}/output'.format(bucket, prefix)
multilabel_ic = sagemaker.estimator.Estimator(training_image, role,
                        train_volume_size = 50, train_max_run = 360000,
                        input_mode= 'File', output_path=s3_output_location,
multilabel_ic.set_hyperparameters(num_layers=50, use_pretrained_model=1,
                                        image_shape = "3,224,224", num_classes=14,
                                        resize=256,  epochs=100, 
                                        learning_rate=0.0005, optimizer='adam', 
                                        augmentation_type = 'crop_color_transform',
                                        precision_dtype='float32', multi_label = 1)
train_data = sagemaker.session.s3_input(s3train, distribution='FullyReplicated',
validation_data = sagemaker.session.s3_input(s3validation, distribution='FullyReplicated',
data_channels = {'train': train_data, 'validation': validation_data}, logs=True)

The performance of the network was tested on the test set and the individual AUC along with the average AUC was computed. The results are tabulated in the table that follows. The average AUC obtained was 0.768.

Training with weighted loss

An additional feature introduced in image classification is the use of weighted loss to handle class imbalance. Typically, when training with a multi-label dataset, there might be imbalance between classes. This imbalance can lead to a network leaning towards learning one class over another. To avoid that, the Amazon SageMaker image classification algorithm uses the use_weighted_loss hyperparameter to balance the samples. When this parameter is set to 1, a weight value is calculated for each label based on the number of samples of that label in the training set. First, the number of samples in each class is calculated from the training set and the weight for loss update is set to N/N_l for that class where N is the total number of samples in the training set and N_l is the total number of samples for class l in the training set. This will weigh the loss calculated for gradient update differently for each class based on their weight thereby enabling balanced training. The average AUC increased to 0.814 when trained using the weighted loss feature enabled while still using 224×224 input resolution.

multilabel_ic.set_hyperparameters(num_layers=50, use_pretrained_model=1,
                                        image_shape = "3,224,224", num_classes=14,
                                        mini_batch_size=256, resize=256,  epochs=100, 
                                        learning_rate=0.0005, optimizer='adam', 
                                        num_training_samples=80000, use_weighted_loss=1,
                                        augmentation_type = 'crop_color_transform',
                                        precision_dtype='float32', multi_label = 1)

Training with mixed-precision

The Amazon SageMaker image classification algorithm now supports training in mixed-precision mode. This is controlled by the hyperparameter, precision_dtype, which can be set to ‘float32’ (default) or ‘float16’. In mixed-precision mode, the network computes the backward and forward pass in low-precision (float16) while maintaining the master weights in high-precision (float32). This enables the training to be faster while maintaining similar accuracy. By using the mixed-precision mode, the training time was reduced by 33 percent while obtaining the overall AUC of 0.821, which is similar to the one obtained with float32 training. The training time reduction was even greater when training using two instances for the high-resolution input (see the following section) and increased to 47 percent.

multilabel_ic.set_hyperparameters(num_layers=50, use_pretrained_model=1,
                                        image_shape = "3,224,224", num_classes=14,
                                        mini_batch_size=256, resize=256, epochs=100, 
                                        learning_rate=0.0005, optimizer='adam', 
                                        num_training_samples=80000, use_weighted_loss=1, 
                                        augmentation_type = 'crop_color_transform',
                                        precision_dtype=’float16’, multi_label = 1)

Training with high-resolution input

We then used the original input resolution by setting the image_shape parameter to 896×896. We used the use_weighted_loss feature and float32 precision for this training. We used this resolution because it allows the network to sample a 896×896 region from the 1024×1024 during data augmentation. Since the high resolution will use more memory, typically batch_size is reduced to train the network. However, because Amazon SageMaker image classification supports distributed training, we were able to maintain the batch_size by running the training across multiple instances. This is done by setting the instance_count parameter in the Amazon SageMaker training to 2. The average AUC for this resolution increased to 0.830, particularly for classes such as nodule, which can benefit from high-resolution input. When we trained with mixed_precision set to 1, the average AUC was 0.825. The training was done using the same code as before but setting the train_instance_count = 2, image_shape=”3,896,896” and not setting the resize parameter.

multilabel_ic = sagemaker.estimator.Estimator(training_image, role, train_instance_count=2,
                                                train_volume_size = 50, train_max_run = 360000,
                                                input_mode= 'File', output_path=s3_output_location,

multilabel_ic.set_hyperparameters(num_layers=50, use_pretrained_model=1, 
                                        image_shape = "3,896,896", num_classes=14,
                                        mini_batch_size=64, epochs=100, 
                                        learning_rate=0.00025, optimizer='adam', 
                                        num_training_samples=80000, use_weighted_loss=1, 
                                        augmentation_type = 'crop_color_transform',
                                        precision_dtype='float32', multi_label = 1)
224×224 224×224 with class balancing 224×224 with
mixed precision
Atelectasis 0.772


0.799 0.800
Cardiomegaly 0.859 0.899 0.906 0.884
Effusion 0.830 0.873 0.873 0.873
Infiltration 0.626 0.693 0.691 0.698
Mass 0.791 0.839 0.834 0.821
Nodule 0.716 0.743 0.751 0.817
Pneumonia 0.645 0.710 0.713 0.739
Pneumothorax 0.778 0.836 0.862 0.878
Consolidation 0.695 0.791 0.789 0.785
Edema 0.799 0.849 0.863 0.879
Emphysema 0.850 0.889 0.909 0.933
Fibrosis 0.764 0.791 0.811 0.822
Pleural Thickening 0.726 0.758 0.761 0.785
Hernia 0.903 0.929 0.940 0.911
Average AUC 0.768 0.814 0.821 0.830


In this blog post, we showed how the new features of the Amazon SageMaker image classification algorithm can be used to train the high-resolution chest x-ray dataset. In particular, we showed how the multi_label parameter can be used to train multi-label datasets and how the use_weighted_loss parameter for sample balancing within labels can be shown to improve the overall accuracy. We also showed how the training time can be reduced by 33 percent using mixed-precision while maintaining similar accuracy, and how it can be reduced further by 47 percent when multiple instances were used. In addition, we showed how the algorithm can be used to train using high-resolution images, leading to improved accuracy compared with using low-resolution input.


  1. X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, R. M. Summers. ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases.CVPR 2017
  2. Ivo M. Baltruschat, Hannes Nickisch, Michael Grass, Tobias Knopp, and Axel Saalbach, Comparison of Deep Learning Approaches for Multi-Label Chest X-Ray Classification, arxiv 1803.02315 (


About the authors

Gurumurthy Swaminathan is an Sr. Applied Scientist in the Amazon AI Platforms group and is working on building computer vision algorithms for Sagemaker. His current area of research includes Neural Network compression and Computer Vision algorithms.




Vineet Khare is a Sciences Manager for AWS Deep Learning. He focuses on building Artificial Intelligence and Machine Learning applications for AWS customers using techniques that are at the forefront of research. In his spare time, he enjoys reading, hiking and spending time with his family.