AWS Public Sector Blog

Singapore Eye Research Institute categorizes retinal diseases using Amazon Rekognition


Introduction

Deep learning in clinical imaging has the potential to revolutionize diagnostics, with performance that can rival that of clinical experts. However, deep learning typically requires substantial technical expertise, which puts these powerful artificial intelligence (AI) techniques out of reach for many clinicians. Automated machine learning (AutoML) platforms aim to democratize deep learning by making AI accessible to clinicians with limited technical skills.

A study by the Singapore Eye Research Institute (SERI) evaluated an AutoML platform for classifying retinal diseases from optical coherence tomography (OCT) scans. The study found that Amazon Rekognition Custom Labels, a code-free AutoML capability from Amazon Web Services (AWS), achieved perfect scores when categorizing several retinal diseases from OCT images in the study's test set. To judge such results fairly, however, AutoML solutions must be benchmarked against the conventional deep learning approaches commonly used in research.

To enable this comparison, SERI and AWS trained and evaluated convolutional neural networks (CNNs), the standard deep learning approach for image classification, using the VGG16 and Xception architectures. Building these models with the Keras and TensorFlow frameworks also revealed practical details of adapting advanced neural network architectures to retinal disease classification.

In this post, we detail the steps to use Amazon Rekognition Custom Labels to train a model that categorizes retinal diseases, and the process of training and fine-tuning the VGG16 and Xception CNNs. We then compare the evaluation metrics of the Amazon Rekognition model with those of the trained CNNs.

Solution overview

For this study, we used the Retinal OCT Images (optical coherence tomography) dataset to train and validate models for classifying images into four disease categories: normal, choroidal neovascularization (CNV), diabetic macular edema (DME), and Drusen. Figure 1 shows an example of a retinal OCT image of an eye with choroidal neovascularization (CNV), which is a major cause of vision loss.


Figure 1. A retinal OCT image showing an eye with choroidal neovascularization (CNV).

First, the image data was stored in Amazon Simple Storage Service (Amazon S3). We then used Amazon Rekognition Custom Labels to train a machine learning (ML) model on the uploaded dataset. For detailed information on getting started, refer to the Amazon Rekognition Custom Labels Guide. After the model is trained, Amazon Rekognition provides the model evaluation metrics, including F1, precision, and recall scores.
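In the console, Custom Labels can label images from their Amazon S3 folder names, or it can read labels from a manifest file in the SageMaker Ground Truth JSON Lines format. The post does not say which option this study used. The sketch below builds image-level classification manifest entries, one per image. It is a minimal sketch that assumes the documented classification manifest layout. The bucket name, object keys, and label attribute name are hypothetical placeholders, so check the field requirements against the Custom Labels manifest documentation before you use it.

```python
import json
from datetime import datetime, timezone

def manifest_line(s3_uri: str, class_name: str, class_id: int,
                  attribute: str = "retina-oct-classification") -> str:
    """Return one JSON Lines entry labeling a whole image with one class.

    'attribute' is a hypothetical label attribute name; any consistent name works.
    """
    entry = {
        "source-ref": s3_uri,           # S3 location of the image
        attribute: class_id,            # integer class index
        f"{attribute}-metadata": {
            "confidence": 1,
            "job-name": f"labeling-job/{attribute}",
            "class-name": class_name,   # human-readable label used by Custom Labels
            "human-annotated": "yes",
            "creation-date": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f"),
            "type": "groundtruth/image-classification",
        },
    }
    return json.dumps(entry)

CLASSES = ["CNV", "DME", "DRUSEN", "NORMAL"]
# Hypothetical bucket and object keys, one example image per class
lines = [
    manifest_line(f"s3://example-oct-bucket/train/{name}/{name}-example.jpeg", name, i)
    for i, name in enumerate(CLASSES)
]
print("\n".join(lines))
```

Write the resulting lines to a `.manifest` file in S3, then reference that file when you create the training or test dataset.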

For comparison, we also trained custom deep learning models with the VGG16 and Xception architectures on the same training and testing data. Training these bespoke models alongside the Amazon Rekognition AutoML approach lets us compare code-free automated ML with bespoke deep learning for retinal disease classification.

Solution walkthrough

In this section, we explain the approach we took to train and fine-tune the CNN models.

Data augmentation and class balancing: A strategic approach to dataset fortification

When fine-tuning CNNs, strategies like data augmentation and class balancing can strengthen the training dataset. Data augmentation applies random transformations such as rotation, flipping, and scaling to existing images. This exposes the model to more variation, which helps it learn invariances and generalize better. Class balancing compensates for categories that appear less often in the dataset. Every output category then contributes comparably to training, and the model is not biased toward the most common ones. Together, these approaches diversify the data and reduce the effects of class imbalance, which yields a more robust fine-tuned CNN.

Data augmentation

To help the model recognize patterns and cope with the natural variation in real-world medical images, we augmented the data with the Keras ImageDataGenerator class. ImageDataGenerator applies random transformations to the original images on the fly as each batch is generated: zooming, flipping, rotating, shifting, and adjusting brightness. As a result, the model sees a slightly different version of each image in every epoch. The goal is a more robust model that can accurately analyze medical images captured in diverse clinical settings. Both generators hold out 20% of the training images for validation, and the validation generator applies only rescaling.

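from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Training generator: random augmentations plus a 20% validation hold-out
train_datagen = ImageDataGenerator(
    rescale=1./255,                 # scale pixel values to [0, 1]
    zoom_range=(0.73, 0.9),         # zoom factors below 1 zoom in
    horizontal_flip=True,
    rotation_range=10,              # degrees
    width_shift_range=0.10,
    height_shift_range=0.10,
    fill_mode='constant',           # fill exposed pixels with zeros
    brightness_range=(0.55, 0.9),
    validation_split=0.2
)
# Validation generator: rescaling only, same 20% split, no augmentation
valid_datagen = ImageDataGenerator(
    rescale=1./255,
    validation_split=0.2
)

# 'OCT2017/train' is a hypothetical path containing one subfolder per class
# (CNV, DME, DRUSEN, NORMAL); adjust it to your own dataset layout.
train_generator = train_datagen.flow_from_directory(
    'OCT2017/train', target_size=(224, 224), class_mode='categorical',
    subset='training', seed=42)
valid_generator = valid_datagen.flow_from_directory(
    'OCT2017/train', target_size=(224, 224), class_mode='categorical',
    subset='validation', seed=42)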
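ImageDataGenerator implements these transforms as interpolated affine warps and PIL brightness adjustments. The NumPy sketch below is a simplified, hypothetical stand-in for three of them: a horizontal flip, a whole-pixel translation with zero fill, and brightness scaling. It is not the Keras implementation. Its only purpose is to show what each transform does to an image already rescaled to [0, 1].

```python
import numpy as np

def shift_with_zero_fill(img: np.ndarray, dy: int, dx: int) -> np.ndarray:
    """Translate an (H, W, C) image by whole pixels and fill exposed pixels with 0,
    similar in spirit to fill_mode='constant' with the default cval=0."""
    h, w = img.shape[:2]
    if abs(dy) >= h or abs(dx) >= w:
        raise ValueError("shift must be smaller than the image size")
    out = np.zeros_like(img)
    src_y = slice(max(0, -dy), min(h, h - dy))
    dst_y = slice(max(0, dy), min(h, h + dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_x = slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = img[src_y, src_x]
    return out

def random_augment(img, rng, max_shift_frac=0.10, brightness=(0.55, 0.9)):
    """Hypothetical simplified augmentation: random flip, random shift, brightness scaling."""
    h, w = img.shape[:2]
    if rng.random() < 0.5:
        img = img[:, ::-1]                       # horizontal flip (reverse width axis)
    max_dy, max_dx = int(h * max_shift_frac), int(w * max_shift_frac)
    dy = int(rng.integers(-max_dy, max_dy + 1))  # up to 10% of height
    dx = int(rng.integers(-max_dx, max_dx + 1))  # up to 10% of width
    img = shift_with_zero_fill(img, dy, dx)
    factor = rng.uniform(*brightness)            # factors below 1 darken the image
    return np.clip(img * factor, 0.0, 1.0)

rng = np.random.default_rng(0)
sample = random_augment(np.ones((20, 20, 1)), rng)
print(sample.shape, float(sample.max()))
```

Because each call draws new random parameters, the same source image produces a different training example every epoch. ImageDataGenerator behaves the same way.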

Class weight balancing

To make sure the model pays as much attention to rare conditions as to common ones, class weights were used during training. Without class weights, the model could become biased toward the predominant classes. The scikit-learn `balanced` setting assigns each class a weight inversely proportional to its frequency. Less common but clinically important conditions therefore receive fair consideration during training.

import numpy as np
from sklearn.utils import class_weight
class_weights = class_weight.compute_class_weight(
    class_weight='balanced',
    classes=np.unique(train_generator.classes),
    y=train_generator.classes
)
train_class_weights = dict(enumerate(class_weights))  # {class_index: weight} for model.fit
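scikit-learn documents the `balanced` weights as `n_samples / (n_classes * np.bincount(y))`. The sketch below reproduces that formula in plain NumPy on a toy, hypothetical label distribution (not the study's data). It shows why the result evens out each class's total contribution to the loss.

```python
import numpy as np

def balanced_class_weights(labels):
    """n_samples / (n_classes * count_of_each_class), as in class_weight='balanced'."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    weights = len(labels) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

# Hypothetical imbalance: 6 images of class 0, 3 of class 1, 2 of class 2, 1 of class 3
toy_labels = [0] * 6 + [1] * 3 + [2] * 2 + [3] * 1
print(balanced_class_weights(toy_labels))  # → {0: 0.5, 1: 1.0, 2: 1.5, 3: 3.0}
```

Multiplying each weight by its class count gives 3.0 for every class. Each category therefore carries the same total weight during training, however many images it has.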

Model architecture: Leveraging pre-trained CNNs with Keras and TensorFlow

Keras and TensorFlow make it straightforward to reuse pre-trained CNN architectures such as VGG16 and Xception, which speeds up model development. This section walks through importing and customizing these models step by step: initializing them, adding custom layers, and compiling, training, and fine-tuning them. We then evaluate the results and compare them with the Amazon Rekognition model.

Initializing pretrained models

To initialize the deep learning model, we use transfer learning. The Keras and TensorFlow frameworks provide the VGG16 and Xception CNN architectures with weights pre-trained on ImageNet, a large dataset of annotated images of everyday objects. The version used for these Keras weights spans 1,000 object categories. The pre-trained weights therefore encode general-purpose visual features rather than knowledge of retinal conditions, and these features are then adapted to the OCT scans.

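from tensorflow.keras.applications import VGG16, Xception
# Load only the convolutional base (include_top=False drops the ImageNet classifier).
# Replace Xception with VGG16 to build the other model.
pretrained_model = Xception(include_top=False, weights='imagenet', input_shape=(224, 224, 3))
# Freeze the pretrained weights for the initial training phase
pretrained_model.trainable = False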

Appending custom layers

To customize the model for classifying retinal diseases, we added additional layers to the pre-trained model. The base model that was initialized could recognize objects in general images. By appending specialized layers at the end, the model can transition from extracting basic visual features to identifying specific eye conditions.

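from tensorflow.keras import layers, Model
visible = layers.Input(shape=(224, 224, 3), name='input')
x = pretrained_model(visible)                  # frozen convolutional base
x = layers.GlobalAveragePooling2D()(x)         # average each feature map to a single value
x = layers.Dropout(0.2)(x)                     # drop 20% of features at random during training
output = layers.Dense(4, activation='softmax', name='output')(x)  # one probability per retinal class
model = Model(inputs=visible, outputs=output)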
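To make the appended head concrete, the NumPy sketch below reproduces its inference-time forward pass: global average pooling followed by a dense softmax layer. Dropout is left out because Keras disables it at inference. For a 224×224 input, VGG16's convolutional base outputs a 7×7×512 feature map, so the sketch uses that shape. The feature values and dense weights are random placeholders, not trained parameters.

```python
import numpy as np

def classification_head(feature_map, weights, bias):
    """GlobalAveragePooling2D -> Dense(softmax), as evaluated at inference time."""
    pooled = feature_map.mean(axis=(0, 1))    # (H, W, C) -> (C,)
    logits = pooled @ weights + bias           # (C,) @ (C, 4) -> (4,)
    exp = np.exp(logits - logits.max())        # subtract the max for numerical stability
    return exp / exp.sum()                     # softmax: probabilities summing to 1

rng = np.random.default_rng(0)
features = rng.random((7, 7, 512))             # placeholder backbone output
W = rng.normal(scale=0.01, size=(512, 4))      # placeholder dense weights, 4 classes
b = np.zeros(4)
probs = classification_head(features, W, b)
print(probs.shape, probs.sum())
```

Pooling discards the spatial layout and keeps one summary value per channel. This keeps the trainable head small: here a single 512×4 weight matrix plus 4 biases.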

Compilation and training: Orchestrating the learning journey with Keras compilation

The models were compiled with the Adam optimizer, using the Keras default learning rate of 0.001, and the categorical cross-entropy loss function. Categorical cross-entropy suits this single-label, four-class problem with one-hot encoded targets.

model.compile(optimizer='adam', 
              loss='categorical_crossentropy', 
              metrics=['accuracy'])
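Categorical cross-entropy penalizes a prediction by the negative log of the probability it assigns to the true class. When class weights are passed to `model.fit`, each sample's loss is scaled by the weight of its true class. The sketch below computes both in NumPy for two hypothetical predictions. The class-weight values are illustrative only.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Per-sample loss: -sum_k y_k * log(p_k) for one-hot y_true."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    return -np.sum(y_true * np.log(y_pred), axis=-1)

# Two hypothetical samples over four classes (CNV, DME, DRUSEN, NORMAL)
y_true = np.array([[0, 1, 0, 0],
                   [0, 0, 0, 1]], dtype=float)
y_pred = np.array([[0.1, 0.7, 0.1, 0.1],
                   [0.2, 0.2, 0.2, 0.4]])
per_sample = categorical_crossentropy(y_true, y_pred)
print(per_sample)   # approximately [0.357, 0.916]

# Class weights scale each sample's loss by the weight of its true class
class_weights = {0: 1.0, 1: 2.0, 2: 1.0, 3: 0.5}   # hypothetical values
sample_w = np.array([class_weights[int(k)] for k in y_true.argmax(axis=1)])
weighted = per_sample * sample_w
print(weighted)
```

A confident correct prediction (0.7) costs less than a hesitant one (0.4). Up-weighting a rare class makes its mistakes count more toward the gradient.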

Callbacks

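We used a set of Keras callbacks during training to improve model generalization and reduce overfitting. ModelCheckpoint saves the model with the lowest validation loss, and EarlyStopping halts training when validation loss stops improving. ReduceLROnPlateau lowers the learning rate when validation loss plateaus, and CSVLogger records per-epoch metrics.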

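from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau, CSVLogger
# Save the full model whenever validation loss reaches a new minimum (saves once per epoch by default)
checkpoint = ModelCheckpoint('model.h5', monitor='val_loss', verbose=1, save_best_only=True, mode='auto', save_weights_only=False)
# Stop when validation loss fails to improve by at least 0.001 for 5 consecutive epochs
earlystop = EarlyStopping(monitor='val_loss', min_delta=0.001, patience=5, verbose=1, mode='auto')
# Divide the learning rate by 10 after 5 epochs without improvement
reduceLR = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=5, verbose=1, mode='auto')
# Log per-epoch metrics to a CSV file (the filename is a placeholder)
csvlogger = CSVLogger('training_log.csv')

callbacks = [checkpoint, earlystop, csvlogger, reduceLR]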
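The sketch below is a hypothetical, simplified re-creation of how EarlyStopping and ReduceLROnPlateau count epochs without improvement. It assumes `val_loss` in `min` mode, no cooldown, and the settings used above, and it is not the Keras source. It also shows an interaction between the two settings. Both callbacks use a patience of 5, so a clean plateau triggers the learning-rate cut in the same epoch that training stops.

```python
def simulate_callbacks(val_losses, lr=1e-3,
                       es_patience=5, es_min_delta=1e-3,
                       lr_patience=5, lr_factor=0.1, lr_min_delta=1e-4):
    """Simplified EarlyStopping + ReduceLROnPlateau on val_loss (mode='min', no cooldown).

    Returns the learning rate in effect after each epoch and the epoch at which
    training would stop (None if it runs to completion).
    """
    best_es = best_lr = float("inf")
    wait_es = wait_lr = 0
    lr_after_epoch = []
    for epoch, loss in enumerate(val_losses):
        # ReduceLROnPlateau: an improvement must beat the best loss by lr_min_delta
        if loss < best_lr - lr_min_delta:
            best_lr, wait_lr = loss, 0
        else:
            wait_lr += 1
            if wait_lr >= lr_patience:
                lr *= lr_factor
                wait_lr = 0
        lr_after_epoch.append(lr)
        # EarlyStopping: an improvement must beat the best loss by es_min_delta
        if loss < best_es - es_min_delta:
            best_es, wait_es = loss, 0
        else:
            wait_es += 1
            if wait_es >= es_patience:
                return lr_after_epoch, epoch
    return lr_after_epoch, None

# Hypothetical validation losses that stop improving after epoch 1
losses = [1.0, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9, 0.9]
lrs, stopped_at = simulate_callbacks(losses)
print(stopped_at, lrs[-1])
```

Here training stops at epoch 6, the same epoch in which the learning rate drops to about 1e-4, so the reduced rate is never used. Giving ReduceLROnPlateau a shorter patience than EarlyStopping is one common way to let the lower rate take effect.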

Training

The models are trained with Keras for up to 50 epochs, using the class weights and callbacks defined earlier.

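history = model.fit(
    train_generator,
    epochs=50,                          # upper bound; EarlyStopping may end training sooner
    validation_data=valid_generator,
    verbose=1,
    callbacks=callbacks,
    shuffle=True,                       # ignored for generators; the generator shuffles itself
    class_weight=train_class_weights    # scale each sample's loss by its class weight
)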

Fine-tuning

After initial training, all layers are unfrozen for refinement. Both VGG16 and Xception are then trained for a fixed 10 additional epochs at a lower learning rate of 1e-5. This adapts the pretrained features more closely to the retinal image dataset and improves specificity and performance.

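from tensorflow import keras
from tensorflow.keras.models import load_model
# Model saved after the initial, frozen-backbone training phase
model = load_model('xception_original_class_model.h5')
model.trainable = True  # unfreeze all layers, including the pretrained base
# Recompile so the change takes effect; fine-tune with a much lower learning rate
model.compile(optimizer=keras.optimizers.Adam(1e-5), loss='categorical_crossentropy', metrics=['accuracy'])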

Evaluation

The models are evaluated on an independent test set of 250 images to estimate how well they generalize to unseen images. On this test set, the Amazon Rekognition model achieved perfect scores across all four retinal pathologies (F1 = 1.00, precision = 1.00, recall = 1.00), outperforming the bespoke VGG16 and Xception CNNs. Figure 2 compares the precision, recall, and F1 scores of the Amazon Rekognition model with those of the Xception and VGG16 CNNs.
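Amazon Rekognition reports these metrics directly. For the CNNs, you can compute them from a confusion matrix of test-set predictions, for example with scikit-learn's `classification_report`. The minimal NumPy sketch below spells out the definitions. The counts are toy values, not the study's results.

```python
import numpy as np

def per_class_metrics(cm):
    """Precision, recall, and F1 per class from a confusion matrix
    where cm[i, j] counts images of true class i predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    predicted = cm.sum(axis=0)   # column sums: all images predicted as each class
    actual = cm.sum(axis=1)      # row sums: all images truly in each class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return precision, recall, f1

# A perfect classifier has no off-diagonal counts, so every score is 1.0
print(per_class_metrics(np.diag([10, 10, 10, 10])))

# Hypothetical counts with a few errors (rows and columns: CNV, DME, DRUSEN, NORMAL)
cm = [[8, 1, 1, 0],
      [0, 9, 1, 0],
      [1, 0, 8, 1],
      [0, 0, 0, 10]]
precision, recall, f1 = per_class_metrics(cm)
print(precision.round(3), recall.round(3), f1.round(3))
```

A score of 1.00 on all three metrics, as reported for Amazon Rekognition, corresponds to a confusion matrix whose off-diagonal entries are all zero.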


Figure 2. Precision, recall, and F1 scores of Amazon Rekognition compared with the Xception and VGG16 CNNs.

Next steps of the study

While this study demonstrated promising results, real-world medical imaging datasets are often more challenging than clean academic datasets. As the next step, SERI will assess the performance of Amazon Rekognition Custom Labels on imbalanced, sparse, and noisy clinical data.

Additional experiments are also needed to determine the minimum dataset size required for accurate classification. This analysis will show how much training data Amazon Rekognition Custom Labels needs to match or exceed the performance of deep learning models, and it will provide practical guidance on deploying AutoML solutions with limited datasets.

Finally, the team is exploring how to move the Amazon Rekognition model from proof of concept to real-world use. Rigorous model monitoring will be established to help ensure continued safety and efficacy as new data is ingested over time.

Conclusion

This study shows how AutoML is making deep learning more accessible for medical image analysis. SERI also recently published a paper in the Journal of Medical Internet Research about how clinicians can apply AutoML technologies ethically, rigorously, and effectively.

AutoML platforms like Amazon Rekognition provide easy-to-use interfaces that let clinicians apply AI to healthcare data. In this study, code-free AutoML matched or exceeded the performance of conventional models that require AI expertise to build. By reducing technical barriers, AutoML helps democratize AI in healthcare and broadens its use among clinicians. Overall, AutoML shows promise in helping clinicians harness advanced AI, unlocking the potential of their data to augment their expertise and improve patient care.

Learn more about Amazon Rekognition and its use cases.

Kabilan Elangovan


Kabilan is an artificial intelligence (AI) research scientist at Singapore Health Services (SingHealth). His work revolves around the development of clinical screening tools using deep learning and the exploration of generative AI applications in healthcare. He is deeply engaged in the utilization of automated machine learning (AutoML) for clinical diagnosis.

Eugene Ng


Eugene is a solutions architect at Amazon Web Services (AWS), with a primary focus on the healthcare industry. He enjoys exploring and implementing new technologies, aiming to support AWS customers in their innovation journey within the healthcare domain.