Google Research changes the game for medical imaging with self-supervised learning


VentureBeat
Ben Dickson
November 11, 2021


Deep learning shows a lot of promise in health care, especially in medical imaging, where it can be utilized to improve the speed and accuracy of diagnosing patient conditions. 

But it also faces a serious barrier: the shortage of labeled training data.




In medical contexts, training data comes at a great cost, which makes it very difficult to use deep learning for many applications.


To overcome this hurdle, scientists have explored several solutions with varying degrees of success. 

In a new paper, artificial intelligence researchers at Google suggest a new technique that uses self-supervised learning to train deep learning models for medical imaging. 

Early results show that the technique can reduce the need for annotated data and improve the performance of deep learning models in medical applications.




Supervised pretraining


Convolutional neural networks have proven to be very efficient at computer vision tasks. 

Google is one of several organizations that have been exploring their use in medical imaging.

In recent years, the company’s research arm has built several medical imaging models in domains like ophthalmology, dermatology, mammography, and pathology.

“There is a lot of excitement around applying deep learning to health, but it remains challenging because highly accurate and robust DL models are needed in an area like health care,” said Shekoofeh Azizi, AI resident at Google Research and lead author of the self-supervised paper.


One of the key challenges of deep learning is the need for huge amounts of annotated data.

Large neural networks require millions of labeled examples to reach optimal accuracy. In medical settings, data labeling is a complicated and costly endeavor.

“Acquiring these ‘labels’ in medical settings is challenging for a variety of reasons: it can be time-consuming and expensive for clinical experts, and data must meet relevant privacy requirements before being shared,” Azizi said.


For some conditions, examples are scarce to begin with, and for others, such as breast cancer screening, it may take many years for clinical outcomes to manifest after a medical image is taken.




Further complicating the data requirements of medical imaging applications are distribution shifts between training data and deployment environments, such as changes in the patient population, disease prevalence or presentation, and the medical technology used for imaging acquisition, Azizi added.




One popular way to address the shortage of medical data is to use supervised pretraining.

In this approach, a convolutional neural network is initially trained on a dataset of labeled images, such as ImageNet. 

This phase tunes the parameters of the model’s layers to the general patterns found in all kinds of images. 

The trained deep learning model can then be fine-tuned on a limited set of labeled examples for the target task.
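To make the approach concrete, here is a minimal transfer-learning sketch in PyTorch. It is an illustration only, not the code used in any of the studies discussed here; the class count, data loader, and hyperparameters are placeholders.

```python
# Minimal supervised-pretraining + fine-tuning sketch (illustrative only).
# Assumes torchvision >= 0.13; the class count and data loader are placeholders.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 5  # hypothetical number of target medical conditions

# 1) Start from a network whose weights were pretrained on labeled ImageNet.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)

# 2) Replace the ImageNet classification head with one sized for the medical task.
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

# 3) Fine-tune all parameters on the small labeled medical dataset.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def fine_tune(model, labeled_loader, epochs=10):
    model.train()
    for _ in range(epochs):
        for images, labels in labeled_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
    return model
```

In practice, the pretrained backbone is sometimes frozen or trained with a lower learning rate than the new head; the sketch above simply fine-tunes everything.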


Several studies have shown supervised pretraining to be helpful in applications such as medical imaging, where labeled data is scarce. However, supervised pretraining also has its limits.



“The common paradigm for training medical imaging models is transfer learning, where models are first pretrained using supervised learning on ImageNet. 

However, there is a large domain shift between natural images in ImageNet and medical images, and previous research has shown such supervised pretraining on ImageNet may not be optimal for developing medical imaging models,” Azizi said.


Self-supervised pretraining


Self-supervised learning has emerged as a promising area of research in recent years. 

In self-supervised learning, deep learning models learn representations of the training data without the need for labels. 

If done right, self-supervised learning can be of great advantage in domains where labeled data is scarce and unlabeled data is abundant.




Outside of medical settings, Google has developed several self-supervised learning techniques to train neural networks for computer vision tasks. 

Among them is the Simple Framework for Contrastive Learning (SimCLR), which was presented at the ICML 2020 conference. 

Contrastive learning uses different crops and variations of the same image to train a neural network until it learns representations that are robust to changes.
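As a rough, simplified sketch of that idea (not the SimCLR paper's actual training code), contrastive pretraining can be written as two random augmentations of each image plus an NT-Xent-style loss over the batch; the augmentation choices and temperature below are assumptions for illustration.

```python
# Simplified SimCLR-style contrastive setup (illustrative only).
import torch
import torch.nn.functional as F
from torchvision import transforms

# Two random "views" of the same image form a positive pair,
# e.g. view1, view2 = augment(img), augment(img).
augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4, 0.1),
    transforms.ToTensor(),
])

def nt_xent_loss(z1, z2, temperature=0.1):
    """NT-Xent loss for projected embeddings z1, z2 of shape (N, D)."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D)
    sim = z @ z.t() / temperature                        # pairwise cosine similarities
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool, device=z.device), float("-inf"))
    # The positive for row i is its other view: i + n for the first half, i - n for the second.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```

Here z1 and z2 would come from passing the two augmented views through an encoder and a small projection head; pulling positives together while pushing the other batch images apart is what yields the augmentation-robust representations described above.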


In their new work, the Google Research team used a variation of the SimCLR framework called Multi-Instance Contrastive Learning (MICLe), which learns stronger representations by using multiple images of the same condition. This is often the case in medical datasets, where there are multiple images of the same patient, though the images might not be annotated for supervised learning.

“Unlabeled data is often available in large quantities in various medical domains. One important difference is that we utilize multiple views of the underlying pathology commonly present in medical imaging datasets to construct image pairs for contrastive self-supervised learning,” Azizi said.




When a self-supervised deep learning model is trained on different viewing angles of the same target, it learns representations that are more robust to changes in viewpoint, imaging conditions, and other factors that might negatively affect its performance.
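A minimal sketch of how such multi-instance positive pairs might be drawn is shown below; the record format with a patient_id field is a hypothetical assumption for illustration, not the datasets' actual schema.

```python
# Sketch of MICLe-style positive-pair sampling (illustrative only).
import random
from collections import defaultdict

def group_by_patient(records):
    """Group image paths by a hypothetical patient_id field."""
    groups = defaultdict(list)
    for record in records:
        groups[record["patient_id"]].append(record["image_path"])
    return groups

def sample_positive_pair(image_paths):
    """Draw a positive pair from one patient case.

    With two or more images, pair two distinct images of the same underlying
    condition; with a single image, fall back to pairing the image with itself,
    so that augmentation alone provides the variation, as in plain SimCLR.
    """
    if len(image_paths) >= 2:
        return tuple(random.sample(image_paths, 2))
    return image_paths[0], image_paths[0]
```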


Putting it all together


The self-supervised learning framework the Google researchers used involved three steps. 

First, the target neural network was trained on examples from the ImageNet dataset using SimCLR. 

Next, the model was further trained using MICLe on a medical dataset that has multiple images for each patient. 

Finally, the model was fine-tuned on a limited dataset of labeled images for the target application.
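At a very high level, the three stages could be strung together as in the sketch below; the stage functions are hypothetical placeholders for the corresponding training procedures, not APIs from the paper.

```python
# High-level sketch of the three-stage recipe (illustrative only; the stage
# functions are hypothetical callables supplied by the caller).

def train_medical_classifier(simclr_pretrain, micle_pretrain, supervised_fine_tune,
                             natural_images, medical_images_by_patient, labeled_medical_data):
    # Stage 1: self-supervised SimCLR pretraining on unlabeled natural images.
    encoder = simclr_pretrain(natural_images)
    # Stage 2: continued self-supervised pretraining with MICLe, pairing
    # different images of the same patient as positives.
    encoder = micle_pretrain(encoder, medical_images_by_patient)
    # Stage 3: supervised fine-tuning on the small labeled medical dataset.
    return supervised_fine_tune(encoder, labeled_medical_data)
```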


The researchers tested the framework on two tasks: dermatology condition classification and chest X-ray interpretation. 

When compared to supervised pretraining, the self-supervised method provides a significant improvement in the accuracy, label efficiency, and out-of-distribution generalization of medical imaging models, which is especially important for clinical applications. 

Plus, it requires much less labeled data.


“Using self-supervised learning, we show that we can significantly reduce the need for expensive annotated data to build medical image classification models,” Azizi said. In particular, on the dermatology task, they were able to train the neural networks to match the baseline model performance while using only a fifth of the annotated data.



“This hopefully translates to significant cost and time savings for developing medical AI models. We hope this method will inspire explorations in new health care applications where acquiring annotated data has been challenging,” Azizi said.


About the author

Ben Dickson is a software engineer and the founder of TechTalks. He writes about technology, business, and politics. This story originally appeared on Bdtechtalks.com. Copyright 2021



Originally published at https://venturebeat.com on November 11, 2021.



ORIGINAL PUBLICATION 




Big Self-Supervised Models Advance Medical Image Classification


arXiv
1 Apr 2021

Shekoofeh Azizi, Basil Mustafa, Fiona Ryan, Zachary Beaver, Jan Freyberg, Jonathan Deaton, Aaron Loh, Alan Karthikesalingam, Simon Kornblith, Ting Chen, Vivek Natarajan, Mohammad Norouzi


Google Research and Health


Abstract

Self-supervised pretraining followed by supervised fine-tuning has seen success in image recognition, especially when labeled examples are scarce, but has received limited attention in medical image analysis. 

This paper studies the effectiveness of self-supervised learning as a pretraining strategy for medical image classification. 

We conduct experiments on two distinct tasks: dermatology skin condition classification from digital camera images and multi-label chest X-ray classification, and demonstrate that self-supervised learning on ImageNet, followed by additional self-supervised learning on unlabeled domain-specific medical images significantly improves the accuracy of medical image classifiers. 

We introduce a novel Multi-Instance Contrastive Learning (MICLe) method that uses multiple images of the underlying pathology per patient case, when available, to construct more informative positive pairs for self-supervised learning. 

Combining our contributions, we achieve an improvement of 6.7% in top-1 accuracy and an improvement of 1.1% in mean AUC on dermatology and chest X-ray classification respectively, outperforming strong supervised baselines pretrained on ImageNet. 

In addition, we show that big self-supervised models are robust to distribution shift and can learn efficiently with a small number of labeled medical images.

Originally published at https://arxiv.org/abs/2101.05224



FROM THE GOOGLE BLOG



Self-Supervised Learning Advances Medical Image Classification

Wednesday, October 13, 2021
Posted by Shekoofeh Azizi, AI Resident, Google Research


In recent years, there has been increasing interest in applying deep learning to medical imaging tasks, with exciting progress in various applications like radiology, pathology and dermatology. 

Despite the interest, it remains challenging to develop medical imaging models, because high-quality labeled data is often scarce due to the time-consuming effort needed to annotate medical images. 

Given this, transfer learning is a popular paradigm for building medical imaging models. 

With this approach, a model is first pre-trained using supervised learning on a large labeled dataset (like ImageNet) and then the learned generic representation is fine-tuned on in-domain medical data.


Other more recent approaches that have proven successful in natural image recognition tasks, especially when labeled examples are scarce, use self-supervised contrastive pre-training, followed by supervised fine-tuning (e.g., SimCLR and MoCo). 

In pre-training with contrastive learning, generic representations are learned by simultaneously maximizing agreement between differently transformed views of the same image and minimizing agreement between transformed views of different images. 
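The post does not reproduce the objective itself, but the standard SimCLR (NT-Xent) loss for a positive pair of views (i, j) in a batch of N images has the following form, where sim denotes cosine similarity between projected representations and tau is a temperature parameter:

```latex
% NT-Xent loss for a positive pair (i, j); the denominator sums over the
% other 2N - 1 examples in the augmented batch.
\ell_{i,j} = -\log \frac{\exp\left(\mathrm{sim}(z_i, z_j)/\tau\right)}
                        {\sum_{k=1}^{2N} \mathbf{1}_{[k \neq i]} \exp\left(\mathrm{sim}(z_i, z_k)/\tau\right)}
```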

Despite their successes, these contrastive learning methods have received limited attention in medical image analysis and their efficacy is yet to be explored.


In “Big Self-Supervised Models Advance Medical Image Classification”, to appear at the International Conference on Computer Vision (ICCV 2021), we study the effectiveness of self-supervised contrastive learning as a pre-training strategy within the domain of medical image classification. 

We also propose Multi-Instance Contrastive Learning (MICLe), a novel approach that generalizes contrastive learning to leverage special characteristics of medical image datasets. 

We conduct experiments on two distinct medical image classification tasks: dermatology condition classification from digital camera images (27 categories) and multilabel chest X-ray classification (5 categories). 

We observe that self-supervised learning on ImageNet, followed by additional self-supervised learning on unlabeled domain-specific medical images, significantly improves the accuracy of medical image classifiers. 

Specifically, we demonstrate that self-supervised pre-training outperforms supervised pre-training, even when the full ImageNet dataset (14M images and 21.8K classes) is used for supervised pre-training.


SimCLR and Multi-Instance Contrastive Learning (MICLe)

Our approach consists of three steps: (1) self-supervised pre-training on unlabeled natural images (using SimCLR); (2) further self-supervised pre-training using unlabeled medical data (using either SimCLR or MICLe); followed by (3) task-specific supervised fine-tuning using labeled medical data.


Our approach comprises three steps: (1) Self-supervised pre-training on unlabeled ImageNet using SimCLR. (2) Additional self-supervised pre-training using unlabeled medical images; if multiple images of each medical condition are available, a novel Multi-Instance Contrastive Learning (MICLe) strategy is used to construct more informative positive pairs based on different images. (3) Supervised fine-tuning on labeled medical images. Note that unlike step (1), steps (2) and (3) are task and dataset specific.


After the initial pre-training with SimCLR on unlabeled natural images is complete, we train the model to capture the special characteristics of medical image datasets. This, too, can be done with SimCLR, but this method constructs positive pairs only through augmentation and does not readily leverage patients' metadata for positive pair construction. Alternatively, we use MICLe, which uses multiple images of the underlying pathology for each patient case, when available, to construct more informative positive pairs for self-supervised learning. Such multi-instance data is often available in medical imaging datasets, e.g., frontal and lateral views of mammograms, retinal fundus images from each eye, etc.


Given multiple images of a given patient case, MICLe constructs a positive pair for self-supervised contrastive learning by drawing two crops from two distinct images from the same patient case. Such images may be taken from different viewing angles and show different body parts with the same underlying pathology. This presents a great opportunity for self-supervised learning algorithms to learn representations that are robust to changes of viewpoint, imaging conditions, and other confounding factors in a direct way. MICLe does not require class label information and only relies on different images of an underlying pathology, the type of which may be unknown.


MICLe generalizes contrastive learning to leverage special characteristics of medical image datasets (patient metadata) to create realistic augmentations, yielding further performance boost of image classifiers.


Combining these self-supervised learning strategies, we show that even in a highly competitive production setting we can achieve a sizable gain of 6.7% in top-1 accuracy on dermatology skin condition classification and an improvement of 1.1% in mean AUC on chest X-ray classification, outperforming strong supervised baselines pre-trained on ImageNet (the prevailing protocol for training medical image analysis models). In addition, we show that self-supervised models are robust to distribution shift and can learn efficiently with only a small number of labeled medical images.


Comparison of Supervised and Self-Supervised Pre-training


Despite its simplicity, we observe that pre-training with MICLe consistently improves the performance of dermatology classification over the original method of pre-training with SimCLR under different pre-training dataset and base network architecture choices. Using MICLe for pre-training translates to a (1.18 ± 0.09)% increase in top-1 accuracy for dermatology classification over using SimCLR. The results demonstrate the benefit accrued from utilizing additional metadata or domain knowledge to construct more semantically meaningful augmentations for contrastive pre-training. In addition, our results suggest that wider and deeper models yield greater performance gains, with ResNet-152 (2x width) models often outperforming ResNet-50 (1x width) models or smaller counterparts.


Comparison of supervised and self-supervised pre-training, followed by supervised fine-tuning using two architectures on dermatology and chest X-ray classification. Self-supervised learning utilizes unlabeled domain-specific medical images and significantly outperforms supervised ImageNet pre-training.


Improved Generalization with Self-Supervised Models


For each task we perform pretraining and fine-tuning using the in-domain unlabeled and labeled data respectively. We also use another dataset obtained in a different clinical setting as a shifted dataset to further evaluate the robustness of our method to out-of-domain data. For the chest X-ray task, we note that self-supervised pre-training with either ImageNet or CheXpert data improves generalization, but stacking them both yields further gains. As expected, we also note that when only using ImageNet for self-supervised pre-training, the model performs worse compared to using only in-domain data for pre-training.


To test the performance under distribution shift, for each task, we held out additional labeled datasets for testing that were collected under different clinical settings. We find that the performance improvement in the distribution-shifted dataset (ChestX-ray14) by using self-supervised pre-training (both using ImageNet and CheXpert data) is more pronounced than the original improvement on the CheXpert dataset. This is a valuable finding, as generalization under distribution shift is of paramount importance to clinical applications. On the dermatology task, we observe similar trends for a separate shifted dataset that was collected in skin cancer clinics and had a higher prevalence of malignant conditions. This demonstrates that the robustness of the self-supervised representations to distribution shifts is consistent across tasks.
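As a sketch of this zero-shot transfer evaluation, a fine-tuned multi-label chest X-ray classifier can be applied to a shifted test set without any further training and scored by mean AUC; the model and data loader here are placeholders.

```python
# Sketch of zero-shot transfer evaluation on a distribution-shifted dataset
# (illustrative only; the model and loader are placeholders).
import torch
from sklearn.metrics import roc_auc_score

@torch.no_grad()
def evaluate_on_shifted_dataset(model, shifted_loader):
    model.eval()
    all_scores, all_labels = [], []
    for images, labels in shifted_loader:
        all_scores.append(torch.sigmoid(model(images)))  # multi-label probabilities
        all_labels.append(labels)
    scores = torch.cat(all_scores).cpu().numpy()
    targets = torch.cat(all_labels).cpu().numpy()
    # Mean AUC across label columns, the metric reported for the chest X-ray task.
    return roc_auc_score(targets, scores, average="macro")
```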


Evaluation of models on distribution-shifted datasets for the chest X-ray interpretation task. We use the model trained on in-domain data to make predictions on an additional shifted dataset without any further fine-tuning (zero-shot transfer learning). We observe that self-supervised pre-training leads to better representations that are more robust to distribution shifts.


Evaluation of models on distribution-shifted datasets for the dermatology task. Our results generally suggest that self-supervised pre-trained models can generalize better to distribution shifts with MICLe pre-training leading to the most gains.


Improved Label Efficiency

We further investigate the label-efficiency of the self-supervised models for medical image classification by fine-tuning the models on different fractions of labeled training data. We use label fractions ranging from 10% to 90% for both Derm and CheXpert training datasets and examine how the performance varies using the different available label fractions for the dermatology task. First, we observe that pre-training using self-supervised models can compensate for low label efficiency for medical image classification, and across the sampled label fractions, self-supervised models consistently outperform the supervised baseline. These results also suggest that MICLe yields proportionally higher gains when fine-tuning with fewer labeled examples. In fact, MICLe is able to match baselines using only 20% of the training data for ResNet-50 (4x) and 30% of the training data for ResNet-152 (2x).
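A label-efficiency sweep of this kind can be sketched as below; the random subsampling and the fine-tuning/evaluation callables are placeholder assumptions, and the paper's exact protocol may differ.

```python
# Sketch of a label-efficiency sweep (illustrative only).
import random

def label_fraction_sweep(labeled_examples, fine_tune_fn, eval_fn,
                         fractions=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Fine-tune on progressively larger random subsets of the labeled data.

    fine_tune_fn and eval_fn are caller-supplied placeholders: one fine-tunes a
    pretrained model on a subset, the other returns a metric such as top-1 accuracy.
    """
    results = {}
    for fraction in fractions:
        subset_size = max(1, int(fraction * len(labeled_examples)))
        subset = random.sample(labeled_examples, subset_size)
        model = fine_tune_fn(subset)
        results[fraction] = eval_fn(model)
    return results
```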


Top-1 accuracy for dermatology condition classification for MICLe, SimCLR, and supervised models under different unlabeled pre-training datasets and varied sizes of label fractions. MICLe is able to match baselines using only 20% of the training data for ResNet-50 (4x).


Conclusion

Supervised pre-training on natural image datasets is commonly used to improve medical image classification. We investigate an alternative strategy based on self-supervised pre-training on unlabeled natural and medical images and find that it can significantly improve upon supervised pre-training, the standard paradigm for training medical image analysis models. This approach can lead to models that are more accurate and label efficient and are robust to distribution shifts. In addition, our proposed Multi-Instance Contrastive Learning method (MICLe) enables the use of additional metadata to create realistic augmentations, yielding further performance boost of image classifiers.


Self-supervised pre-training is much more scalable than supervised pre-training because class label annotation is not required. We hope this paper will help popularize the use of self-supervised approaches in medical image analysis yielding label efficient and robust models suited for clinical deployment at scale in the real world.


Acknowledgements

This work involved collaborative efforts from a multidisciplinary team of researchers, software engineers, clinicians, and cross-functional contributors across Google Health and Google Brain. 

We thank our co-authors: Basil Mustafa, Fiona Ryan, Zach Beaver, Jan Freyberg, Jon Deaton, Aaron Loh, Alan Karthikesalingam, Simon Kornblith, Ting Chen, Vivek Natarajan, and Mohammad Norouzi. We also thank Yuan Liu from Google Health for valuable feedback and our partners for access to the datasets used in the research.


Originally published at https://ai.googleblog.com
