
Self-supervised Learning for Chest CT - Training Strategies and Effect on Downstream Applications

Tariq, A.; Patel, B.; Banerjee, I.

2024-02-05 · radiology and imaging
medRxiv · doi: 10.1101/2024.02.01.24302144

Self-supervised pretraining can reduce the amount of labeled training data needed by pre-learning fundamental visual characteristics of medical imaging data. In this study, we investigate several self-supervised training strategies for chest computed tomography (CT) exams and their effects on downstream applications. We benchmark five well-known self-supervision strategies (masked image region prediction, next slice prediction, rotation prediction, flip prediction, and denoising) on 15M chest CT slices collected from four sites of the Mayo Clinic enterprise. These models were evaluated on two downstream tasks using public datasets: pulmonary embolism (PE) detection (classification) and lung nodule segmentation. Image embeddings generated by these models were also evaluated for prediction of patient age, race, and gender to study inherent biases in the models' understanding of chest CT exams. Use of pretrained weights, especially masked region prediction-based weights, improved performance and reduced the computational effort needed for downstream tasks compared to task-specific state-of-the-art (SOTA) models. Performance improvement for PE detection was observed for training dataset sizes as large as [Formula], with a maximum gain of 5% over SOTA. The segmentation model initialized with pretrained weights learned twice as fast as a randomly initialized model. While gender and age predictors built using self-supervised training weights showed no performance improvement over randomly initialized predictors, the race predictor saw a 10% performance boost when using self-supervised training weights. We release the models and weights under an open-source academic license. These models can be fine-tuned with limited task-specific annotated data for a variety of downstream imaging tasks, thus accelerating research in biomedical imaging informatics.
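
To make the benchmarked pretext tasks concrete, below is a minimal sketch of one of them, rotation prediction, applied to individual CT slices. The backbone (a single-channel ResNet-18), input size, and training setup are illustrative assumptions for the sketch and are not the authors' released code or weights.

```python
# Minimal sketch of a rotation-prediction pretext task over 2D CT slices.
# All names and hyperparameters here are illustrative, not the paper's implementation.
import torch
import torch.nn as nn
import torchvision.models as models

class RotationPretextModel(nn.Module):
    """Encoder plus a 4-way head that predicts which multiple of 90 degrees
    a slice was rotated by (0, 90, 180, 270) -- no manual labels required."""
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)
        # Replace the RGB stem with a 1-channel stem for CT slices.
        backbone.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
        backbone.fc = nn.Identity()          # expose 512-d features
        self.encoder = backbone
        self.head = nn.Linear(512, 4)

    def forward(self, x):
        return self.head(self.encoder(x))

def rotation_batch(slices):
    """Create self-supervised inputs/labels by rotating each slice k*90 degrees."""
    ks = torch.randint(0, 4, (slices.size(0),))
    rotated = torch.stack(
        [torch.rot90(s, int(k), dims=(-2, -1)) for s, k in zip(slices, ks)]
    )
    return rotated, ks

model = RotationPretextModel()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

slices = torch.randn(8, 1, 256, 256)          # stand-in for a batch of CT slices
inputs, labels = rotation_batch(slices)
loss = criterion(model(inputs), labels)        # labels come from the rotation itself
loss.backward()
optimizer.step()
```

After pretraining with a pretext task like this, the encoder weights would be reused to initialize a downstream classifier or segmentation network, which is the fine-tuning scenario the abstract describes.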

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Probability |
|---|---|---|---|---|
| 1 | IEEE Transactions on Medical Imaging | 18 | Top 0.1% | 10.6% |
| 2 | Nature Machine Intelligence | 61 | Top 0.2% | 8.5% |
| 3 | Nature Communications | 4913 | Top 22% | 8.5% |
| 4 | Medical Image Analysis | 33 | Top 0.2% | 6.5% |
| 5 | Journal of Medical Imaging | 11 | Top 0.1% | 6.5% |
| 6 | Medical Physics | 14 | Top 0.1% | 4.9% |
| 7 | Patterns | 70 | Top 0.1% | 4.9% |
| 8 | Scientific Reports | 3102 | Top 27% | 4.4% |
| 9 | PLOS ONE | 4510 | Top 48% | 2.1% |
| 10 | PLOS Computational Biology | 1633 | Top 13% | 2.1% |
| 11 | Neurocomputing | 13 | Top 0.1% | 2.1% |
| 12 | NeuroImage | 813 | Top 4% | 1.9% |
| 13 | Proceedings of the National Academy of Sciences | 2130 | Top 32% | 1.7% |
| 14 | Diagnostics | 48 | Top 1% | 1.7% |
| 15 | Human Brain Mapping | 295 | Top 3% | 1.7% |
| 16 | Frontiers in Computational Neuroscience | 53 | Top 1% | 1.5% |
| 17 | eLife | 5422 | Top 49% | 1.2% |
| 18 | Expert Systems with Applications | 11 | Top 0.2% | 1.2% |
| 19 | npj Digital Medicine | 97 | Top 3% | 1.2% |
| 20 | GigaScience | 172 | Top 2% | 1.1% |
| 21 | Biomedical Optics Express | 84 | Top 0.9% | 0.9% |
| 22 | Imaging Neuroscience | 242 | Top 3% | 0.9% |
| 23 | PLOS Digital Health | 91 | Top 2% | 0.9% |
| 24 | eBioMedicine | 130 | Top 3% | 0.9% |
| 25 | Journal of Biomedical Informatics | 45 | Top 1% | 0.8% |
| 26 | Journal of Pathology Informatics | 13 | Top 0.3% | 0.8% |
| 27 | Nature Methods | 336 | Top 6% | 0.8% |
| 28 | Nature Medicine | 117 | Top 4% | 0.8% |
| 29 | European Radiology | 14 | Top 0.6% | 0.8% |
| 30 | NeuroImage: Clinical | 132 | Top 3% | 0.8% |