Machine learning models predict long COVID outcomes based on baseline clinical and immunologic factors

Jayavelu, N. D.; Samaha, H.; Wimalasena, S. T.; Hoch, A.; Gygi, J. P.; Gabernet, G.; Ozonoff, A.; Liu, S.; Milliren, C. E.; Levy, O.; Baden, L. R.; Melamed, E.; Ehrlich, L. I. R.; McComsey, G. A.; Sekaly, R. P.; Cairns, C. B.; Haddad, E. K.; Schaenman, J.; Shaw, A. C.; Hafler, D. A.; Montgomery, R. R.; Corry, D. B.; Kheradmand, F.; Atkinson, M. A.; Brakenridge, S. C.; Higuita, N. I. A.; Metcalf, J. P.; Hough, C. L.; Messer, W. B.; Pulendran, B.; Nadeau, K. C.; Davis, M. M.; Geng, L. N.; Sesma, A. F.; Simon, V.; Krammer, F.; Kraft, M.; Bime, C.; Calfee, C. S.; Erle, D. J.; Langelier, C. R.; IMP

2025-02-13 health informatics
10.1101/2025.02.12.25322164 medRxiv

The post-acute sequelae of SARS-CoV-2 (PASC), also known as long COVID, remain a significant health issue that is incompletely understood. Predicting which acutely infected individuals will go on to develop long COVID is challenging due to the lack of established biomarkers, clear disease mechanisms, or well-defined sub-phenotypes. Machine learning (ML) models offer the potential to address this by leveraging clinical data to enhance diagnostic precision. We utilized clinical data, including antibody titers and viral load measurements collected at the time of hospital admission, to predict the likelihood of acute COVID-19 progressing to long COVID. Our machine learning models achieved median AUROC values ranging from 0.64 to 0.66 and AUPRC values between 0.51 and 0.54, demonstrating their predictive capabilities. Feature importance analysis revealed that low antibody titers and high viral loads at hospital admission were the strongest predictors of long COVID outcomes. Comorbidities, including chronic respiratory, cardiac, and neurologic diseases, as well as female sex, were also identified as significant risk factors for long COVID. Our findings suggest that ML models have the potential to identify patients at risk for developing long COVID based on baseline clinical characteristics. These models can help guide early interventions, improving patient outcomes and mitigating the long-term public health impacts of SARS-CoV-2.
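For readers less familiar with the two metrics quoted in the abstract, the following is a minimal sketch of how AUROC and AUPRC (computed here as average precision) can be calculated from binary outcomes and predicted risk scores. It is not the authors' pipeline: the function names `auroc` and `average_precision` are illustrative, and the `labels` and `scores` values are synthetic toy data invented for this example, not study data.

```python
def auroc(labels, scores):
    """Probability that a random positive case is scored above a random negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values taken at the rank of each positive case."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive label")
    return total / hits


# Synthetic toy data (assumption for illustration only): 1 = outcome present.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

print(round(auroc(labels, scores), 3))              # 0.733
print(round(average_precision(labels, scores), 3))  # 0.722
```

As a reading aid, an AUROC of 0.5 corresponds to chance-level ranking, while the chance-level AUPRC is roughly the fraction of positive cases in the data.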

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Patterns, 70 papers in training set, Top 0.1%, 18.3%
2. Scientific Reports, 3102 papers in training set, Top 3%, 14.1%
3. npj Digital Medicine, 97 papers in training set, Top 0.6%, 8.3%
4. Journal of Medical Internet Research, 85 papers in training set, Top 0.6%, 7.1%
5. Cell Reports Medicine, 140 papers in training set, Top 2%, 3.5%

50% of probability mass above

6. International Journal of Medical Informatics, 25 papers in training set, Top 0.5%, 3.2%
7. PLOS ONE, 4510 papers in training set, Top 46%, 2.4%
8. Computers in Biology and Medicine, 120 papers in training set, Top 1%, 2.3%
9. eBioMedicine, 130 papers in training set, Top 0.8%, 2.0%
10. Nature Communications, 4913 papers in training set, Top 47%, 2.0%
11. Viruses, 318 papers in training set, Top 2%, 1.9%
12. Frontiers in Public Health, 140 papers in training set, Top 5%, 1.7%
13. Frontiers in Medicine, 113 papers in training set, Top 4%, 1.7%
14. Communications Medicine, 85 papers in training set, Top 0.3%, 1.6%
15. Frontiers in Immunology, 586 papers in training set, Top 5%, 1.5%
16. PLOS Digital Health, 91 papers in training set, Top 2%, 1.5%
17. JMIR Medical Informatics, 17 papers in training set, Top 0.9%, 1.5%
18. iScience, 1063 papers in training set, Top 22%, 1.2%
19. Communications Biology, 886 papers in training set, Top 16%, 1.1%
20. JMIR Public Health and Surveillance, 45 papers in training set, Top 3%, 0.9%
21. PLOS Computational Biology, 1633 papers in training set, Top 21%, 0.9%
22. The Lancet Digital Health, 25 papers in training set, Top 0.8%, 0.9%
23. Med, 38 papers in training set, Top 0.6%, 0.9%
24. Nature Machine Intelligence, 61 papers in training set, Top 3%, 0.9%
25. BMC Medical Informatics and Decision Making, 39 papers in training set, Top 2%, 0.8%
26. Briefings in Bioinformatics, 326 papers in training set, Top 7%, 0.7%
27. Advanced Science, 249 papers in training set, Top 20%, 0.7%
28. Journal of Biomedical Informatics, 45 papers in training set, Top 1%, 0.7%
29. Clinical Chemistry, 22 papers in training set, Top 0.9%, 0.7%
30. Computational and Structural Biotechnology Journal, 216 papers in training set, Top 10%, 0.7%
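
The "50% of probability mass" cutoff in the ranking above can be checked with a running total. The sketch below copies the predicted probabilities of the ten highest-ranked journals from the list and finds how many entries are needed to reach the threshold; the helper name `journals_needed` is hypothetical and not part of the site.

```python
from itertools import accumulate

# Predicted probabilities (percent) for ranks 1-10, copied from the list above.
probs = [18.3, 14.1, 8.3, 7.1, 3.5, 3.2, 2.4, 2.3, 2.0, 2.0]


def journals_needed(probabilities, threshold=50.0):
    """Return how many top-ranked entries are needed to reach `threshold` percent."""
    for rank, running_total in enumerate(accumulate(probabilities), start=1):
        if running_total >= threshold:
            return rank
    return None  # the threshold is never reached with the values given


top_n = journals_needed(probs)
print(top_n)  # 5
```

The first five probabilities sum to about 51.3%, which is consistent with the cutoff being drawn after rank 5.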