Multimodal AI fuses proteomic and EHR data for rational prioritization of protein biomarkers in diabetic retinopathy

Lin, J. B.; Mataraso, S. J.; Chadha, M.; Velez, G.; Mruthyunjaya, P.; Aghaeepour, N.; Mahajan, V. B.

2026-02-24 ophthalmology
10.64898/2026.02.23.26346903

Purpose: There is a need for novel therapies for diabetic retinopathy (DR) because existing therapies treat only certain features of DR and do not work optimally for all patients. While proteomic studies provide insight into disease pathobiology, they are often limited to small sample sizes due to high costs, which limits their generalizability and reproducibility. Moreover, they often yield lists of tens to hundreds of differentially expressed proteins, making it difficult to prioritize the most biologically relevant biomarkers beyond using arbitrary fold-change and false discovery rate cutoffs. Here, we applied a two-stage multimodal AI approach: first, we integrated EHR and proteomics data to rationally prioritize candidate protein biomarkers and, next, validated these biomarkers in an independent cohort. These protein biomarkers of DR are rooted in the EHR data and thereby more likely to be biological drivers of disease.

Methods: We obtained EHR data from a large number of patients with and without DR (N=319,997) from the STARR-OMOP database and obtained aqueous humor liquid biopsies from a subset of these patients (N=101) for high-resolution proteomic profiling. We developed Clinical and Omics Multi-Modal Analysis Enhanced with Transfer Learning (COMET) to perform integrated analysis of proteomics and all available EHR data to identify protein biomarkers of DR. The model was trained in two phases: first, it was pretrained using patients with EHR data alone (N=319,896), and then it was fine-tuned using patients with both EHR and proteomics data (N=101), allowing it to learn both clinical and molecular features associated with DR. Findings from COMET were then validated with liquid biopsies from an independent validation cohort (N=164).

Results: t-distributed stochastic neighbor embedding (t-SNE) analysis of EHR and proteomics data identified proteins clustering with related EHR features. Levels of STX3 and NOTCH2, proteins involved in retinal function, were correlated with a diagnosis of macular edema, a record of a visual field exam, and a prescription for latanoprost, highlighting protein-EHR alignment. The pretrained, multimodal COMET model was superior (AUROC=0.98, AUPRC=0.91) to models generated using either EHR or proteomics data alone or without pretraining (AUROC: 0.76 to 0.92; AUPRC: 0.47 to 0.74). The proteins SERPINE1, QPCT, AKR1C2, IL2RB, and SRSF6 were prioritized by the COMET model compared to the models without pretraining, supporting their potential role in DR pathobiology, and were subsequently validated in an independent cohort.

Conclusion: We used multimodal AI to prioritize protein biomarkers of DR that are most strongly linked to EHR elements and to identify other protein biomarkers associated with disease features like diabetic macular edema. These findings serve as a foundation for future mechanistic studies and highlight the synergistic value of using multimodal AI to fuse EHR and proteomics data for enhanced proteomics analysis.
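The two-phase training described in the Methods (pretraining on EHR data alone, then fine-tuning on the small cohort with both EHR and proteomics) can be illustrated with a minimal sketch. This is not the COMET architecture, which the abstract does not specify: a plain logistic regression is swapped in, the data are synthetic, and every name here (`fit_logistic`, `simulate`, the feature counts and cohort sizes) is a hypothetical stand-in chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, w_init, lr=0.1, epochs=200):
    """Gradient descent on the mean logistic loss, starting from w_init."""
    w = w_init.copy()
    for _ in range(epochs):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
n_ehr, n_prot = 8, 5  # hypothetical feature counts, not from the paper

# Synthetic ground truth used only to generate toy labels.
true_w = rng.normal(size=n_ehr + n_prot)

def simulate(n):
    """Draw n toy patients with EHR + proteomic features and a binary label."""
    X = rng.normal(size=(n, n_ehr + n_prot))
    y = (rng.uniform(size=n) < sigmoid(X @ true_w)).astype(float)
    return X, y

X_large, y_large = simulate(5000)  # stand-in for the large EHR-only cohort
X_small, y_small = simulate(100)   # stand-in for the EHR + proteomics cohort

# Phase 1: pretrain on EHR features only (proteomic columns are ignored).
w_ehr = fit_logistic(X_large[:, :n_ehr], y_large, np.zeros(n_ehr))

# Phase 2: fine-tune on both modalities. EHR weights are warm-started from
# phase 1; proteomic weights start at zero and are learned from the small cohort.
w_init = np.concatenate([w_ehr, np.zeros(n_prot)])
w_finetuned = fit_logistic(X_small, y_small, w_init, lr=0.05, epochs=100)
```

The design point this sketch is meant to show is the warm start: the small cohort only has to adjust weights that the large cohort has already estimated, rather than learn every weight from about a hundred patients.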

Matching journals

1
Ophthalmology Science
Elsevier BV · based on 15 published papers
#1
208× avg
2
Scientific Reports
Springer Science and Business Media LLC · based on 701 published papers
Top 11%
2.3× avg
3
npj Digital Medicine
Springer Science and Business Media LLC · based on 85 published papers
Top 2%
14× avg
4
Investigative Ophthalmology & Visual Science
Association for Research in Vision and Ophthalmology (ARVO) · based on 11 published papers
Top 0.3%
116× avg
5
Nature Communications
Springer Science and Business Media LLC · based on 483 published papers
Top 12%
2.7× avg
6
Translational Vision Science & Technology
Association for Research in Vision and Ophthalmology (ARVO) · based on 18 published papers
Top 0.8%
42× avg
7
PLOS ONE
Public Library of Science (PLoS) · based on 1737 published papers
Top 71%
4.6%
8
British Journal of Ophthalmology
BMJ · based on 13 published papers
Top 0.9%
50× avg
9
Eye
Springer Science and Business Media LLC · based on 11 published papers
Top 0.9%
49× avg
10
Communications Biology
Springer Science and Business Media LLC · based on 36 published papers
Top 0.8%
15× avg
11
PLOS Digital Health
Public Library of Science (PLoS) · based on 88 published papers
Top 6%
4.3× avg
12
eLife
eLife Sciences Publications, Ltd · based on 262 published papers
Top 12%
2.3× avg
13
BMC Genomics
Springer Science and Business Media LLC · based on 15 published papers
Top 0.2%
39× avg
14
Human Genetics
Springer Science and Business Media LLC · based on 14 published papers
Top 0.7%
28× avg
15
Computers in Biology and Medicine
Elsevier BV · based on 39 published papers
Top 5%
4.8× avg
16
Journal of Medical Internet Research
JMIR Publications Inc. · based on 81 published papers
Top 14%
0.8%
17
Journal of Allergy and Clinical Immunology
Elsevier BV · based on 15 published papers
Top 2%
9.0× avg
18
Communications Medicine
Springer Science and Business Media LLC · based on 63 published papers
Top 4%
3.9× avg