
Foundation models for generalizable electrocardiogram interpretation: comparison of supervised and self-supervised electrocardiogram foundation models

Nolin-Lapalme, A.; Sowa, A.; Delfrate, J.; Tastet, O.; Corbin, D.; Kulbay, M.; Ozdemir, D.; Noel, M.-J.; Marois-Blanchet, F.-C.; Harvey, F.; Sharma, S.; Ansari, M.; Chiu, I.-M.; Dsouza, V.; Friedman, S. F.; Potter, B.; Chasse, M.; Afilalo, J.; Elias, P. A.; Jabbour, G.; Bahani, M.; Dube, M.-P.; Boyle, P. M.; Chatterjee, N. A.; Barrios, J.; Tison, G. H.; Ouyang, D.; Maddah, M.; Khurshid, S.; Cadrin-Tourigny, J.; Tadros, R.; Hussin, J.; Avram, R.

medRxiv preprint (cardiovascular medicine), posted 2025-03-05. DOI: 10.1101/2025.03.02.25322575

Background: The 12-lead electrocardiogram (ECG) remains a cornerstone of cardiac diagnostics, yet existing artificial intelligence (AI) solutions for automated interpretation often lack generalizability, remain closed-source, and are primarily trained with supervised learning, limiting their adaptability across diverse clinical settings. To address these challenges, we developed and compared two open-source foundational ECG models: DeepECG-SSL, a self-supervised learning model, and DeepECG-SL, a supervised learning model.

Methods: Both models were trained on over 1 million ECGs using a standardized preprocessing pipeline and automated free-text extraction from ECG reports to predict 77 cardiac conditions. DeepECG-SSL was pretrained using self-supervised contrastive learning and masked lead modeling. The models were evaluated for ECG interpretation across the 77 diagnostic categories on datasets from six multilingual private healthcare systems and four public datasets. Fairness analyses assessed disparities in performance across age and sex groups, and resource utilization was also investigated.

Results: DeepECG-SSL achieved AUROCs of 0.990 (95% CI 0.990-0.990) on the internal dataset, 0.981 (95% CI 0.981-0.981) on external public datasets, and 0.983 (95% CI 0.983-0.983) on external private datasets, while DeepECG-SL achieved AUROCs of 0.992 (95% CI 0.992-0.992), 0.980 (95% CI 0.980-0.980), and 0.983 (95% CI 0.983-0.983), respectively. Fairness analyses revealed minimal disparities (true positive rate and false positive rate differences < 0.010) across age and sex groups. In digital biomarker prediction tasks with limited labeled data, DeepECG-SSL outperformed DeepECG-SL in predicting 5-year atrial fibrillation risk (N=132,050; AUROC 0.742 vs. 0.720; Δ=0.022; P<0.001), identifying reduced left ventricular ejection fraction (LVEF) ≤40% (N=25,252; 0.928 vs. 0.900; Δ=0.028; P<0.001), and classifying long QT syndrome (LQTS) subtypes (N=127; 0.931 vs. 0.853; Δ=0.078; P=0.026).

Conclusion: By releasing model weights, preprocessing tools, and validation code, we aim to support robust, data-efficient AI diagnostics across diverse clinical environments. This study establishes self-supervised learning as a promising paradigm for ECG analysis, particularly in settings with limited annotated data, enhancing accessibility, generalizability, and fairness in AI-driven cardiac diagnostics.

Key Question: Can self-supervised learning (SSL) yield ECG-based AI foundational models with enhanced performance, fairness, privacy, and generalizability compared with traditional supervised learning (SL) approaches?

Key Finding: Our evaluation of DeepECG-SL and DeepECG-SSL across seven external health center datasets and four international publicly accessible datasets demonstrated that, while both models achieve comparable diagnostic accuracy for ECG interpretation, SSL outperforms SL on novel tasks with smaller datasets.

Take-home Message: We validated DeepECG-SL and DeepECG-SSL across public and private datasets and demonstrated that the SSL model had superior generalizability. By addressing fairness, privacy, and efficiency, and by open-sourcing our models, we advance ethical, adaptable AI for equitable, real-world ECG diagnostics.

[Figure 1] Graphical abstract: DeepECG-SL and DeepECG-SSL, two open-source AI models for 12-lead ECG interpretation, were trained on over 1 million ECGs. DeepECG-SSL, which uses self-supervised contrastive learning and masked lead modeling, outperformed DeepECG-SL on digital biomarker tasks (predicting atrial fibrillation risk, reduced LVEF, and long QT syndrome subtypes), while both models achieved high diagnostic accuracy with minimal fairness disparities across age and sex. Validated on ten external datasets, our work provides a robust, reproducible framework for equitable, efficient ECG-based cardiac diagnostics.
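The masked lead modeling objective mentioned above can be illustrated with a minimal sketch: hide one of the 12 leads and score how well a model reconstructs it from the remaining 11. The array shapes, sampling rate, and zero-prediction placeholder below are illustrative assumptions for exposition, not the paper's actual pretraining implementation.

```python
import numpy as np

def mask_random_lead(ecg, rng):
    """Zero out one randomly chosen lead of a (12, T) ECG array.

    Returns the masked copy and the index of the hidden lead, which a
    pretraining model would be asked to reconstruct.
    """
    masked = ecg.copy()
    lead = int(rng.integers(0, ecg.shape[0]))
    masked[lead] = 0.0
    return masked, lead

def reconstruction_loss(predicted, target, lead):
    """Mean squared error computed on the masked lead only."""
    return float(np.mean((predicted[lead] - target[lead]) ** 2))

rng = np.random.default_rng(0)
ecg = rng.standard_normal((12, 2500))  # 12 leads; e.g. 10 s at 250 Hz (assumed)
masked, lead = mask_random_lead(ecg, rng)

# A real encoder-decoder would predict the hidden lead from the 11 visible
# ones; a zero prediction stands in here so the loss is well defined.
prediction = np.zeros_like(ecg)
loss = reconstruction_loss(prediction, ecg, lead)
```

Minimizing this loss over many ECGs forces the encoder to learn inter-lead structure without any diagnostic labels, which is what makes the pretrained representation useful for downstream tasks with little labeled data.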

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

1. npj Digital Medicine — 34.7% (97 papers in training set; top 0.1%)
2. European Heart Journal - Digital Health — 23.7% (15 papers in training set; top 0.1%)
(50% of probability mass above this point)
3. Circulation — 6.6% (66 papers in training set; top 0.6%)
4. JACC: Clinical Electrophysiology — 2.7% (11 papers in training set; top 0.1%)
5. Scientific Reports — 2.6% (3,102 papers in training set; top 46%)
6. Journal of the American Medical Informatics Association — 2.2% (61 papers in training set; top 1.0%)
7. Physiological Measurement — 2.2% (12 papers in training set; top 0.2%)
8. Circulation: Genomic and Precision Medicine — 1.7% (42 papers in training set; top 0.7%)
9. PLOS Digital Health — 1.4% (91 papers in training set; top 2%)
10. PLOS ONE — 1.3% (4,510 papers in training set; top 59%)
11. Computers in Biology and Medicine — 0.9% (120 papers in training set; top 4%)
12. Nature Medicine — 0.9% (117 papers in training set; top 4%)
13. Medical Image Analysis — 0.9% (33 papers in training set; top 0.8%)
14. The Lancet Digital Health — 0.8% (25 papers in training set; top 0.9%)
15. Nature Communications — 0.8% (4,913 papers in training set; top 60%)
16. iScience — 0.8% (1,063 papers in training set; top 28%)
17. Journal of the American Heart Association — 0.8% (119 papers in training set; top 4%)
18. BMC Medicine — 0.8% (163 papers in training set; top 6%)
19. Frontiers in Physiology — 0.7% (93 papers in training set; top 6%)
20. Biology Methods and Protocols — 0.5% (53 papers in training set; top 3%)
21. Patterns — 0.5% (70 papers in training set; top 3%)
22. EBioMedicine — 0.5% (39 papers in training set; top 2%)
23. JMIR Medical Informatics — 0.5% (17 papers in training set; top 2%)