Contrastive Multi-modal Training with Electrocardiography and Natural Language Echocardiography Reports for Zero-shot Prediction of Structural Heart Disease

WONG, W.-C.; LIU, C.; ELIAS, P.; HUGHES, J. W.; LEUNG, C.-Y.; QIAN, X.-Y.; LI, H.-L.; LAU, Y.-M.; TAO, C.-F.; CHOO, A.; YUNG, C.-H.; FONG, C.-H.; CHOI, W.-K.; CHENG, C.-K.; CHENG, L.-L.; LAU, L.-M.; RELWANI, R.; QIN, J.; YU, L.; LUI, H.-W.; CHIU, H.-O. A. C.; TSE, H.-F.; SIU, C.-W.; ARCUCCI, R.; HO, J. W.-K.; WONG, C.-K.

2025-09-18 cardiovascular medicine

10.1101/2025.09.16.25335870 medRxiv

Background: Machine learning models for predicting structural heart disease (SHD) from electrocardiography (ECG) have traditionally required structured echocardiographic data, while the potential of natural language echocardiography (ECHO) reports remains underused. We describe MERL-ECHO, a multimodal model based on contrastive language-image pre-training (CLIP) that aligns ECGs with ECHO natural language reports for zero-shot SHD prediction.

Methods: We conducted a multi-center retrospective study using paired ECGs and ECHO natural language reports from Queen Mary Hospital and Tung Wah Hospital in Hong Kong. MERL-ECHO was trained on 45,016 ECG-ECHO pairs. Performance was evaluated on an internal test set covering 10 SHDs and on an external test set of 5,442 ECGs with ECHO-derived labels for 6 SHDs from Columbia University Irving Medical Center, USA.

Results: The cohort included 8,192 patients (mean age 73.7 ± 16.5 years; 55.3% male). On the internal test set, MERL-ECHO achieved an average AUROC of 0.69, with the strongest performance for left ventricular dilation (0.78), right ventricular systolic dysfunction (0.71), and tricuspid regurgitation (0.71). On the external test set, the average AUROC was 0.72, with the highest performance for left ventricular systolic dysfunction (0.76) and aortic stenosis (0.76). Pre-training improved AUROC by up to 5%, performance increased with larger training datasets, and ResNet18 outperformed ViT-Tiny as the ECG encoder by 7%. Saliency analysis revealed interpretable ECG features, including unexpected P-wave changes in aortic stenosis, suggesting novel disease markers.

Conclusions: MERL-ECHO leverages ECHO natural language reports for multimodal training with ECGs. This CLIP-based model enables accurate zero-shot prediction of SHDs and highlights interpretable ECG features with potential clinical relevance.
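The abstract does not specify MERL-ECHO's loss function, text prompts, or temperature. A minimal sketch follows. It assumes the standard CLIP recipe: a symmetric InfoNCE loss over a batch of paired ECG and report embeddings, plus zero-shot scoring of an ECG against a "disease present" and a "disease absent" text prompt. The function names, the temperature of 0.07, and the toy vectors are hypothetical illustrations, not code or values from the paper.

```python
import math

# Hypothetical sketch of CLIP-style training and zero-shot scoring.
# Assumption: embeddings are plain lists of floats produced elsewhere
# (e.g. by an ECG encoder and a text encoder); temperature 0.07 is illustrative.


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_matrix(a, b):
    """Pairwise cosine similarities between rows of a and rows of b."""
    a = [l2_normalize(v) for v in a]
    b = [l2_normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b] for u in a]


def log_softmax(row):
    """Numerically stable log-softmax of a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def clip_loss(ecg_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE: the i-th ECG should match the i-th report and vice versa."""
    sims = cosine_matrix(ecg_emb, text_emb)
    logits = [[s / temperature for s in row] for row in sims]
    n = len(logits)
    # ECG -> text direction: in row i, the correct report is column i.
    loss_e2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text -> ECG direction: in column j, the correct ECG is row j.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2e = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return (loss_e2t + loss_t2e) / 2


def zero_shot_score(ecg_vec, pos_prompt_vec, neg_prompt_vec, temperature=0.07):
    """Probability that an ECG matches the 'disease present' prompt rather than 'absent'."""
    sims = cosine_matrix([ecg_vec], [pos_prompt_vec, neg_prompt_vec])[0]
    logits = [s / temperature for s in sims]
    return math.exp(log_softmax(logits)[0])


if __name__ == "__main__":
    ecg = [[1.0, 0.0], [0.0, 1.0]]
    reports = [[1.0, 0.1], [0.1, 1.0]]
    print("aligned loss:", clip_loss(ecg, reports))
    print("zero-shot score:", zero_shot_score([1.0, 0.0], [0.9, 0.1], [0.1, 0.9]))
```

Training would minimize `clip_loss` over batches of matched ECG-report pairs. At inference, no SHD labels are needed: one score per ECG and per disease comes from comparing the ECG embedding with the two prompt embeddings, and these scores can then be ranked against ECHO-derived labels to compute AUROC.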