AI-ECG for LVSD detection: a systematic review and first-in-kind multinational head-to-head comparison
Croon, P. M.; Boonstra, M. J.; Allaart, C. P.; Arends, B. K. O.; Dhingra, L. S.; Huang, Y.-C.; Mast, T.; Khera, R.; Kuo, C.-F.; Kwon, J.-M.; Lee, H. S.; Lee, M. S.; van de Leur, R.; Liu, Z.-Y.; Oikonomou, E. K.; Selder, J. L.; Winter, M. M.; Asselbergs, F. W.
Background: Several artificial intelligence-enhanced electrocardiogram (AI-ECG) models have shown promise in detecting left ventricular systolic dysfunction (LVSD), but their head-to-head agreement and performance have not been independently compared within the same cohort.

Objectives: To compare the performance of published AI-ECG models for LVSD detection in a standardized external cohort and to evaluate the field's transparency and reproducibility.

Methods: We systematically reviewed AI-ECG models predicting LVSD and assessed the risk of bias. Authors were invited to share models for external validation in a well-phenotyped registry of patients undergoing routine clinical cardiac magnetic resonance imaging (CMR) with cardiologist-adjudicated reports and paired ECGs. Model performance was evaluated in all consecutive patients and in a lower-complexity subgroup with 15% LVSD prevalence.

Results: We identified 35 studies describing 51 models, reporting high (AUROC >0.80) or excellent (AUROC >0.90) performance. The risk of bias was high, primarily because of limited description of development and validation cohort characteristics and a lack of independent external validation. Four groups (from Korea, the United States, Taiwan, and the Netherlands) shared models for independent testing. AUROCs ranged from 0.83 to 0.93 in all patients (n = 1,203; mean age 59 ± 15 years; 450 [35%] female) and from 0.87 to 0.96 in the lower-complexity subset. Performance remained consistent across subgroups, with slight decreases in ECGs showing wide QRS complexes or atrial fibrillation.

Conclusions: In this first-in-kind independent validation and head-to-head comparison study, AI-ECG for LVSD detection demonstrated strong performance despite training on disparate populations. However, the limited availability of models hinders independent validation.
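As an illustration of the metric reported above, the following minimal sketch computes AUROC for several models scored on the same patients, using the pairwise (Mann-Whitney) definition: the probability that a randomly chosen LVSD case receives a higher score than a randomly chosen non-LVSD case. The function name, model names, labels, and scores are all hypothetical toy values, not data or code from the study, whose evaluation pipeline is not described here.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a positive > score of a negative); ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy cohort: 1 = LVSD on the reference standard, 0 = no LVSD.
labels = [1, 0, 1, 0, 0, 1, 0, 0]

# Hypothetical outputs of two models on the same eight patients.
model_scores = {
    "model_a": [0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.3, 0.5],
    "model_b": [0.8, 0.6, 0.4, 0.3, 0.2, 0.7, 0.5, 0.1],
}

# Head-to-head comparison: every model is scored against identical labels.
results = {name: auroc(labels, s) for name, s in model_scores.items()}
for name, value in results.items():
    print(f"{name}: AUROC = {value:.3f}")
```

Scoring every model against the same labels is what makes the comparison head-to-head: differences in AUROC then reflect the models rather than differences between cohorts. The pairwise loop is quadratic in cohort size, which is adequate for a sketch; a rank-based implementation would be preferable for large cohorts.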
Matching journals
The top 5 journals account for 50% of the predicted probability mass.