
Biomarker Signal Architecture in Cardiovascular Machine Learning: Stability, Redundancy, and Minimal High-Yield Panels After Myocardial Infarction

Piorkowska, N. J.; Olejnik, A.; Ostromecki, A.; Kuliczkowski, W.; Mysiak, A.; Bil-Lula, I.

2026-05-22 cardiovascular medicine
10.64898/2026.05.19.26353638 medRxiv

Background: Machine-learning models based on circulating biomarkers are increasingly used in cardiovascular research; however, model performance alone provides limited insight into how the predictive signal is distributed across features. We aimed to characterize the biomarker signal architecture of a machine-learning model distinguishing ST-elevation myocardial infarction (STEMI) from non-ST-elevation myocardial infarction (NSTEMI), with a focus on signal concentration, redundancy, and conditional complementarity.

Methods: We conducted a structured secondary analysis of a previously established, leakage-controlled machine-learning framework (n = 152 patients). The BIOMARKERS feature-set variant (10 biomarkers) was evaluated using outer-fold cross-validation. Model structure was interrogated using (i) leave-one-biomarker-out analysis, (ii) pairwise leave-two-out analysis with pair-excess estimation, (iii) cumulative ablation of top-ranked biomarkers, and (iv) forward reconstruction of minimal biomarker panels. Uncertainty was assessed using bootstrap resampling across folds.

Results: The full biomarker model achieved a mean ROC-AUC approaching 0.94. The predictive signal was highly non-uniform, with MMP-2 showing the largest single-feature contribution (mean ΔAUC ≈ 0.16). Pairwise analysis identified conditional complementarity between selected non-lipid biomarkers, particularly MMP-2 and EMMPRIN (pair ΔAUC ≈ 0.26; positive excess over single-feature effects), whereas lipid-related markers formed a highly correlated and largely redundant sub-cluster. Cumulative ablation demonstrated rapid performance collapse following removal of top-ranked biomarkers, consistent with structural signal concentration. Forward panel analysis showed that a compact subset of biomarkers (three features) achieved performance within ~0.01 ROC-AUC of the full model, indicating the presence of a minimal high-yield panel. Bootstrap confidence intervals suggested that small performance differences should be interpreted with caution.

Conclusions: Predictive performance in this biomarker-based model arises from a structured and unevenly distributed signal architecture, characterized by a dominant core biomarker, conditionally complementary contributors, and a redundant lipid cluster. These findings highlight the importance of evaluating model structure, not only aggregate performance, and suggest that biomarker-based machine-learning systems may benefit from architecture-aware interpretation and simplification strategies.
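
The leave-one-out and leave-two-out steps named in the Methods can be illustrated with a minimal sketch. This is not the authors' code. It assumes that a single-feature effect is the drop in score when that feature is removed from the full set, and that "pair excess" is the pair drop minus the sum of the two single drops; the abstract does not state the exact formula. The function names, the feature labels b1 to b3, and every number in toy_auc are hypothetical and chosen only so the arithmetic is easy to follow.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Sequence, Tuple

# An evaluator maps the set of retained features to a performance score,
# for example a mean outer-fold ROC-AUC. How it is computed is not modelled here.
Evaluator = Callable[[FrozenSet[str]], float]


def single_drops(features: Sequence[str], evaluate: Evaluator) -> Dict[str, float]:
    """Leave-one-out: score lost when each feature is removed from the full set."""
    full = frozenset(features)
    baseline = evaluate(full)
    return {f: baseline - evaluate(full - {f}) for f in features}


def pair_excess(
    features: Sequence[str], evaluate: Evaluator
) -> Dict[Tuple[str, str], float]:
    """Leave-two-out: pair drop minus the sum of the two single drops.

    ASSUMPTION: this additive definition of "excess" is one plausible reading
    of the abstract, not a formula taken from the paper.
    """
    full = frozenset(features)
    baseline = evaluate(full)
    singles = single_drops(features, evaluate)
    excess = {}
    for a, b in combinations(features, 2):
        pair_drop = baseline - evaluate(full - {a, b})
        excess[(a, b)] = pair_drop - (singles[a] + singles[b])
    return excess


def toy_auc(subset: FrozenSet[str]) -> float:
    """HYPOTHETICAL evaluator with made-up numbers (not results from the study)."""
    auc = 0.5
    has_b1, has_b2 = "b1" in subset, "b2" in subset
    if has_b1 and has_b2:
        auc += 0.30  # both present
    elif has_b1:
        auc += 0.20  # b1 alone recovers part of the joint signal
    elif has_b2:
        auc += 0.15  # b2 alone recovers part of the joint signal
    if "b3" in subset:
        auc += 0.05  # b3 contributes independently of b1 and b2
    return auc


features = ["b1", "b2", "b3"]
for name, drop in single_drops(features, toy_auc).items():
    print(f"drop without {name}: {drop:.2f}")
for pair, value in pair_excess(features, toy_auc).items():
    print(f"excess for {pair}: {value:.2f}")
```

In this toy setting, b1 and b2 each partly compensate for the other's absence, so removing both costs more than the two single removals suggest and the excess is positive, while pairs involving the independent feature b3 have an excess of zero. In a real analysis the evaluator would refit and score the model inside each outer fold, and the bootstrap intervals mentioned in the abstract would be needed before reading anything into small differences.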

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1. European Heart Journal - Digital Health (15 papers in training set; Top 0.1%; 10.3%)
2. Scientific Reports (3102 papers in training set; Top 6%; 10.3%)
3. PLOS ONE (4510 papers in training set; Top 26%; 6.4%)
4. Circulation (66 papers in training set; Top 0.7%; 4.9%)
5. Journal of the American Heart Association (119 papers in training set; Top 1%; 4.9%)
6. Frontiers in Artificial Intelligence (18 papers in training set; Top 0.1%; 4.0%)
7. Circulation: Genomic and Precision Medicine (42 papers in training set; Top 0.4%; 3.7%)
8. npj Digital Medicine (97 papers in training set; Top 1%; 3.1%)
9. BMC Medicine (163 papers in training set; Top 2%; 3.1%)

50% of probability mass above

10. Biology Methods and Protocols (53 papers in training set; Top 0.5%; 2.1%)
11. Frontiers in Physiology (93 papers in training set; Top 2%; 2.1%)
12. BMC Genomics (328 papers in training set; Top 2%; 2.1%)
13. Computer Methods and Programs in Biomedicine (27 papers in training set; Top 0.2%; 1.9%)
14. The Lancet Digital Health (25 papers in training set; Top 0.3%; 1.9%)
15. Medical Image Analysis (33 papers in training set; Top 0.6%; 1.8%)
16. The American Journal of Cardiology (15 papers in training set; Top 0.9%; 1.7%)
17. Epidemiology (26 papers in training set; Top 0.2%; 1.7%)
18. Critical Care Explorations (15 papers in training set; Top 0.3%; 1.5%)
19. Diagnostics (48 papers in training set; Top 1%; 1.5%)
20. BMC Medical Informatics and Decision Making (39 papers in training set; Top 2%; 1.5%)
21. Computers in Biology and Medicine (120 papers in training set; Top 3%; 1.0%)
22. PeerJ (261 papers in training set; Top 11%; 1.0%)
23. eLife (5422 papers in training set; Top 51%; 1.0%)
24. EBioMedicine (39 papers in training set; Top 0.8%; 0.9%)
25. Physiological Measurement (12 papers in training set; Top 0.3%; 0.9%)
26. BMJ Health & Care Informatics (13 papers in training set; Top 0.7%; 0.9%)
27. Biomolecules (95 papers in training set; Top 2%; 0.8%)
28. American Journal of Physiology-Heart and Circulatory Physiology (32 papers in training set; Top 1%; 0.8%)
29. Journal of Clinical Medicine (91 papers in training set; Top 6%; 0.8%)
30. Bioinformatics (1061 papers in training set; Top 9%; 0.8%)