
Cadence: A Benchmark Evaluation of the Narrative Velocity Framework for Next Clinical Event Prediction in MIMIC-IV

Rouhollahi, A.; Nezami, F. R.

2026-05-11 bioinformatics
10.64898/2026.05.06.722409 bioRxiv

Objective: How structured clinical features and cluster-semantic embeddings interact under self-distillation in EHR prediction models is unknown. Existing approaches treat these two sources separately (gradient-boosted trees exploit tabular features, while sequence models process text), so their interaction under self-distillation regularisation remains uncharacterised. We introduce the Narrative Velocity (NV) framework and evaluate this interaction in a 7-model benchmark.
Materials and Methods: Cadence is a ~5.86M-parameter residual multilayer perceptron (MLP) that combines structured EHR features with frozen PubMedBERT embeddings of cluster-label strings. It is trained under born-again self-distillation from a prior Cadence checkpoint (seed-42 teacher; [1]). Cadence is benchmarked against six comparators on MIMIC-IV v3.1 with dual-sex TRIPOD+AI reporting (5 student seeds for Cadence; 2-3 seeds for baselines).
Results: At full-cohort scale, Cadence achieves 38.04 ± 0.04% male and 35.66 ± 0.04% female top-1 accuracy. This exceeds the strongest non-neural baseline (XGBoost-2420, trained on the identical 2,420-dimensional input) by +1.35 pp male and +0.82 pp female (paired t-test on shared seeds 42-44: t(2) = 69.06, p = 2.10 × 10⁻⁴ male; t(2) = 25.32, p = 1.56 × 10⁻³ female). On time-to-next-event regression, Cadence lowers MAE by 7.68 d male and 7.30 d female versus XGBoost-2420. However, FT-Transformer attains the lowest absolute MAE at full scale (27.58 d male, 36.63 d female), revealing a classification-regression trade-off across model families. A controlled 2 × 2 random-vector ablation isolates the self-distillation × embedding interaction at +0.49 pp top-1 (95% CI [0.35, 0.64] pp; bootstrap, n = 10,000 resamples) under a matched-dimensionality null. A 3-teacher-seed validation (multi_teacher_02) confirms that the interaction is robust to teacher-seed identity (per-seed values +0.525, +0.509, +0.507 pp; mean +0.513 ± 0.010 pp).
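The headline interaction is a difference-in-differences over the 2 × 2 design: self-distillation on/off crossed with PubMedBERT embeddings versus matched-dimensionality random vectors. The Python sketch below recomputes it from the two self-distillation gains reported for the ablation (+0.78 pp on PubMedBERT embeddings, +0.29 pp on random vectors). It also summarises the three per-teacher-seed values as mean ± standard deviation.

It assumes the ± denotes a sample SD (n - 1 denominator), since the convention is not stated. Recomputing from the rounded per-seed values gives about +0.514 ± 0.010 pp, which agrees with the reported +0.513 ± 0.010 pp up to rounding.

The paired t helper illustrates why three shared seeds give t(2): the degrees of freedom are n - 1. The per-seed accuracies behind the reported t values are not given here, so the helper is shown without inputs; `scipy.stats.ttest_rel` computes the same statistic together with a p-value. All function names are hypothetical, not taken from the authors' code.

```python
import math
import statistics
from typing import Sequence, Tuple


def interaction_contrast(sd_gain_semantic: float, sd_gain_random: float) -> float:
    """2 x 2 interaction as a difference-in-differences (percentage points).

    sd_gain_semantic: top-1 gain from self-distillation with PubMedBERT embeddings.
    sd_gain_random:   top-1 gain from self-distillation with matched-dimensionality
                      random vectors.
    """
    return sd_gain_semantic - sd_gain_random


def mean_and_sample_sd(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation (n - 1 denominator, an assumed convention)."""
    return statistics.mean(values), statistics.stdev(values)


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic and degrees of freedom for matched per-seed scores."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [x - y for x, y in zip(a, b)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se, len(diffs) - 1


# Self-distillation gains reported for the random-vector ablation (pp).
interaction = interaction_contrast(0.78, 0.29)

# Per-teacher-seed interaction values reported for multi_teacher_02 (pp).
seed_mean, seed_sd = mean_and_sample_sd([0.525, 0.509, 0.507])

print(f"interaction: {interaction:+.2f} pp")                    # interaction: +0.49 pp
print(f"teacher-seed mean: {seed_mean:+.3f} +/- {seed_sd:.3f} pp")  # +0.514 +/- 0.010 pp
```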
Cadence achieves the best (lowest) Brier score among the evaluated models (0.774 male / 0.798 female), but its raw probabilities are systematically miscalibrated (ECE 0.077 vs. 0.010 for XGBoost-884). After a single scalar temperature-scaling step (T* ≈ 0.81), ECE drops to ≈0.028 while the Brier score remains the best. On a small external BWH cohort extracted by OCR (n = 1,120 patients, 39,120 events), Cadence ranked 3rd of 7 models. Three confounded sources of error affect this cohort (institutional shift, OCR noise, centroid mapping), so we report it as a generalisation probe rather than a definitive external validation. At the longer h30 evaluation horizon, Cadence's MAE advantage reverses (47.35 d versus 45.06 d for XGBoost), reflecting the absence of a matched-horizon self-distillation teacher.
Discussion: The 2 × 2 random-vector ablation confirms that the self-distillation gain on PubMedBERT embeddings (+0.78 pp) exceeds that on matched-dimensionality random vectors (+0.29 pp) by +0.49 pp. This attributes the interaction to semantic content rather than feature dimensionality. The factorial decomposition (+0.49 to +0.51 pp interaction) and the sequential pipeline-level decomposition (Supplementary Table S3) are complementary triangulations under different reference frames and are not directly additive.
Conclusion: This 7-model benchmark establishes a dual-sex, dual-metric, cross-institutional reference for next clinical event prediction under the TRIPOD+AI reporting framework. These results characterise discrimination and calibration on a single retrospective cohort. Prospective evaluation, decision-curve analysis, and harm-benefit assessment are required before clinical deployment.
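The calibration results rest on three quantities: a multiclass Brier score, expected calibration error (ECE), and a single temperature T fitted after training. The plain-Python sketch below implements textbook versions of each under assumptions the abstract does not state:

- the Brier score sums squared errors over classes (range 0 to 2);
- ECE uses 15 equal-width bins on the top-1 confidence;
- T is chosen by grid search minimising held-out negative log-likelihood, a simple stand-in for the gradient-based fit often used.

Under the usual convention softmax(z / T), a fitted T* ≈ 0.81 < 1 sharpens the predicted distribution. Because every logit is divided by the same positive constant, the argmax, and hence top-1 accuracy, is unchanged by this step. Function names are illustrative.

```python
import math
from typing import List, Optional, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    return [math.exp(v) for v in log_softmax(logits, temperature)]


def multiclass_brier(probs: Sequence[float], label: int) -> float:
    """Sum over classes of (p_k - 1[k == label])^2; 0 is perfect, 2 is worst."""
    return sum((p - (1.0 if k == label else 0.0)) ** 2 for k, p in enumerate(probs))


def expected_calibration_error(
    prob_rows: Sequence[Sequence[float]], labels: Sequence[int], n_bins: int = 15
) -> float:
    """Top-1 ECE with equal-width confidence bins (the bin count is an assumption)."""
    counts = [0] * n_bins
    conf_sums = [0.0] * n_bins
    correct_sums = [0.0] * n_bins
    for probs, y in zip(prob_rows, labels):
        conf = max(probs)
        pred = list(probs).index(conf)
        b = min(int(conf * n_bins), n_bins - 1)
        counts[b] += 1
        conf_sums[b] += conf
        correct_sums[b] += 1.0 if pred == y else 0.0
    n = len(labels)
    # Sum over bins of (|B| / n) * |acc(B) - conf(B)|.
    return sum(abs(correct_sums[b] - conf_sums[b]) / n for b in range(n_bins) if counts[b])


def mean_nll(logit_rows, labels, temperature: float) -> float:
    """Mean negative log-likelihood of the true labels at a given temperature."""
    return -sum(log_softmax(z, temperature)[y] for z, y in zip(logit_rows, labels)) / len(labels)


def fit_temperature(logit_rows, labels, grid: Optional[Sequence[float]] = None) -> float:
    """Grid-search T on held-out logits; a stand-in for gradient-based fitting."""
    if grid is None:
        grid = [0.5 + 0.01 * i for i in range(151)]  # T from 0.5 to 2.0
    return min(grid, key=lambda t: mean_nll(logit_rows, labels, t))
```

In this sketch, T* would be fitted on a held-out validation split. It would then be applied unchanged to test logits via `softmax(z, T)` before ECE and the Brier score are recomputed.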

Matching journals

The top 6 journals together account for at least 50% of the predicted probability mass.

1. npj Digital Medicine (97 papers in training set; Top 0.3%): 18.5%
2. Nature Communications (4913 papers in training set; Top 16%): 10.6%
3. Scientific Reports (3102 papers in training set; Top 9%): 8.5%
4. The Lancet Digital Health (25 papers in training set; Top 0.1%): 6.9%
5. Nature Medicine (117 papers in training set; Top 0.6%): 4.4%
6. Nature Machine Intelligence (61 papers in training set; Top 1%): 2.8%
(50% of probability mass above)
7. Acta Psychiatrica Scandinavica (10 papers in training set; Top 0.1%): 2.6%
8. PLOS ONE (4510 papers in training set; Top 44%): 2.6%
9. Communications Medicine (85 papers in training set; Top 0.1%): 2.4%
10. Patterns (70 papers in training set; Top 0.6%): 1.9%
11. Computers in Biology and Medicine (120 papers in training set; Top 2%): 1.7%
12. European Journal of Human Genetics (49 papers in training set; Top 0.6%): 1.7%
13. Bioinformatics (1061 papers in training set; Top 7%): 1.7%
14. JCO Clinical Cancer Informatics (18 papers in training set; Top 0.5%): 1.5%
15. Nucleic Acids Research (1128 papers in training set; Top 13%): 1.3%
16. GigaScience (172 papers in training set; Top 2%): 1.2%
17. Bioinformatics Advances (184 papers in training set; Top 4%): 1.2%
18. Proceedings of the National Academy of Sciences (2130 papers in training set; Top 39%): 1.1%
19. Frontiers in Artificial Intelligence (18 papers in training set; Top 0.6%): 0.9%
20. Journal of Translational Medicine (46 papers in training set; Top 2%): 0.9%
21. Journal of the American Medical Informatics Association (61 papers in training set; Top 2%): 0.8%
22. Journal of Biomedical Informatics (45 papers in training set; Top 1%): 0.8%
23. Cell Reports Medicine (140 papers in training set; Top 7%): 0.8%
24. Artificial Intelligence in Medicine (15 papers in training set; Top 0.6%): 0.8%
25. eBioMedicine (130 papers in training set; Top 4%): 0.8%
26. Computational and Structural Biotechnology Journal (216 papers in training set; Top 9%): 0.8%
27. Journal of Medical Internet Research (85 papers in training set; Top 4%): 0.8%
28. PLOS Computational Biology (1633 papers in training set; Top 24%): 0.8%
29. Advanced Science (249 papers in training set; Top 19%): 0.8%
30. Communications Biology (886 papers in training set; Top 25%): 0.7%
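The 50% cut-off in the journal list can be checked directly from the listed percentages. The sketch below uses the rounded values shown in the list to find the smallest number of top-ranked journals whose predicted probabilities reach 50%. Ranks 1-5 sum to only 48.9%, while ranks 1-6 reach 51.7%, so six journals are needed. The 30 listed journals together hold about 82.3% of the mass; the remainder is presumably spread over journals not shown. The helper name `smallest_k_for_mass` is illustrative, and sums are approximate because each entry is rounded to 0.1%.

```python
from typing import Optional, Sequence

# Predicted probabilities (%) for the 30 listed journals, in rank order.
LISTED_PCT = [
    18.5, 10.6, 8.5, 6.9, 4.4, 2.8,          # ranks 1-6
    2.6, 2.6, 2.4, 1.9, 1.7, 1.7, 1.7, 1.5,  # ranks 7-14
    1.3, 1.2, 1.2, 1.1, 0.9, 0.9, 0.8, 0.8,  # ranks 15-22
    0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.7,  # ranks 23-30
]


def smallest_k_for_mass(probs_pct: Sequence[float], target_pct: float = 50.0) -> Optional[int]:
    """Smallest number of top-ranked entries whose probabilities sum to >= target_pct."""
    running = 0.0
    for k, p in enumerate(sorted(probs_pct, reverse=True), start=1):
        running += p
        if running >= target_pct:
            return k
    return None  # target not reached by the listed entries


print(smallest_k_for_mass(LISTED_PCT))                               # 6
print(round(sum(LISTED_PCT[:6]), 1), round(sum(LISTED_PCT), 1))      # 51.7 82.3
```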