Cross-Cohort Generalizability of Plasma Biomarker Machine Learning Models Reveals Calibration-Driven Degradation in Clinical Utility

Korni, A.; Zandi, E.

2026-04-13 · neurology · medRxiv
DOI: 10.64898/2026.04.09.26350514
Abstract

Background: Plasma biomarkers demonstrate strong within-cohort performance for identifying cerebral amyloid pathology, but their real-world clinical utility depends on generalization across populations and assay platforms. The impact of cross-cohort deployment on clinically actionable metrics such as negative predictive value (NPV) remains poorly characterized.

Objective: To evaluate the performance and portability of plasma biomarker-based machine learning models for amyloid PET prediction across independent cohorts, with emphasis on calibration and clinically relevant predictive values.

Methods: Data from ADNI (n=885) and A4 (n=822) were analyzed. Machine learning models were trained within each cohort to predict amyloid PET status and continuous amyloid burden (centiloids). Performance was assessed using ROC AUC, accuracy, R², and RMSE. Cross-cohort generalizability was evaluated using bidirectional transfer without retraining. Calibration, predictive values, and decision curve analysis were used to assess clinical utility.

Results: Within-cohort discrimination was high (AUC up to 0.913 in ADNI and 0.870 in A4), with moderate performance for centiloid prediction (R² up to 0.628 and 0.535, respectively). Cross-cohort deployment resulted in modest attenuation of AUC (~4-7%) but substantially greater degradation in clinically actionable performance. NPV declined from 0.831 to 0.644 under ADNI→A4 transfer (~19 percentage points) despite preserved discrimination. Calibration analyses demonstrated systematic probability misestimation, and decision curve analysis showed reduced net clinical benefit. Biomarker distribution differences across cohorts were consistent with dataset shift.

Conclusion: Plasma biomarker models retain discrimination across cohorts but exhibit clinically meaningful degradation in predictive value under deployment. Calibration instability and prevalence differences critically affect NPV, highlighting the need for cross-cohort validation, calibration assessment, and assay harmonization before clinical implementation.
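The pattern the abstract reports, preserved discrimination alongside degraded predictive value, is easy to reproduce in miniature. Below is a minimal sketch (not the authors' code; the synthetic data, 0.5 decision threshold, and bin count are all stand-in assumptions) that computes AUC, NPV, a reliability curve, and decision-curve net benefit for a classifier whose probabilities carry a systematic shift, mirroring the evaluation metrics used in the paper:

```python
# Minimal sketch (not the authors' code): how preserved discrimination can
# coexist with degraded NPV, calibration, and net benefit under dataset shift.
# All data are synthetic stand-ins; the threshold and bin count are assumptions.
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)
n = 800
y_true = rng.binomial(1, 0.35, size=n)  # hypothetical amyloid PET status (35% prevalence)

# Scores from a model trained elsewhere: a constant logit shift stands in for
# the systematic probability misestimation described in the abstract. A
# monotone shift leaves the ranking, and therefore the AUC, unchanged.
logit = 2.0 * (y_true - 0.5) + rng.normal(0.0, 1.0, size=n) + 0.8
y_prob = 1.0 / (1.0 + np.exp(-logit))

print(f"AUC: {roc_auc_score(y_true, y_prob):.3f}")  # discrimination is preserved

threshold = 0.5  # assumed clinical decision threshold
y_pred = (y_prob >= threshold).astype(int)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"NPV: {tn / (tn + fn):.3f}")  # degrades because probabilities are shifted

# Reliability curve: well-calibrated predictions lie on the diagonal.
prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=10)
print("reliability (predicted vs. observed):\n", np.c_[prob_pred, prob_true].round(2))

# Decision-curve net benefit at threshold probability t:
#   NB(t) = TP/n - (FP/n) * t / (1 - t)
t = threshold
print(f"Net benefit at t={t}: {tp / n - (fp / n) * t / (1 - t):.3f}")
```

Because the shift is monotone, the AUC is untouched while NPV and net benefit fall; recalibrating on a sample from the target cohort (e.g. Platt scaling or isotonic regression) can largely restore them, which is why the conclusion stresses calibration assessment before clinical deployment.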

Matching journals

The top 3 journals account for 50% of the predicted probability mass (a short arithmetic check follows the table).

Rank | Journal | Papers in training set | Percentile | Predicted probability
1 | Alzheimer's & Dementia | 143 | Top 0.2% | 33.6%
2 | Alzheimer's & Dementia: Diagnosis, Assessment & Disease Monitoring | 38 | Top 0.1% | 10.3%
3 | European Journal of Nuclear Medicine and Molecular Imaging | 19 | Top 0.1% | 6.5%
4 | npj Digital Medicine | 97 | Top 1% | 4.0%
5 | Annals of Neurology | 57 | Top 0.5% | 3.7%
6 | eBioMedicine | 130 | Top 0.3% | 3.7%
7 | Clinical Chemistry | 22 | Top 0.2% | 3.3%
8 | Nature Communications | 4913 | Top 46% | 2.1%
9 | The Lancet Digital Health | 25 | Top 0.2% | 2.1%
10 | Scientific Reports | 3102 | Top 55% | 1.8%
11 | Alzheimer's Research & Therapy | 52 | Top 1% | 1.7%
12 | Nature Medicine | 117 | Top 2% | 1.7%
13 | Brain | 154 | Top 3% | 1.7%
14 | Brain Communications | 147 | Top 2% | 1.4%
15 | JAMA Network Open | 127 | Top 3% | 1.2%
16 | Frontiers in Neurology | 91 | Top 4% | 1.2%
17 | Journal of Neurology, Neurosurgery & Psychiatry | 29 | Top 1% | 0.9%
18 | BMJ Open | 554 | Top 12% | 0.8%
19 | Human Brain Mapping | 295 | Top 4% | 0.8%
20 | Neurology | 44 | Top 2% | 0.8%
21 | PLOS ONE | 4510 | Top 71% | 0.7%
22 | Acta Biomaterialia | 85 | Top 1% | 0.7%
23 | Communications Medicine | 85 | Top 2% | 0.5%
24 | The Journal of Prevention of Alzheimer's Disease | 10 | Top 0.4% | 0.5%
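As a quick arithmetic check of the 50%-of-probability-mass statement above, the sketch below (probabilities copied from the table; the "smallest k reaching 50%" counting rule is an assumption about how the cutoff is drawn) finds how many top-ranked journals are needed:

```python
# Minimal sketch: smallest number of top-ranked journals whose predicted
# probabilities sum to at least 50% of the total mass (values from the table).
import numpy as np

probs = np.array([33.6, 10.3, 6.5, 4.0, 3.7, 3.7, 3.3, 2.1, 2.1, 1.8,
                  1.7, 1.7, 1.7, 1.4, 1.2, 1.2, 0.9, 0.8, 0.8, 0.8,
                  0.7, 0.7, 0.5, 0.5]) / 100
k = int(np.searchsorted(np.cumsum(probs), 0.50)) + 1
print(k)  # -> 3: the top three journals (33.6 + 10.3 + 6.5 = 50.4%) cross 50%
```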