
A Cross-Cohort Validated Plasma Lipid Biomarker Assay for Early Breast Cancer Detection Using Machine Learning

Huang, T.; Koch, F. C.; Peake, D. A.; Adam, K.-P.; David, M.; Li, D.; Heffernan, K.; Lim, A.; Hurrell, J. G.; Preston, S.; Baterseh, A.; Vafaee, F.

2026-04-23 oncology
10.64898/2026.04.23.26351564 medRxiv

Early detection of breast cancer remains essential for improving clinical outcomes, and complementary non-invasive approaches are needed to support existing screening methods, particularly for women with dense breast tissue. We have previously reported plasma lipid biomarker discovery using untargeted high-resolution liquid chromatography tandem mass spectrometry (LC-MS/MS). In this study, we performed biomarker confirmation and developed machine-learning models applied to targeted plasma lipid measurements for the non-invasive detection of early-stage breast cancer across international cohorts with independent external validation. Targeted LC-MS/MS was used to quantify candidate lipid panels in plasma samples from European discovery cohorts (n = 554) and an independent Australian cohort (n = 266) used for external validation. Data-driven feature selection identified a 15-lipid panel with strong performance in European cohorts (AUC >= 0.94). External validation prior to confidence stratification yielded 76% sensitivity, 64% specificity, and an AUC of 0.81 in the Australian validation cohort. Clinical assay development requires iterative panel and model testing to support translational feasibility and performance in the intended-use population. An analytically viable panel, excluding lipids requiring complex and costly synthesis, achieved comparable accuracy with improved assay robustness. Confidence-based analysis showed enhanced performance for predictions made with moderate to high confidence, with sensitivity up to 89% and AUC up to 0.85, suggesting that ongoing research should focus on strategies to enhance diagnostic model confidence. Importantly, model predictions were independent of breast density, tumour size, grade, subtype, and morphology, indicating biological specificity of the lipid signature. These results demonstrate that calibrated machine-learning models applied to plasma lipid biomarkers can support non-invasive breast cancer detection. Expanding training datasets to include greater diversity will further improve performance in the ongoing development of this lipid-based detection approach.
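The abstract reports sensitivity and specificity both before confidence stratification and for moderate-to-high-confidence predictions, but it does not say how confidence was defined. The following is a minimal sketch of one common way to do this, assuming confidence is the distance of a calibrated probability from the decision threshold. The function name `stratified_metrics`, the 0.5 threshold, and the toy labels and probabilities are all hypothetical and are not taken from the study.

```python
def stratified_metrics(y_true, y_prob, min_confidence=0.0, threshold=0.5):
    """Sensitivity and specificity over predictions whose confidence,
    taken here (an assumption) as |p - threshold|, is at least min_confidence.
    Returns (sensitivity, specificity, number of predictions kept)."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, y_prob):
        if abs(p - threshold) < min_confidence:
            continue  # low-confidence prediction: left out of the metrics
        pred = 1 if p >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity, tp + fn + tn + fp


# Toy data for illustration only (1 = cancer, 0 = control); not study data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.55, 0.45, 0.1, 0.25, 0.52, 0.6]

all_preds = stratified_metrics(y_true, y_prob)
confident = stratified_metrics(y_true, y_prob, min_confidence=0.2)
print("all predictions:", all_preds)
print("confidence >= 0.2:", confident)
```

In this sketch, raising `min_confidence` trades coverage for accuracy: fewer samples receive a call, and the metrics are computed only on those that do. Any real analysis would also need to report how many samples fall below the cutoff.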

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Percentile | Probability
1 | Breast Cancer Research | 32 | Top 0.1% | 26.7%
2 | Scientific Reports | 3102 | Top 13% | 7.0%
3 | PLOS ONE | 4510 | Top 33% | 4.5%
4 | JNCI Cancer Spectrum | 10 | Top 0.1% | 4.3%
5 | Endocrinology | 38 | Top 0.1% | 4.1%
6 | Cancers | 200 | Top 1% | 3.7%
50% of probability mass above
7 | Nature Communications | 4913 | Top 38% | 3.7%
8 | The Journal of Clinical Endocrinology & Metabolism | 35 | Top 0.3% | 3.7%
9 | Diagnostics | 48 | Top 0.7% | 2.1%
10 | BMC Cancer | 52 | Top 1% | 1.9%
11 | eLife | 5422 | Top 39% | 1.8%
12 | Journal of Proteome Research | 215 | Top 1% | 1.8%
13 | npj Breast Cancer | 18 | Top 0.1% | 1.7%
14 | Cancer Epidemiology, Biomarkers & Prevention | 17 | Top 0.3% | 1.7%
15 | ACS Sensors | 45 | Top 0.8% | 1.5%
16 | Metabolites | 50 | Top 0.5% | 1.5%
17 | Physiological Reports | 35 | Top 0.6% | 1.5%
18 | Annals of Biomedical Engineering | 34 | Top 0.8% | 1.3%
19 | Oncotarget | 15 | Top 0.2% | 1.0%
20 | Cancer Research Communications | 46 | Top 0.9% | 0.9%
21 | Frontiers in Bioinformatics | 45 | Top 0.8% | 0.8%
22 | Immunology | 29 | Top 1.0% | 0.8%
23 | International Journal of Molecular Sciences | 453 | Top 15% | 0.8%
24 | Laboratory Investigation | 13 | Top 0.2% | 0.8%
25 | Angewandte Chemie International Edition | 81 | Top 3% | 0.7%
26 | Biomedical Optics Express | 84 | Top 1% | 0.7%
27 | Journal of Pathology Informatics | 13 | Top 0.4% | 0.7%
28 | Human Molecular Genetics | 130 | Top 3% | 0.7%
29 | Computer Methods and Programs in Biomedicine | 27 | Top 1% | 0.7%
30 | Clinical Cancer Research | 58 | Top 2% | 0.7%
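
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked by accumulating the listed probabilities in rank order. The sketch below does that with the 30 values copied from the list above. The helper name `journals_to_reach` is hypothetical, and because the listed percentages are rounded to one decimal place, the running total is only approximate.

```python
# Predicted probabilities (%) for ranks 1-30, copied from the list above.
probs = [26.7, 7.0, 4.5, 4.3, 4.1, 3.7, 3.7, 3.7, 2.1, 1.9,
         1.8, 1.8, 1.7, 1.7, 1.5, 1.5, 1.5, 1.3, 1.0, 0.9,
         0.8, 0.8, 0.8, 0.8, 0.7, 0.7, 0.7, 0.7, 0.7, 0.7]


def journals_to_reach(probs, target):
    """Return (count, cumulative) for the shortest ranked prefix whose
    cumulative probability reaches target; count is None if never reached."""
    total = 0.0
    for count, p in enumerate(probs, start=1):
        total += p
        if total >= target:
            return count, total
    return None, total


count, mass = journals_to_reach(probs, 50.0)
print(count, round(mass, 1))
```

Run on these values, the cumulative total first reaches 50% at rank 6, which is consistent with where the page places its "50% of probability mass above" marker.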