
Machine Learning-Based Identification of Blood Biomarkers that Distinguish Precachectic and Cachectic Patients with Pancreatic Ductal Adenocarcinoma

Olumoyin, K. D.; Park, M. A.; Davis, E.; Permuth, J. B.; Rejniak, K. A.

2025-12-27 health informatics
10.64898/2025.12.23.25342866 medRxiv

Background: Identifying minimally invasive biomarkers of the different stages of cachexia (Ca), and of precachexia (PCa) in particular, might help clinicians treat patients with pancreatic ductal adenocarcinoma (PDAC) who are at high risk of progressing to a more severe cachectic stage. In this work, we developed a machine-learning (ML) model optimized for blood biomarker data that identifies precachectic and cachectic patients.
Methods: Blood and clinical data were collected from treatment-naive patients with PDAC through the Florida Pancreas Collaborative (FPC), a multi-institutional cohort study and biobanking initiative. Blood was processed into serum and assayed for a total of 35 candidate biomarkers. Participants were classified as having noncachexia (NCa), precachexia, or cachexia according to modified criteria by Vigano and colleagues, which consider unintentional weight loss and biochemical data. Using these data, we designed ML algorithms to: (i) pre-select predictive blood biomarker candidates using the mutual information method combined with the leave-one-feature-out (LOFO) feature importance approach; (ii) identify the minimal combination of predictive biomarkers using the forward feature selection method; (iii) determine the optimal classification hyperparameters for the support vector machine using a cross-validation technique; and (iv) adjust the decision-boundary threshold for imbalanced data using the Matthews correlation coefficient. Three ML-based binary predictors were designed to determine patients' cachexia status: NCa vs. Ca; PCa vs. Ca; and PCa vs. NCa.
Results: The biomarker levels from 184 patients (28 NCa, 53 PCa, and 103 Ca) were used in this study. The NCa vs. Ca predictor identified a set of 6 biomarkers and yielded an area under the curve (AUC) of 0.835. The PCa vs. Ca predictor identified a set of 6 biomarkers and yielded an AUC of 0.810. The PCa vs. NCa predictor identified a set of 5 biomarkers and yielded an AUC of 0.771.
Conclusions: The developed ML models, which use blood biomarker data, provided effective predictions of patients' cachexia stage and can help clinicians diagnose PCa.
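Step (iv) of the methods, adjusting the decision threshold with the Matthews correlation coefficient (MCC), can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `mcc` and `best_threshold` are hypothetical, and the scores and labels are made-up toy values standing in for classifier outputs on an imbalanced two-class problem.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def best_threshold(scores, labels):
    """Sweep every observed score as a cut-off and keep the one with the highest MCC."""
    best_t, best_m = None, -1.0
    for t in sorted(set(scores)):
        tp = tn = fp = fn = 0
        for s, y in zip(scores, labels):
            pred = 1 if s >= t else 0
            if pred == 1 and y == 1:
                tp += 1
            elif pred == 0 and y == 0:
                tn += 1
            elif pred == 1 and y == 0:
                fp += 1
            else:
                fn += 1
        m = mcc(tp, tn, fp, fn)
        if m > best_m:  # strict comparison keeps the lowest threshold on ties
            best_t, best_m = t, m
    return best_t, best_m

# Toy data (assumption): 3 negatives and 7 positives, i.e. an imbalanced sample.
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.45, 0.60, 0.70, 0.80, 0.90]
labels = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

threshold, value = best_threshold(scores, labels)
print(threshold, value)
```

On this toy sample a fixed cut-off of 0.5 would misclassify three positives, whereas the MCC sweep moves the cut-off down to the lowest positive score. In practice the threshold would be chosen on training or validation folds only, so that the reported performance is not optimistically biased.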

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Cancer Medicine: 24 papers in training set, Top 0.1%, 20.0%
2. PLOS ONE: 4510 papers in training set, Top 21%, 8.7%
3. BMC Medical Informatics and Decision Making: 39 papers in training set, Top 0.3%, 8.7%
4. Scientific Reports: 3102 papers in training set, Top 9%, 8.5%
5. JAMIA Open: 37 papers in training set, Top 0.3%, 4.4%
50% of probability mass above
6. Journal of Medical Internet Research: 85 papers in training set, Top 2%, 3.4%
7. BMC Cancer: 52 papers in training set, Top 0.8%, 2.7%
8. Computers in Biology and Medicine: 120 papers in training set, Top 1%, 2.2%
9. JMIR Medical Informatics: 17 papers in training set, Top 0.5%, 2.1%
10. eBioMedicine: 130 papers in training set, Top 0.9%, 1.9%
11. Diagnostics: 48 papers in training set, Top 1.0%, 1.7%
12. International Journal of Medical Informatics: 25 papers in training set, Top 0.8%, 1.7%
13. Frontiers in Artificial Intelligence: 18 papers in training set, Top 0.2%, 1.7%
14. PeerJ: 261 papers in training set, Top 6%, 1.7%
15. Informatics in Medicine Unlocked: 21 papers in training set, Top 0.5%, 1.5%
16. Frontiers in Immunology: 586 papers in training set, Top 5%, 1.3%
17. Biology Methods and Protocols: 53 papers in training set, Top 2%, 1.1%
18. JCO Clinical Cancer Informatics: 18 papers in training set, Top 0.7%, 1.0%
19. PLOS Computational Biology: 1633 papers in training set, Top 21%, 1.0%
20. Experimental Neurology: 57 papers in training set, Top 1%, 0.8%
21. BioMed Research International: 25 papers in training set, Top 3%, 0.8%
22. Frontiers in Medicine: 113 papers in training set, Top 6%, 0.8%
23. Frontiers in Digital Health: 20 papers in training set, Top 1%, 0.7%
24. BMC Medicine: 163 papers in training set, Top 8%, 0.7%
25. Artificial Intelligence in Medicine: 15 papers in training set, Top 0.8%, 0.7%
26. Translational Oncology: 18 papers in training set, Top 0.5%, 0.7%
27. Cancers: 200 papers in training set, Top 6%, 0.5%
28. Journal of Biomedical Informatics: 45 papers in training set, Top 2%, 0.5%
29. JNCI Cancer Spectrum: 10 papers in training set, Top 0.7%, 0.5%
30. JMIR Research Protocols: 18 papers in training set, Top 2%, 0.5%
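
The 50% cut-off marked in the list can be checked by accumulating the listed probabilities in rank order. The sketch below is only an arithmetic illustration; the helper name `journals_for_mass` is hypothetical, and the input values are the first seven percentages as displayed above, which are rounded.

```python
def journals_for_mass(probs, target=50.0):
    """Return (k, cumulative) for the smallest k whose top-k probabilities reach the target."""
    total = 0.0
    for k, p in enumerate(probs, start=1):
        total += p
        if total >= target:
            return k, total
    return None, total  # target never reached with the values given

# Probabilities (in %) of the seven highest-ranked journals, as listed.
listed = [20.0, 8.7, 8.7, 8.5, 4.4, 3.4, 2.7]

k, mass = journals_for_mass(listed)
print(k, round(mass, 1))
```

With the displayed values, the first four journals sum to about 45.9% and the fifth brings the total to about 50.3%, consistent with the statement that the top 5 journals account for 50% of the predicted probability mass.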