Machine Learning-Based Identification of Blood Biomarkers that Distinguish Precachectic and Cachectic Patients with Pancreatic Ductal Adenocarcinoma

Olumoyin, K. D.; Park, M. A.; Davis, E.; Permuth, J. B.; Rejniak, K. A.

2025-12-27 health informatics

10.64898/2025.12.23.25342866 medRxiv

Background: Identification of minimally invasive biomarkers of the different stages of cachexia (Ca), and of precachexia (PCa) in particular, might help clinicians treat patients with pancreatic ductal adenocarcinoma (PDAC) who are at high risk of progressing to a more severe cachectic stage. In this work, we developed a machine-learning (ML) model, optimized for blood biomarker data, that identifies precachectic and cachectic patients.

Methods: Blood and clinical data were collected from treatment-naive patients with PDAC through the Florida Pancreas Collaborative (FPC), a multi-institutional cohort study and biobanking initiative. Blood was processed into serum and assayed for a total of 35 candidate biomarkers. Participants were classified as having noncachexia (NCa), precachexia, or cachexia according to the modified criteria of Vigano and colleagues, which consider unintentional weight loss and biochemical data. Using these data, we designed ML algorithms to: (i) pre-select predictive blood biomarker candidates using the mutual information method combined with the leave-one-feature-out (LOFO) feature importance approach; (ii) identify the minimal combination of predictive biomarkers using forward feature selection; (iii) determine the optimal classification hyperparameters for the support vector machine (SVM) using cross-validation; and (iv) adjust the decision-boundary threshold for imbalanced data using the Matthews correlation coefficient (MCC). Three ML-based binary predictors were designed to determine patients' cachexia status: NCa vs. Ca, PCa vs. Ca, and PCa vs. NCa.

Results: Biomarker levels from 184 patients (28 NCa, 53 PCa, and 103 Ca) were used in this study. The NCa vs. Ca predictor identified a set of 6 biomarkers and yielded an area under the curve (AUC) of 0.835. The PCa vs. Ca predictor identified a set of 6 biomarkers and yielded an AUC of 0.810. The PCa vs. NCa predictor identified a set of 5 biomarkers and yielded an AUC of 0.771.

Conclusions: The developed ML models, which use blood biomarker data, provided effective predictions of patients' cachexia stage that can help clinicians diagnose PCa.
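
The four methodological steps (i) to (iv) can be sketched for a single binary predictor (for example, NCa vs. Ca) with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic (`make_classification` with 131 samples and 35 features, loosely mirroring the 28 NCa vs. 103 Ca class sizes), and the RBF kernel, the hyperparameter grid, 5-fold stratified cross-validation, `top_k=10`, AUC as the selection score, the rule for combining mutual information with LOFO, and the helper names (`preselect`, `forward_select`, `tune_svm`, `mcc_threshold`) are all hypothetical choices.

```python
"""Hedged sketch of a binary biomarker-selection pipeline (synthetic data only)."""
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.metrics import matthews_corrcoef, roc_auc_score
from sklearn.model_selection import (GridSearchCV, StratifiedKFold,
                                     cross_val_predict, cross_val_score)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumed CV scheme: 5 stratified folds, shuffled with a fixed seed.
CV = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)


def svm():
    # Scale each biomarker, then fit an RBF-kernel SVM (kernel choice is an assumption).
    return make_pipeline(StandardScaler(), SVC(kernel="rbf"))


def cv_auc(X, y, cols):
    # Mean cross-validated AUC of the SVM restricted to the given feature columns.
    return cross_val_score(svm(), X[:, cols], y, cv=CV, scoring="roc_auc").mean()


def preselect(X, y, top_k=10):
    # (i) Rank features by mutual information and keep the top_k, then drop any
    # feature whose removal does not lower CV AUC (a simple LOFO importance).
    mi = mutual_info_classif(X, y, random_state=0)
    cand = list(np.argsort(mi)[::-1][:top_k])
    base = cv_auc(X, y, cand)
    kept = [f for f in cand if base - cv_auc(X, y, [c for c in cand if c != f]) > 0]
    return kept or cand  # fall back to the MI ranking if LOFO removes everything


def forward_select(X, y, cand):
    # (ii) Greedily add the feature that most improves CV AUC; stop when none does.
    chosen, best = [], 0.0
    while True:
        remaining = [f for f in cand if f not in chosen]
        if not remaining:
            return chosen, best
        score, f = max((cv_auc(X, y, chosen + [f]), f) for f in remaining)
        if score <= best:
            return chosen, best
        chosen.append(f)
        best = score


def tune_svm(X, y):
    # (iii) Cross-validated grid search over C and gamma (grid values are assumptions).
    grid = {"svc__C": [0.1, 1, 10, 100], "svc__gamma": ["scale", 0.01, 0.1, 1]}
    search = GridSearchCV(svm(), grid, cv=CV, scoring="roc_auc").fit(X, y)
    return search.best_estimator_


def mcc_threshold(model, X, y):
    # (iv) Choose the decision-function cutoff that maximizes MCC on out-of-fold scores,
    # instead of the default cutoff of 0, to account for class imbalance.
    scores = cross_val_predict(model, X, y, cv=CV, method="decision_function")
    thresholds = np.unique(scores)
    mccs = [matthews_corrcoef(y, (scores >= t).astype(int)) for t in thresholds]
    return thresholds[int(np.argmax(mccs))], max(mccs), roc_auc_score(y, scores)


if __name__ == "__main__":
    # Synthetic stand-in: 131 samples x 35 "biomarkers", ~21% minority class.
    X, y = make_classification(n_samples=131, n_features=35, n_informative=5,
                               weights=[0.21], random_state=0)
    cand = preselect(X, y)
    panel, auc = forward_select(X, y, cand)
    model = tune_svm(X[:, panel], y)
    thr, mcc, oof_auc = mcc_threshold(model, X[:, panel], y)
    print(f"panel={sorted(int(f) for f in panel)} cv_auc={auc:.3f} "
          f"threshold={thr:.3f} mcc={mcc:.3f} oof_auc={oof_auc:.3f}")
```

Running the same chain on each class pair would give the three binary predictors. Because selection, tuning, and threshold choice all reuse the same folds, the scores this sketch prints are optimistically biased; an unbiased estimate would need nested cross-validation or a held-out test set.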
