
Biomarker Identification in Pancreatic Cancer Through Concordant Differential Expression and Interpretable Machine Learning Analyses

Macia Escalante, S.; Lopez Aladid, R.; Tovar, R.; Lopez Romero, M.; Navarro Selles, A.; Garmendia, L.; Puerto Lillo, C.; Fossati, M.; Parente, P.

2026-02-16 oncology
10.64898/2026.02.13.26346263 medRxiv

Background: Pancreatic ductal adenocarcinoma is one of the most aggressive and lethal malignancies of the gastrointestinal tract. Its poor prognosis is largely attributed to late-stage diagnosis, pronounced tumor heterogeneity, and limited therapeutic efficacy. These challenges underscore the urgent need for robust molecular biomarkers and novel therapeutic targets.

Methods: Gene expression data from 146 pancreatic tissue samples (72 normal and 74 tumor specimens) obtained from the Pan-Cancer Atlas (TCGA) were analyzed. Differential gene expression analysis was conducted using the DESeq2 package, followed by functional enrichment analysis based on GO and KEGG. A classification model was developed using the XGBoost algorithm and evaluated through 500 bootstrapping iterations and 5-fold cross-validation to assess robustness and generalizability. Model interpretability was assessed using SHAP (SHapley Additive exPlanations) values to identify the genes with the highest predictive contribution.

Results: Transcriptomic analysis revealed significant dysregulation of multiple genes between normal and tumor pancreatic tissues. GJB3, S100A2, MSLN, and SLC2A1 were notably overexpressed, whereas DEFA6, APOB, and RBP2 were markedly downregulated, indicative of impaired exocrine function and aberrant epithelial reprogramming. The XGBoost classification model achieved an average area under the curve (AUC) of 0.9868 and an overall accuracy of 98.6%. SHAP analysis identified GJB3, LINC02086, and TSPAN1 as key predictive features. Six genes were both differentially expressed and highly influential within the model, supporting their potential utility as robust biomarkers for pancreatic tumor characterization.

Conclusions: Pancreatic ductal adenocarcinoma is marked by extensive transcriptomic reprogramming. The integration of differential gene expression analysis with interpretable machine learning enabled the identification of a molecular signature with potential diagnostic and therapeutic relevance.
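The abstract reports an average AUC obtained over 500 bootstrapping iterations but does not describe the procedure. As a rough illustration only, the sketch below shows one common way to bootstrap an AUC estimate. It is not the authors' pipeline: the function names `auc` and `bootstrap_auc` are hypothetical, the classifier scores are synthetic random numbers, and only the sample sizes (72 normal, 74 tumor) and the iteration count come from the abstract.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney pairwise identity."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc(labels, scores, n_iter=500, seed=0):
    """Mean AUC and a 95% percentile interval over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_iter:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples containing a single class
            aucs.append(auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    mean = sum(aucs) / n_iter
    return mean, aucs[int(0.025 * n_iter)], aucs[int(0.975 * n_iter) - 1]


# Synthetic stand-in for classifier output (assumption, not study data):
# 72 "normal" scores centred at 0.2 and 74 "tumor" scores centred at 0.8.
rng = random.Random(42)
labels = [0] * 72 + [1] * 74
scores = [rng.gauss(0.2, 0.15) for _ in range(72)] + \
         [rng.gauss(0.8, 0.15) for _ in range(74)]

mean_auc, lo, hi = bootstrap_auc(labels, scores)
print(f"bootstrap AUC: {mean_auc:.4f} (95% interval {lo:.4f} to {hi:.4f})")
```

In a real analysis the scores would come from held-out predictions of the fitted model, for example from the cross-validation folds, rather than from a random generator.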

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1. BMC Cancer | 52 papers in training set | Top 0.1% | 12.4%
2. Cancer Medicine | 24 papers in training set | Top 0.1% | 8.4%
3. PLOS ONE | 4510 papers in training set | Top 24% | 7.2%
4. Scientific Reports | 3102 papers in training set | Top 14% | 6.8%
5. British Journal of Cancer | 42 papers in training set | Top 0.2% | 6.3%
6. Cancers | 200 papers in training set | Top 1% | 4.3%
7. JNCI Cancer Spectrum | 10 papers in training set | Top 0.1% | 3.6%
8. Cancer Epidemiology, Biomarkers & Prevention | 17 papers in training set | Top 0.2% | 3.6%
50% of probability mass above
9. Frontiers in Oncology | 95 papers in training set | Top 1% | 3.6%
10. PeerJ | 261 papers in training set | Top 2% | 3.6%
11. Diagnostics | 48 papers in training set | Top 0.6% | 2.7%
12. The Journal of Pathology | 22 papers in training set | Top 0.1% | 1.7%
13. Gastroenterology | 40 papers in training set | Top 1% | 1.7%
14. Computers in Biology and Medicine | 120 papers in training set | Top 2% | 1.7%
15. Translational Oncology | 18 papers in training set | Top 0.1% | 1.5%
16. Frontiers in Immunology | 586 papers in training set | Top 5% | 1.3%
17. JCO Precision Oncology | 14 papers in training set | Top 0.3% | 1.2%
18. Journal of Clinical Medicine | 91 papers in training set | Top 5% | 0.9%
19. Molecular Oncology | 50 papers in training set | Top 0.8% | 0.9%
20. PLOS Computational Biology | 1633 papers in training set | Top 22% | 0.9%
21. International Journal of Molecular Sciences | 453 papers in training set | Top 14% | 0.8%
22. Signal Transduction and Targeted Therapy | 29 papers in training set | Top 1% | 0.7%
23. JMIR Research Protocols | 18 papers in training set | Top 1% | 0.7%
24. EBioMedicine | 39 papers in training set | Top 1% | 0.7%
25. JNCI: Journal of the National Cancer Institute | 16 papers in training set | Top 0.7% | 0.7%
26. Clinical Chemistry | 22 papers in training set | Top 0.8% | 0.7%
27. Laboratory Investigation | 13 papers in training set | Top 0.3% | 0.7%
28. Neuropathology and Applied Neurobiology | 14 papers in training set | Top 0.6% | 0.7%
29. Journal of Translational Medicine | 46 papers in training set | Top 3% | 0.7%
30. Biology Methods and Protocols | 53 papers in training set | Top 3% | 0.7%
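
The statement that the top 8 journals account for 50% of the predicted probability mass can be checked against the listed percentages with a running sum. The sketch below is an illustration under assumptions: the helper name `journals_to_reach` is hypothetical, the page's own calculation is not documented here, and the displayed percentages are rounded, so the cumulative figure is approximate.

```python
def journals_to_reach(probabilities, threshold=50.0):
    """Return (rank, cumulative mass) at which the running sum first
    reaches the threshold, or (None, total) if it never does."""
    total = 0.0
    for rank, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return rank, total
    return None, total


# Predicted probabilities (%) of the ten highest-ranked journals, as listed.
listed = [12.4, 8.4, 7.2, 6.8, 6.3, 4.3, 3.6, 3.6, 3.6, 3.6]

rank, mass = journals_to_reach(listed)
print(f"threshold reached at rank {rank} with {mass:.1f}% cumulative mass")
```

With the listed values the running sum is about 49.0% after seven journals and about 52.6% after eight, which agrees with the position of the 50% marker in the ranking.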