
Multi-Algorithm Machine Learning Benchmarking for Pan-Cancer Classification from Tumour-Educated Platelet RNA Sequencing

Ray, S.; Zalawadia, D. H.; Bhate, V.; Chakravarthy, T. D.; Chetty, A. G.

2026-05-26 bioinformatics
10.64898/2026.05.22.727079 bioRxiv

Tumour-educated platelets (TEPs) carry cancer-type-specific RNA signatures accessible through whole-blood RNA sequencing, but systematic multi-algorithm benchmarking with quantified statistical uncertainty had not been applied to the GSE68086 dataset, the field's primary reference cohort. We applied an end-to-end transcriptomic and machine learning framework to 280 whole-blood platelet RNA-seq samples from six cancer types (non-small cell lung cancer, colorectal cancer, glioblastoma multiforme, hepatobiliary cancer, breast cancer, and pancreatic cancer) and healthy donors. After a standardised preprocessing and normalisation pipeline, seven supervised classifiers (Logistic Regression, SVM (RBF), XGBoost, LightGBM, Random Forest, K-Nearest Neighbours, and a Multilayer Perceptron) were benchmarked using stratified 5-fold cross-validation and a held-out test set. Statistical uncertainty was quantified via 2,000-resample percentile bootstrap confidence intervals. Multinomial Logistic Regression achieved the highest test macro F1-score (0.522) and macro-averaged ROC-AUC (0.869), both substantially above the seven-class chance level (1/7 ≈ 0.14). SHAP analysis of the Random Forest classifier identified IFITM3 as the globally dominant TEP biomarker; cancer-type-specific discriminators included ATP5PD (hepatobiliary cancer), C6orf62 (NSCLC and pancreatic cancer), VPS13C (healthy donors), and TMSB4Y (breast cancer). Gene Ontology and KEGG pathway enrichment corroborated the biological specificity of the identified transcriptomic signatures. These results support the diagnostic potential of TEP transcriptomics as a multi-class liquid biopsy platform and provide a methodologically transparent, reproducible reference framework for future blood-based cancer classification studies.
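The abstract reports a macro F1-score with 2,000-resample percentile bootstrap confidence intervals but does not say what is resampled or at which confidence level. The sketch below shows one common way to do it, under stated assumptions: held-out test predictions are resampled with replacement, the interval is 95%, and a class absent from a resample contributes an F1 of 0. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
import random
from typing import Sequence, Tuple


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> float:
    """Unweighted mean of per-class F1 over classes 0..n_classes-1."""
    scores = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # Assumption: a class with no true or predicted samples scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_classes


def percentile_bootstrap_ci(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    n_classes: int,
    n_resamples: int = 2000,
    alpha: float = 0.05,  # assumed 95% interval; the abstract gives no level
    seed: int = 0,
) -> Tuple[float, float]:
    """Approximate percentile bootstrap interval for macro F1."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        # Resample (true, predicted) pairs with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(
            macro_f1([y_true[i] for i in idx], [y_pred[i] for i in idx], n_classes)
        )
    stats.sort()
    # Read the interval bounds off the sorted bootstrap distribution.
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lo, hi


# Toy seven-class labels (synthetic, for illustration only).
toy_true = [i % 7 for i in range(70)]
toy_pred = [t if i % 3 else (t + 1) % 7 for i, t in enumerate(toy_true)]

point = macro_f1(toy_true, toy_pred, n_classes=7)
ci_low, ci_high = percentile_bootstrap_ci(toy_true, toy_pred, n_classes=7)
print(f"macro F1 = {point:.3f}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

With a small test set the interval is wide, which is why a point estimate such as 0.522 is easier to interpret alongside its bootstrap bounds.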

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Genome Medicine | 154 papers in training set | Top 0.1% | 22.9%
2. Nature Communications | 4913 papers in training set | Top 9% | 14.9%
3. Cell Reports Medicine | 140 papers in training set | Top 0.6% | 4.9%
4. Scientific Reports | 3102 papers in training set | Top 22% | 4.9%
5. NAR Genomics and Bioinformatics | 214 papers in training set | Top 1% | 2.6%
50% of probability mass above
6. Briefings in Bioinformatics | 326 papers in training set | Top 3% | 2.6%
7. International Journal of Molecular Sciences | 453 papers in training set | Top 4% | 2.6%
8. Communications Biology | 886 papers in training set | Top 4% | 2.4%
9. Nucleic Acids Research | 1128 papers in training set | Top 8% | 2.1%
10. Clinical Chemistry | 22 papers in training set | Top 0.3% | 2.1%
11. Advanced Science | 249 papers in training set | Top 9% | 1.9%
12. PLOS ONE | 4510 papers in training set | Top 52% | 1.7%
13. Computational and Structural Biotechnology Journal | 216 papers in training set | Top 5% | 1.5%
14. Frontiers in Immunology | 586 papers in training set | Top 5% | 1.5%
15. Genome Biology | 555 papers in training set | Top 5% | 1.5%
16. BMC Genomics | 328 papers in training set | Top 3% | 1.4%
17. iScience | 1063 papers in training set | Top 21% | 1.2%
18. Genomics, Proteomics & Bioinformatics | 171 papers in training set | Top 5% | 1.1%
19. Frontiers in Genetics | 197 papers in training set | Top 7% | 1.0%
20. Cell Reports Methods | 141 papers in training set | Top 4% | 1.0%
21. Nature Machine Intelligence | 61 papers in training set | Top 3% | 0.9%
22. EMBO Molecular Medicine | 85 papers in training set | Top 4% | 0.8%
23. Nature Biotechnology | 147 papers in training set | Top 7% | 0.8%
24. Cancer Research Communications | 46 papers in training set | Top 1% | 0.8%
25. npj Precision Oncology | 48 papers in training set | Top 1% | 0.7%
26. Small Methods | 26 papers in training set | Top 1% | 0.7%
27. eLife | 5422 papers in training set | Top 61% | 0.7%
28. Cell Genomics | 162 papers in training set | Top 8% | 0.7%
29. Genome Research | 409 papers in training set | Top 5% | 0.7%
30. Nature Biomedical Engineering | 42 papers in training set | Top 3% | 0.5%