
SuReCAN: a suite of user-friendly Galaxy machine learning workflows to predict survival and treatment response of cancer patients

Ju, J.; Koppes, D.; Stubbs, A. P.; Li, Y.

2025-08-13 health informatics
10.1101/2025.08.12.25333156 medRxiv

Cancer is one of the leading causes of death worldwide, with an enormous impact on healthcare, the economy and society. One of the main challenges in clinical treatment planning is that patients with the same diagnosis and treatment often have diverse clinical outcomes. To enable personalized cancer therapy planning, machine learning (ML) models have been introduced into (bio)medical data analysis to efficiently extract informative biological patterns from massive volumes of complex biological data, aiding in the stratification of cancer patients. For biomedical researchers without a computational biology background, the gap between clinical practice and computational approaches is prominent and hinders the use of ML in medical research. To fill this gap, we created SUrvival and REsponse prediction for CANcer patients (SuReCAN), a collection of ML workflows on the Galaxy platform that lets clinicians and biologists build and deploy predictive ML classifiers. Freely available and accessible, SuReCAN automates the data analysis process and enables clinicians and researchers to perform a broad range of predictive tasks. It provides a toolkit of three ML modules on Galaxy, comprising both existing and newly implemented methods: a data normalization module, a feature selection module, and an ML classifier module. We demonstrated the utility of SuReCAN on several real-world datasets, identifying survival-correlated subtypes of pancreatic ductal adenocarcinoma (PDAC) patients and predicting drug response outcomes from various omics data of patient tumor samples. All workflows achieved a median accuracy above 0.8 in PDAC survival-correlated subtype classification. In particular, the workflow combining the feature selection method "SVM-based RFECV" with the Support Vector Machine classifier consistently outperformed the other workflows, while each classifier showed an advantage on particular omics data types.
Importantly, SuReCAN is not only applicable to the clinical prediction tasks shown in the test cases but is also suitable for developing and deploying new classifiers with clinical observations provided by users. By providing a collection of user-friendly ML workflows, SuReCAN stratifies patients based on their biomedical profiles in a data-driven way and assists biomedical researchers with clinical decision-making and scientific discovery.
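The abstract describes each SuReCAN workflow as a chain of three modules (normalization, feature selection, classification), with the best-performing combination being SVM-based RFECV followed by an SVM classifier, evaluated by median accuracy. The sketch below mimics that chain with scikit-learn on synthetic data. It is not SuReCAN's Galaxy implementation: the use of `StandardScaler` for normalization, the linear-kernel SVM inside RFECV, the default RBF kernel for the final classifier, the fold counts, and the helper names `build_workflow` and `median_cv_accuracy` are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.model_selection import (
    RepeatedStratifiedKFold,
    StratifiedKFold,
    cross_val_score,
)
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_workflow(random_state=0):
    """Hypothetical workflow: normalization -> SVM-based RFECV -> SVM classifier."""
    selector = RFECV(
        # Assumption: a linear kernel, so the SVM exposes coef_ for ranking features.
        estimator=SVC(kernel="linear"),
        step=0.1,  # drop 10% of the features at each elimination round
        cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=random_state),
        scoring="accuracy",
    )
    return Pipeline([
        ("normalize", StandardScaler()),  # assumption: z-score normalization
        ("select", selector),
        ("classify", SVC()),  # assumption: default RBF-kernel SVM classifier
    ])


def median_cv_accuracy(X, y, n_repeats=3, random_state=0):
    """Median accuracy over repeated stratified 5-fold cross-validation."""
    cv = RepeatedStratifiedKFold(
        n_splits=5, n_repeats=n_repeats, random_state=random_state
    )
    scores = cross_val_score(
        build_workflow(random_state), X, y, cv=cv, scoring="accuracy"
    )
    return float(np.median(scores)), scores


if __name__ == "__main__":
    # Synthetic stand-in for an omics matrix: 120 samples x 50 features.
    X, y = make_classification(
        n_samples=120, n_features=50, n_informative=8, random_state=0
    )
    median_acc, scores = median_cv_accuracy(X, y)
    print(f"median accuracy over {len(scores)} folds: {median_acc:.3f}")
```

Keeping the scaler and RFECV inside the `Pipeline` means both are refit on each training fold, so feature selection never sees the held-out samples; selecting features on the full dataset before cross-validation would inflate the reported accuracy.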

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Patterns (70 papers in training set; Top 0.1%; 18.5%)
2. Bioinformatics (1061 papers in training set; Top 2%; 12.6%)
3. Briefings in Bioinformatics (326 papers in training set; Top 0.5%; 8.4%)
4. Advanced Science (249 papers in training set; Top 3%; 6.3%)
5. NAR Genomics and Bioinformatics (214 papers in training set; Top 0.3%; 4.8%)
50% of probability mass above
6. iScience (1063 papers in training set; Top 5%; 3.6%)
7. Scientific Reports (3102 papers in training set; Top 45%; 2.6%)
8. Nucleic Acids Research (1128 papers in training set; Top 9%; 2.1%)
9. Nature Communications (4913 papers in training set; Top 47%; 2.1%)
10. Computers in Biology and Medicine (120 papers in training set; Top 2%; 1.9%)
11. Communications Biology (886 papers in training set; Top 6%; 1.9%)
12. Genomics, Proteomics & Bioinformatics (171 papers in training set; Top 3%; 1.9%)
13. Journal of Medical Internet Research (85 papers in training set; Top 3%; 1.7%)
14. BMC Bioinformatics (383 papers in training set; Top 5%; 1.7%)
15. Genome Biology (555 papers in training set; Top 5%; 1.5%)
16. IEEE Journal of Biomedical and Health Informatics (34 papers in training set; Top 1%; 1.5%)
17. Frontiers in Microbiology (375 papers in training set; Top 6%; 1.5%)
18. Nature Machine Intelligence (61 papers in training set; Top 2%; 1.3%)
19. PLOS ONE (4510 papers in training set; Top 60%; 1.2%)
20. Nature Computational Science (50 papers in training set; Top 1%; 1.1%)
21. PLOS Computational Biology (1633 papers in training set; Top 21%; 0.9%)
22. Heliyon (146 papers in training set; Top 5%; 0.9%)
23. JCO Clinical Cancer Informatics (18 papers in training set; Top 0.7%; 0.9%)
24. Journal of Biomedical Informatics (45 papers in training set; Top 1%; 0.9%)
25. GigaScience (172 papers in training set; Top 3%; 0.8%)
26. Artificial Intelligence in Medicine (15 papers in training set; Top 0.7%; 0.7%)
27. BMC Medical Informatics and Decision Making (39 papers in training set; Top 3%; 0.7%)
28. Genome Medicine (154 papers in training set; Top 9%; 0.6%)
29. Computational and Structural Biotechnology Journal (216 papers in training set; Top 11%; 0.6%)
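The claim that the top 5 journals account for 50% of the predicted probability mass can be checked by accumulating the listed percentages: 18.5 + 12.6 + 8.4 + 6.3 + 4.8 = 50.6%, while the top 4 reach only 45.8%. The small sketch below does this cumulative sum; the helper name `journals_to_reach` is hypothetical, and only the first six entries of the list are copied in.

```python
from itertools import accumulate

# Predicted probabilities (%) of the six highest-ranked journals, copied from the list.
TOP_JOURNALS = [
    ("Patterns", 18.5),
    ("Bioinformatics", 12.6),
    ("Briefings in Bioinformatics", 8.4),
    ("Advanced Science", 6.3),
    ("NAR Genomics and Bioinformatics", 4.8),
    ("iScience", 3.6),
]


def journals_to_reach(share, ranked):
    """Return (k, cumulative %) for the smallest k whose top-k probabilities sum to at least `share`.

    Returns None if the listed entries never reach `share`.
    """
    cumulative = accumulate(prob for _, prob in ranked)
    for k, total in enumerate(cumulative, start=1):
        if total >= share:
            return k, total
    return None


print(journals_to_reach(50.0, TOP_JOURNALS))
```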