SuReCAN: a suite of user-friendly Galaxy machine learning workflows to predict survival and treatment response of cancer patients

Ju, J.; Koppes, D.; Stubbs, A. P.; Li, Y.

2025-08-13 health informatics

10.1101/2025.08.12.25333156 medRxiv


Cancer is one of the leading causes of death worldwide, with an enormous impact on healthcare, the economy and society. A main challenge in clinical treatment planning is that patients with the same diagnosis and treatment often have diverse clinical outcomes. To enable personalized cancer therapy planning, machine learning (ML) models are applied to (bio)medical data to extract informative biological patterns from large volumes of complex biological data, aiding the stratification of cancer patients. For biomedical researchers without a computational biology background, the gap between clinical practice and computational approaches is prominent and hinders the use of machine learning in medical research. To fill this gap, we created a collection of ML workflows on the Galaxy platform, named SUrvival and REsponse prediction for CANcer patients (SuReCAN), with which clinicians and biologists can build and deploy predictive ML classifiers. Freely available and accessible, SuReCAN automates the data analysis process and enables clinicians and researchers to perform a broad range of predictive tasks. It contains a toolkit of three ML modules with various existing and newly implemented methods on Galaxy: a data normalization module, a feature selection module, and an ML classifier module. We demonstrated the utility of SuReCAN on a few real-world datasets, identifying survival-correlated subtypes of pancreatic ductal adenocarcinoma (PDAC) patients and predicting drug response outcomes from various omics data from patient tumor samples. All workflows achieved a median accuracy of over 0.8 in PDAC survival-correlated subtype classification. In particular, the workflow combining the feature selection method "SVM-based RFECV" with the Support Vector Machine classifier consistently outperformed the other workflows, while each classifier showed its strengths on different omics data.
Importantly, SuReCAN is applicable not only to the clinical prediction tasks shown in the test cases but also to the development and deployment of new classifiers from clinical observations provided by users. By providing a collection of user-friendly ML workflows, SuReCAN stratifies patients based on their biomedical profiles in a data-driven way and assists biomedical researchers with clinical decision-making and scientific discovery.
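
The three-module structure described in the abstract (normalization, feature selection, classification) can be sketched outside Galaxy with scikit-learn. This is only an illustrative sketch, not the SuReCAN implementation: it assumes "SVM-based RFECV" corresponds to scikit-learn's `RFECV` wrapped around a linear-kernel `SVC`, uses `StandardScaler` as a stand-in for the normalization module, and runs on synthetic data generated by `make_classification` rather than on any patient dataset. The fold counts and data dimensions are arbitrary choices for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for an omics matrix (samples x features).
# Assumption: binary labels play the role of survival-correlated subtypes.
X, y = make_classification(
    n_samples=120, n_features=30, n_informative=5, n_redundant=0, random_state=0
)

# Three stages mirroring the modules named in the abstract.
pipeline = Pipeline([
    # 1) data normalization (z-scoring, chosen here for illustration)
    ("normalize", StandardScaler()),
    # 2) feature selection: recursive feature elimination with inner
    #    cross-validation, ranking features by linear-SVM coefficients
    ("select", RFECV(SVC(kernel="linear"), step=1,
                     cv=StratifiedKFold(3), scoring="accuracy")),
    # 3) classifier: Support Vector Machine
    ("classify", SVC(kernel="linear")),
])

# Outer cross-validation; selection is refit inside each training fold,
# so held-out samples never influence which features are kept.
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, X, y, cv=outer_cv, scoring="accuracy")

median_accuracy = float(np.median(scores))
print(f"median accuracy across folds: {median_accuracy:.2f}")
```

Placing normalization and feature selection inside the pipeline, rather than applying them once to the full matrix, keeps the reported accuracy free of information leaked from the test folds.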

