clinTALL: machine learning-driven multimodal subtype classification and treatment outcome prediction in pediatric T-ALL
Stoiber, L.; Antic, Z.; Rebellato, S.; Fazio, G.; Rademacher, A.; Lenk, L.; Locatelli, F.; Balduzzi, A.; Cario, G.; Rizzari, C.; Cazzaniga, G.; Yu, J.; Bergmann, A. K.
Background: Childhood T-lineage acute lymphoblastic leukemia (T-ALL) is an aggressive hematologic malignancy with poor prognosis. Unlike B-cell precursor ALL, T-ALL lacks effective risk stratification strategies. A recent study integrated whole-genome and whole-transcriptome data to define more than 15 distinct molecular subtypes with prognostic significance. However, clinical translation of this knowledge remains challenging because high-dimensional multi-omics data are complex to interpret.

Methods: Here, we present clinTALL, a deep learning-based multi-task pipeline for pediatric T-ALL subtype classification and treatment outcome estimation. The model integrates multimodal input data and uses a neural network architecture to generate a shared latent embedding for jointly learned multi-task prediction. A competing risk-based model was used to predict event-specific outcomes. The model was trained on a publicly available multimodal dataset comprising clinical, genomic and transcriptomic features of 1309 pediatric T-ALL samples.

Results: Among the single-modality models, the transcriptomic-only model performed best, with 92.2% accuracy for subtype prediction and a concordance index (C-index) of 65.9% for event-free survival (EFS) in a cross-validation setup. Integrating all data modalities maintained high subtype classification accuracy (91.7%) and improved the C-index for EFS estimation to 67.5%. The competing risk-based model enabled accurate prediction of induction failure (C-index = 96.0%) and second malignant neoplasm (C-index = 62.1%). We validated molecular subtype predictions on an internal dataset of 120 pediatric T-ALL samples and obtained an accuracy of 81.8%. To facilitate broad application of multi-omics-based subtype prediction and treatment outcome inference, we provide clinTALL as a Docker-based application, allowing user-friendly access to the tool. The full source code of clinTALL is available on GitHub (https://github.com/UKWgenommedizin/clinTALL).

Conclusion: Together, our machine learning-based framework allows for automated, accurate subtype classification and treatment outcome inference using multimodal input data, advancing precision risk stratification for pediatric T-ALL.
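
The concordance index reported in the abstract can be illustrated with a minimal sketch of Harrell's C-index for right-censored data. This is not taken from the clinTALL code base: the function names `concordance_index` and `cause_specific_events`, the integer event coding, and the toy data are hypothetical. The sketch skips pairs with tied times, and it handles competing risks by treating every event other than the cause of interest as censored, which is one common simplification and not necessarily the authors' choice.

```python
from typing import Sequence


def cause_specific_events(event_codes: Sequence[int], cause: int) -> list[bool]:
    """Mark only events of the given cause as observed.

    Assumption: code 0 means censored and other integers label event types.
    Competing events are treated as censored for the cause of interest.
    """
    return [code == cause for code in event_codes]


def concordance_index(
    times: Sequence[float],
    events: Sequence[bool],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    A pair is comparable when the subject with the shorter time had an
    observed event. It is concordant when that subject also has the higher
    risk score; tied risk scores count as half. Tied times are skipped.
    """
    n = len(times)
    if not (n == len(events) == len(risk_scores)):
        raise ValueError("times, events and risk_scores must have equal length")

    concordant = 0.0
    comparable = 0
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5

    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): follow-up times, event codes, predicted risks.
times = [2.0, 4.0, 6.0, 8.0]
codes = [1, 1, 0, 1]  # 0 = censored, 1 = event of interest
events = cause_specific_events(codes, cause=1)
c_index = concordance_index(times, events, [0.9, 0.1, 0.4, 0.7])
```

The function returns a value between 0 and 1, where 0.5 corresponds to random ranking; the abstract reports the same quantity as a percentage, so an EFS C-index of 67.5% corresponds to 0.675.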
Matching journals
The top 5 journals account for 50% of the predicted probability mass.