Interpretable Deep Learning-Based Multi-Omics Integration for Prognosis in Hepatocellular Carcinoma
Znabu, B. F.; Atif, Z.
Hepatocellular carcinoma (HCC) is a leading cause of cancer mortality worldwide, yet existing prognostic models incompletely capture its molecular heterogeneity. We developed an interpretable, attention-based multi-branch deep learning framework for multi-omics survival prediction in HCC. Using 358 TCGA LIHC patients with matched mRNA expression, miRNA expression, and DNA methylation data, we first reproduced the Chaudhary et al. autoencoder-based survival model as a baseline (C-index = 0.561, log-rank p = 3.10 × 10⁻²). We then designed a multi-branch architecture with omics-specific encoders, multi-head attention fusion, and Cox partial likelihood training, optimized via Bayesian hyperparameter search (100 Optuna trials). In 5-fold stratified cross-validation with nested feature selection (no data leakage), our attention model achieved a mean C-index of 0.683 ± 0.039, outperforming the autoencoder baseline (0.561) and clinical-only model (0.637), and performing similarly to an AUTOSurv-like benchmark (0.697). Branch dropout enabled single-omics inference; external validation on the real GSE14520 cohort (n=221, mRNA) achieved a C-index of 0.637 (p = 0.004), comparable to Chaudhary et al.'s reported 0.67 on the same data. Integrated gradients and attention weights highlighted features with prior links to HCC biology, including cell cycle genes (CCNA2, PLK1) and a Wnt pathway component (FZD7), along with candidate biomarkers stable across all cross-validation folds (PZP, SGCB, CD300LG, ZNF831 for mRNA; 12 miRNAs; 6 CpG sites). Differential expression analysis between model-defined risk groups identified 381 significant genes (Bonferroni p < 0.05), though this analysis is partly circular. Multivariable Cox regression indicated that the model-derived risk score adds prognostic value beyond clinical variables, with consistent performance across clinical subgroups, though clinical integration metrics were evaluated on training data.
This framework provides a transparent, biologically grounded approach to multi-omics prognostication in HCC.
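The abstract relies on two quantities: the Cox partial likelihood used as the training objective and the concordance index (C-index) used for evaluation. The following is a minimal NumPy sketch of both, included only to make the definitions concrete. It is not the authors' implementation. The function names, the Breslow-style treatment of tied event times, and the toy data are assumptions made for illustration.

```python
import numpy as np


def cox_neg_partial_log_likelihood(risk, time, event):
    """Negative Cox partial log-likelihood, averaged over observed events.

    risk  : predicted log-risk scores (higher means shorter expected survival)
    time  : follow-up times
    event : 1 if the event was observed, 0 if censored
    Ties are handled Breslow-style (an assumption of this sketch).
    """
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)

    # at_risk[i, j] is True when sample j is still under observation
    # at the event time of sample i.
    at_risk = time[None, :] >= time[:, None]

    # Numerically stable log-sum-exp of the risk scores over each risk set.
    masked = np.where(at_risk, risk[None, :], -np.inf)
    row_max = masked.max(axis=1)
    log_denominator = row_max + np.log(np.exp(masked - row_max[:, None]).sum(axis=1))

    # Only samples with an observed event contribute a term.
    log_lik = np.sum((risk - log_denominator)[event])
    return -log_lik / max(int(event.sum()), 1)


def concordance_index(risk, time, event):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when i had an observed event and j was
    followed for longer than i. Tied risk scores count as half concordant.
    """
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)

    concordant, comparable = 0.0, 0
    for i in np.flatnonzero(event):
        longer = time > time[i]
        comparable += int(longer.sum())
        concordant += float(np.sum(risk[i] > risk[longer]))
        concordant += 0.5 * float(np.sum(risk[i] == risk[longer]))
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, one censored.
toy_time = np.array([5.0, 3.0, 9.0, 1.0])
toy_event = np.array([1, 1, 0, 1])
toy_risk = np.array([0.2, 0.8, -0.5, 1.5])

toy_loss = cox_neg_partial_log_likelihood(toy_risk, toy_time, toy_event)
toy_cindex = concordance_index(toy_risk, toy_time, toy_event)
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported values (for example 0.561 for the baseline and 0.683 for the attention model) should be read. In a deep learning setting the same loss would typically be written in an automatic differentiation framework so that it can be minimized by gradient descent.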
Matching journals
The top 5 journals account for 50% of the predicted probability mass.