
Joint Variable Selection for Omic Biomarkers in Time-to-Event Data

Bajzik, J.; Depope, A.; Zolfimoselo, Y.; Sharipov, A.; Lesayova, A.; Klein, H.; Richmond, A.; Vernardis, S.; Grauslys, A.; Andrejev, S.; Zelezniak, A.; Ralser, M.; Marioni, R.; Mondelli, M.; Robinson, M. R.

2026-05-04 bioinformatics
10.64898/2026.04.30.721585 bioRxiv

The incidence of the vast majority of neurodegenerative, cancer, and metabolic diseases increases exponentially with age. In large-scale biobanks, linking time-to-diagnosis information in electronic health records to multiple genomic ("multiomics") measures has the potential to reveal the genes and biological pathways involved in disease onset and progression. To date, association testing has commonly been conducted one variable at a time using semiparametric Cox proportional hazards (CoxPH) models, an approach that ignores the correlation structure among variables and increases the risk of false discoveries. To address these issues, we introduce a novel fully parametric Bayesian computational method, vampW, based on the Vector Approximate Message Passing framework applied to a Weibull model. vampW jointly models correlated features while providing an interpretable hazard structure, producing a continuous survival curve, and incorporating prior knowledge. In an extensive simulation study, we demonstrate that joint modeling of omics data and time-to-event outcomes with vampW substantially reduces false discoveries in comparison to marginal testing and other forms of joint CoxPH models. In 53,018 individuals from the UK Biobank, vampW identifies 219 protein associations with 24 disease outcomes, most of which are not among the top marginal discoveries. We further correct protein levels for exponential age effects, identifying 1,308 associations and highlighting the sensitivity of the analysis to age-correction methodology. Our findings replicate in independent cohorts using different measurement technologies, in data from Iceland and in a novel Generation Scotland proteomics dataset.
vampW also achieves a significant improvement in the prediction of disease onset times: across 14 outcomes, it reduces the root mean squared error by over 32% compared to CoxPH variants and by over 26% compared to the deep learning approach DeepSurv, while maintaining predictive utility in minority populations. In summary, vampW offers accurate and interpretable variable selection and out-of-sample prediction within a single computational framework, making it a powerful tool for dissecting the genomic architecture of common complex disease onset.
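
To illustrate what a fully parametric Weibull model provides, the following minimal sketch evaluates the standard two-parameter Weibull survival and hazard functions. It is not the vampW implementation: the shape/scale parameterization, the function names, and the example values are assumptions chosen only for illustration, and the abstract does not state how vampW links covariates to these parameters.

```python
import math


def weibull_survival(t, shape, scale):
    """Standard Weibull survival function S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t, shape, scale):
    """Standard Weibull hazard h(t) = (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


# Hypothetical parameters (not taken from the paper): shape > 1 gives a
# hazard that rises with age, and scale is expressed in years.
shape, scale = 2.0, 80.0
for age in (40.0, 60.0, 80.0):
    s = weibull_survival(age, shape, scale)
    h = weibull_hazard(age, shape, scale)
    print(f"age {age:.0f}: survival {s:.3f}, hazard {h:.4f}")
```

Because both functions are closed-form in the two parameters, the survival curve can be evaluated at any time point, which is the sense in which a parametric model yields a continuous curve.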

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. Nature Communications; 4913 papers in training set; Top 7%; 18.2%
2. Bioinformatics; 1061 papers in training set; Top 2%; 12.2%
3. Cell Systems; 167 papers in training set; Top 2%; 6.2%
4. Nature Methods; 336 papers in training set; Top 2%; 6.2%
5. Molecular & Cellular Proteomics; 158 papers in training set; Top 0.6%; 4.2%
6. Nature Machine Intelligence; 61 papers in training set; Top 0.7%; 4.2%

50% of probability mass above

7. PLOS Computational Biology; 1633 papers in training set; Top 10%; 3.5%
8. Nature Biotechnology; 147 papers in training set; Top 3%; 3.0%
9. Nucleic Acids Research; 1128 papers in training set; Top 7%; 3.0%
10. Genome Biology; 555 papers in training set; Top 3%; 2.8%
11. Molecular Systems Biology; 142 papers in training set; Top 0.3%; 2.8%
12. Genome Medicine; 154 papers in training set; Top 4%; 2.0%
13. Journal of Proteome Research; 215 papers in training set; Top 1%; 1.8%
14. Advanced Science; 249 papers in training set; Top 12%; 1.7%
15. NAR Genomics and Bioinformatics; 214 papers in training set; Top 2%; 1.7%
16. PLOS ONE; 4510 papers in training set; Top 55%; 1.6%
17. Cell Genomics; 162 papers in training set; Top 4%; 1.5%
18. Scientific Reports; 3102 papers in training set; Top 64%; 1.3%
19. Communications Biology; 886 papers in training set; Top 13%; 1.3%
20. Patterns; 70 papers in training set; Top 2%; 1.2%
21. eLife; 5422 papers in training set; Top 51%; 1.1%
22. Genome Research; 409 papers in training set; Top 4%; 0.9%
23. Molecular Cell; 308 papers in training set; Top 9%; 0.9%
24. Cell Reports Methods; 141 papers in training set; Top 5%; 0.8%
25. Science Advances; 1098 papers in training set; Top 29%; 0.8%
26. Proceedings of the National Academy of Sciences; 2130 papers in training set; Top 46%; 0.7%
27. The American Journal of Human Genetics; 206 papers in training set; Top 4%; 0.7%
28. iScience; 1063 papers in training set; Top 38%; 0.6%
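
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed probabilities. The sketch below is only an arithmetic check on the values shown in the list; the helper name `journals_to_reach` is hypothetical and is not part of the prediction tool.

```python
# Predicted probabilities (%) for the six highest-ranked journals, copied from the list.
top_probs = [18.2, 12.2, 6.2, 6.2, 4.2, 4.2]


def journals_to_reach(probs, threshold):
    """Return how many leading entries are needed for the running sum to reach threshold."""
    total = 0.0
    for count, p in enumerate(probs, start=1):
        total += p
        if total >= threshold:
            return count
    return None  # threshold not reached by the supplied entries


print(journals_to_reach(top_probs, 50.0))
```

The running sum stays below 50% after five journals (about 47.0%) and passes it with the sixth (about 51.2%), which is consistent with the cutoff marked in the list.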