
Benchmark of biomarker identification and prognostic modeling methods on diverse censored data

Fletcher, W. L.; Sinha, S.

2026-04-01 bioinformatics
10.64898/2026.03.29.715113 bioRxiv

The practices of identifying biomarkers and developing prognostic models using genomic data have become increasingly prevalent. Such data often have characteristics that make these practices difficult, namely high dimensionality, correlations between predictors, and sparsity. Many modern methods have been developed to address these characteristics while performing feature selection and prognostic modeling, but a large-scale comparison of their performance in these tasks on diverse right-censored time-to-event data (also known as survival time data) is much needed. We have compiled many existing methods, including some machine learning methods, several of which have performed well in previous benchmarks. We compare them primarily on variable selection capability and secondarily on survival time prediction, using many synthetic datasets with varying levels of sparsity, correlation between predictors, and signal strength of informative predictors. For illustration, we have also performed multiple analyses on a publicly available and widely used cancer cohort from The Cancer Genome Atlas using these methods. We evaluated the methods through extensive simulation studies in terms of the false discovery rate, F1-score, concordance index, Brier score, root mean square error, and computation time. Of the methods compared, CoxBoost and the Adaptive LASSO performed well in all metrics, and the LASSO and elastic net excelled in concordance index and F1-score. The Benjamini-Hochberg and q-value procedures showed volatile performance in controlling the false discovery rate. The performance of some methods was greatly affected by differences in the data characteristics. With our extensive numerical study, we have identified the best performing methods for a wide range of data characteristics using informative metrics. This will help cancer researchers choose the best approach for their needs when working with genomic data.
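To make three of the evaluation metrics named in the abstract concrete, here is a minimal Python sketch of the false discovery proportion and F1-score for one variable-selection result, and of Harrell's concordance index for right-censored data. It is an illustration under stated assumptions, not the authors' benchmark code: the function names and toy inputs are hypothetical, higher risk scores are assumed to mean shorter expected survival, and pairs with tied times are simply skipped.

```python
from itertools import combinations


def selection_fdp_f1(selected, truly_informative):
    """False discovery proportion and F1-score for one selection result.

    Averaging the false discovery proportion over simulation replicates
    gives an estimate of the false discovery rate.
    """
    selected, truth = set(selected), set(truly_informative)
    tp = len(selected & truth)   # informative predictors that were selected
    fp = len(selected - truth)   # noise predictors that were selected
    fn = len(truth - selected)   # informative predictors that were missed
    fdp = fp / len(selected) if selected else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0
    return fdp, f1


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair is comparable only if the subject with the shorter time had an
    observed event. Simplification: pairs with tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the subject with the shorter time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0    # higher risk failed first: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5    # tied risk scores count as half
    return concordant / comparable if comparable else float("nan")


# Toy example (made-up values, for illustration only).
fdp, f1 = selection_fdp_f1(selected={0, 1, 2, 5}, truly_informative={0, 1, 2, 3})
c_index = harrell_c_index(
    times=[2, 4, 6, 8],
    events=[1, 1, 0, 1],          # 1 = event observed, 0 = right-censored
    risk_scores=[0.9, 0.7, 0.8, 0.1],
)
print(fdp, f1, c_index)
```

In the toy example, one of the four selected predictors is a false discovery and one informative predictor is missed, and four of the five comparable pairs are ordered correctly by the risk scores. The censored subject still contributes to pairs in which the other subject failed earlier, which is how the concordance index uses right-censored observations.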

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1. PLOS Computational Biology; 1633 papers in training set; Top 4%; 8.4%
2. PLOS ONE; 4510 papers in training set; Top 22%; 8.4%
3. Briefings in Bioinformatics; 326 papers in training set; Top 0.7%; 6.8%
4. BMC Bioinformatics; 383 papers in training set; Top 2%; 6.4%
5. Scientific Reports; 3102 papers in training set; Top 24%; 4.9%
6. BioData Mining; 15 papers in training set; Top 0.1%; 4.9%
7. Bioinformatics; 1061 papers in training set; Top 5%; 4.0%
8. Frontiers in Bioinformatics; 45 papers in training set; Top 0.1%; 4.0%
9. Frontiers in Genetics; 197 papers in training set; Top 2%; 3.6%
50% of probability mass above
10. IEEE/ACM Transactions on Computational Biology and Bioinformatics; 32 papers in training set; Top 0.1%; 3.6%
11. Biology Methods and Protocols; 53 papers in training set; Top 0.6%; 2.1%
12. PeerJ; 261 papers in training set; Top 6%; 1.9%
13. Journal of Computational Biology; 37 papers in training set; Top 0.1%; 1.9%
14. Computers in Biology and Medicine; 120 papers in training set; Top 2%; 1.9%
15. IEEE Transactions on Computational Biology and Bioinformatics; 17 papers in training set; Top 0.2%; 1.8%
16. IEEE Journal of Biomedical and Health Informatics; 34 papers in training set; Top 1%; 1.7%
17. Artificial Intelligence in Medicine; 15 papers in training set; Top 0.4%; 1.5%
18. GigaScience; 172 papers in training set; Top 2%; 1.2%
19. BMC Medical Informatics and Decision Making; 39 papers in training set; Top 2%; 1.2%
20. Expert Systems with Applications; 11 papers in training set; Top 0.2%; 1.2%
21. Computational and Structural Biotechnology Journal; 216 papers in training set; Top 7%; 1.1%
22. International Journal of Molecular Sciences; 453 papers in training set; Top 12%; 1.0%
23. BMC Genomics; 328 papers in training set; Top 5%; 0.8%
24. Genomics, Proteomics & Bioinformatics; 171 papers in training set; Top 6%; 0.8%
25. Patterns; 70 papers in training set; Top 2%; 0.7%
26. Computer Methods and Programs in Biomedicine; 27 papers in training set; Top 1.0%; 0.7%
27. Heliyon; 146 papers in training set; Top 7%; 0.7%
28. npj Systems Biology and Applications; 99 papers in training set; Top 3%; 0.6%
29. IEEE Access; 31 papers in training set; Top 1%; 0.6%
30. Frontiers in Artificial Intelligence; 18 papers in training set; Top 0.9%; 0.6%