
Variable selection for competing risk regression models: recommendations for analyzing data from epidemiological studies

Mullaert, J.; Schmeller, S.; Austin, P. C.; Latouche, A.

2024-11-26 epidemiology
10.1101/2024.11.25.24317882 medRxiv

When fitting competing risks regression models, a variety of variable selection methods exist, including backward selection on the subdistribution hazard, backward selection on the cause-specific hazards, and penalized methods. However, a benchmark study comparing these procedures is lacking. We conducted an extensive simulation study to compare three variable selection procedures in terms of both model selection ability and predictive accuracy. We simulated 5120 datasets under various conditions intended to be representative of real applications in clinical epidemiology. Results show that the backward selection procedure can lead to a high false discovery rate (FDR) because of implementation choices. Even in scenarios with a high number of events per variable (EPV), the true model is rarely identified by any of the tested procedures. Survival predictions were assessed with time-dependent AUC and show similar performance for all methods. We also provide an application to real data from stem cell transplant patients in hematology. We conclude that identifying the true model in competing risk regression is a very difficult task, and we offer two recommendations to analysts: (1) report the events per variable for the event type of interest and (2) use multiple methods to deal with model uncertainty and avoid implementation pitfalls.
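The two quantities the abstract relies on, EPV and FDR, can be illustrated with a minimal sketch. It assumes the usual definitions (EPV as the number of events of the cause of interest divided by the number of candidate variables, FDR as the share of selected variables that are not in the true model); the abstract does not spell these out, so the formulas, function names, event coding, and toy data below are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def events_per_variable(event_codes, n_candidate_variables, cause_of_interest=1):
    """Assumed definition: events of the cause of interest per candidate variable."""
    counts = Counter(event_codes)
    return counts[cause_of_interest] / n_candidate_variables


def false_discovery_rate(selected, true_variables):
    """Assumed definition: share of selected variables absent from the true model.

    Returns 0.0 when nothing is selected, a convention chosen for this sketch.
    """
    selected = set(selected)
    if not selected:
        return 0.0
    return len(selected - set(true_variables)) / len(selected)


# Hypothetical coding: 0 = censored, 1 = event of interest, 2 = competing event
event_codes = [1, 0, 2, 1, 1, 0, 2, 1, 0, 1, 2, 1]

# Only events of type 1 count toward the EPV for the event of interest
epv = events_per_variable(event_codes, n_candidate_variables=3)

# Hypothetical selection result compared with a known (simulated) true model
fdr = false_discovery_rate(
    selected=["age", "sex", "x7"],
    true_variables=["age", "sex"],
)

print(f"EPV for cause 1: {epv:.2f}")
print(f"FDR of the selected model: {fdr:.2f}")
```

Counting only the event type of interest matters here: competing events and censored observations do not add information about the covariate effects being selected, so an EPV computed from all events would overstate the effective sample size.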

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|---|---|---|---|---|
| 1 | BMC Medical Research Methodology | 43 | Top 0.1% | 40.1% |
| 2 | PLOS Computational Biology | 1633 | Top 3% | 10.3% |
| 3 | PLOS ONE | 4510 | Top 25% | 6.9% |
| 4 | Statistics in Medicine | 34 | Top 0.1% | 6.9% |
| 5 | Scientific Reports | 3102 | Top 22% | 4.9% |
| 6 | American Journal of Epidemiology | 57 | Top 0.3% | 3.7% |
| 7 | BMC Bioinformatics | 383 | Top 4% | 2.1% |
| 8 | Epidemiology | 26 | Top 0.2% | 1.8% |
| 9 | Epidemiology and Infection | 84 | Top 1% | 1.7% |
| 10 | Cancers | 200 | Top 4% | 1.0% |
| 11 | European Journal of Epidemiology | 40 | Top 0.5% | 1.0% |
| 12 | PeerJ | 261 | Top 12% | 0.9% |
| 13 | Physical Biology | 43 | Top 2% | 0.9% |
| 14 | Biology Methods and Protocols | 53 | Top 2% | 0.9% |
| 15 | Journal of The Royal Society Interface | 189 | Top 4% | 0.8% |
| 16 | Bioinformatics | 1061 | Top 9% | 0.8% |
| 17 | BioData Mining | 15 | Top 0.8% | 0.8% |
| 18 | Blood | 67 | Top 1% | 0.7% |
| 19 | International Journal of Molecular Sciences | 453 | Top 16% | 0.7% |
| 20 | BMC Research Notes | 29 | Top 0.6% | 0.7% |
| 21 | Computers in Biology and Medicine | 120 | Top 5% | 0.7% |
| 22 | Genetic Epidemiology | 46 | Top 1% | 0.5% |
| 23 | Journal of Clinical Medicine | 91 | Top 8% | 0.5% |