
Changes in prediction modelling in biomedicine: do systematic reviews indicate whether there is any trend towards larger data sets and machine learning methods?

Lusa, L.; Kappenberg, F.; Collins, G. S.; Schmid, M.; Sauerbrei, W.; Rahnenfuehrer, J.

2024-08-10 health informatics
10.1101/2024.08.09.24311759 medRxiv

The number of prediction models proposed in the biomedical literature has been growing year on year. In the last few years there has been increasing attention to the changes occurring in the prediction modelling landscape. It has been suggested that machine learning techniques are becoming more popular for developing prediction models that exploit complex data structures, higher-dimensional predictor spaces, very large numbers of participants and heterogeneous subgroups, and that can capture higher-order interactions. We examine these changes in modelling practices by investigating a selection of systematic reviews on prediction models published in the biomedical literature. We selected systematic reviews published since 2020 that included at least 50 prediction models. Information was extracted following the CHARMS checklist. Time trends were explored using the models published since 2005. We identified 8 reviews, which included 1448 prediction models published in 887 papers. The average numbers of study participants and outcome events increased considerably between 2015 and 2019, but remained stable afterwards. The numbers of candidate and final predictors did not noticeably increase over the study period, although a few recent studies used very large numbers of predictors. Internal validation and reporting of discrimination measures became more common, but assessing calibration and carrying out external validation were less common. Information about missing values was not reported in about half of the papers; however, the use of imputation methods increased. There was no sign of an increase in the use of machine learning methods. Overall, most of the findings were heterogeneous across reviews. Our findings indicate that changes in the prediction modelling landscape in biomedicine are less dramatic than expected and that poor reporting is still common; adherence to well-established best-practice recommendations from the traditional biostatistics literature is still needed.
For machine learning, best-practice recommendations are still missing; such recommendations are available in the traditional biostatistics literature, but adherence to them is still inadequate.

Matching journals

The top 4 journals together account for just over 50% of the predicted probability mass.

1. BMC Medical Research Methodology: 26.4% (43 papers in training set; Top 0.1%)
2. Journal of the American Medical Informatics Association: 12.9% (61 papers in training set; Top 0.2%)
3. European Journal of Epidemiology: 8.6% (40 papers in training set; Top 0.1%)
4. Journal of Biomedical Informatics: 4.0% (45 papers in training set; Top 0.4%)
(50% of probability mass above)
5. Journal of Clinical Epidemiology: 3.7% (28 papers in training set; Top 0.1%)
6. Scientific Reports: 2.9% (3102 papers in training set; Top 42%)
7. Artificial Intelligence in Medicine: 2.7% (15 papers in training set; Top 0.2%)
8. BMJ Open: 2.1% (554 papers in training set; Top 8%)
9. JAMIA Open: 2.1% (37 papers in training set; Top 0.6%)
10. PLOS ONE: 1.9% (4510 papers in training set; Top 50%)
11. Computer Methods and Programs in Biomedicine: 1.9% (27 papers in training set; Top 0.2%)
12. BMC Medicine: 1.7% (163 papers in training set; Top 3%)
13. Research Synthesis Methods: 1.7% (20 papers in training set; Top 0.1%)
14. BMC Medical Informatics and Decision Making: 1.5% (39 papers in training set; Top 2%)
15. Frontiers in Artificial Intelligence: 1.5% (18 papers in training set; Top 0.3%)
16. Journal of Medical Internet Research: 1.2% (85 papers in training set; Top 3%)
17. PLOS Digital Health: 1.2% (91 papers in training set; Top 2%)
18. Computers in Biology and Medicine: 1.1% (120 papers in training set; Top 3%)
19. BMJ Health & Care Informatics: 1.0% (13 papers in training set; Top 0.7%)
20. Frontiers in Cardiovascular Medicine: 1.0% (49 papers in training set; Top 2%)
21. JMIR Medical Informatics: 0.9% (17 papers in training set; Top 1%)
22. Wellcome Open Research: 0.8% (57 papers in training set; Top 2%)
23. International Journal of Medical Informatics: 0.8% (25 papers in training set; Top 1%)
24. PLOS Biology: 0.7% (408 papers in training set; Top 20%)
25. BMC Biology: 0.7% (248 papers in training set; Top 5%)
26. JMIR mHealth and uHealth: 0.7% (10 papers in training set; Top 0.4%)
27. PLOS Medicine: 0.7% (98 papers in training set; Top 5%)
28. Human Brain Mapping: 0.5% (295 papers in training set; Top 5%)
29. BMC Research Notes: 0.5% (29 papers in training set; Top 0.9%)
30. PLOS Computational Biology: 0.5% (1633 papers in training set; Top 28%)
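The statement that the top 4 journals account for 50% of the predicted probability mass is a cumulative sum over the ranked percentages. Below is a minimal Python sketch that checks this claim. It assumes that the listed percentages are the per-journal predicted probabilities, already sorted in descending order. The helper name `journals_to_reach` is hypothetical and is not part of the matching tool.

```python
# Predicted probabilities (%) for the five highest-ranked journals in the list above.
# Assumption: the list is already sorted in descending order of probability.
shares = [
    ("BMC Medical Research Methodology", 26.4),
    ("Journal of the American Medical Informatics Association", 12.9),
    ("European Journal of Epidemiology", 8.6),
    ("Journal of Biomedical Informatics", 4.0),
    ("Journal of Clinical Epidemiology", 3.7),
]


def journals_to_reach(shares, target=50.0):
    """Hypothetical helper: find the smallest k whose cumulative share reaches target.

    Returns (k, cumulative share rounded to 0.1). If the listed shares never reach
    the target, k is None.
    """
    total = 0.0
    for k, (_, pct) in enumerate(shares, start=1):
        total += pct  # running sum of the ranked percentages
        if total >= target:
            return k, round(total, 1)
    return None, round(total, 1)


k, cum = journals_to_reach(shares)
print(f"Top {k} journals cover {cum}% of the predicted probability mass")
```

With the listed values, the first three journals reach 47.9% and the fourth brings the total to 51.9%. The "50%" figure is therefore a threshold crossed at rank 4, not an exact sum.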