
Retrospective evaluation of human genetic evidence for clinical trial success using Mendelian randomization and machine learning

Ravarani, C. N. J.; Arend, M.; Baukmann, H. A.; Cope, J. L.; Lamparter, M. R. J.; Sullivan, J. K.; Fudim, R.; Bender, A.; Malarstig, A.; Schmidt, M. F.

2026-03-14 pharmacology and therapeutics
10.64898/2026.02.19.26346536 medRxiv

Human genetics has become a cornerstone of drug target discovery, yet the value of Mendelian randomization (MR) for predicting clinical success remains uncertain. Here, we systematically evaluated MR across 11,482 target-indication pairs with documented Phase II clinical outcomes to assess its utility for drug development. We find that MR statistical significance alone does not enrich for Phase II success, in contrast to genome-wide association study (GWAS) support, which confers an increase in success probability. However, this apparent limitation reflects the heterogeneous nature of clinical failure and the fact that MR encodes information beyond P values. When MR-derived features, including instrument strength and explained variance, are integrated into machine learning models, predictive performance improves substantially. An MR-informed XGBoost classifier identifies target-indication pairs with a 55% overall approval rate, corresponding to a 6.4-fold enrichment over unstratified programs and a 2.8-fold improvement over GWAS-supported targets in Phase II. Notably, this enrichment is achieved without reliance on statistically significant MR results. Our findings demonstrate that MR is most informative when treated as a graded, context-dependent source of causal evidence rather than a binary hypothesis test, and that its integration with machine learning enables scalable, genetics-informed prioritization of drug targets across the clinical pipeline.
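The fold-enrichment figures in the abstract can be turned into rough comparator rates with a back-of-the-envelope calculation. The sketch below assumes that "fold enrichment" means a simple ratio of approval rates (model-selected rate divided by comparator rate); the abstract does not state the comparator rates or confirm this definition, so the outputs are implied values, not reported results. The function name `implied_comparator_rate` is hypothetical.

```python
def implied_comparator_rate(enriched_rate: float, fold: float) -> float:
    """Back out a comparator rate, ASSUMING fold = enriched_rate / comparator_rate."""
    if fold <= 0:
        raise ValueError("fold enrichment must be positive")
    return enriched_rate / fold


# Figures quoted in the abstract.
model_rate = 0.55           # approval rate among model-selected pairs
fold_vs_unstratified = 6.4  # enrichment over unstratified programs
fold_vs_gwas = 2.8          # improvement over GWAS-supported targets

# Implied (not reported) comparator rates under the ratio assumption.
unstratified_rate = implied_comparator_rate(model_rate, fold_vs_unstratified)
gwas_rate = implied_comparator_rate(model_rate, fold_vs_gwas)

print(f"Implied unstratified approval rate: {unstratified_rate:.1%}")
print(f"Implied GWAS-supported approval rate: {gwas_rate:.1%}")
```

Under that assumption, the implied comparator rates come out at roughly 8.6% for unstratified programs and 19.6% for GWAS-supported targets. The full text should be consulted for the rates the authors actually report.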

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

1. The American Journal of Human Genetics: 206 papers in training set, Top 0.1%, predicted probability 23.1%
2. Nature: 575 papers in training set, Top 0.7%, predicted probability 23.1%
3. Nature Communications: 4913 papers in training set, Top 28%, predicted probability 6.5%

50% of probability mass above

4. Cell Systems: 167 papers in training set, Top 2%, predicted probability 5.0%
5. eLife: 5422 papers in training set, Top 24%, predicted probability 3.7%
6. Clinical Pharmacology & Therapeutics: 25 papers in training set, Top 0.2%, predicted probability 2.4%
7. Nature Human Behaviour: 85 papers in training set, Top 2%, predicted probability 2.1%
8. Nature Genetics: 240 papers in training set, Top 4%, predicted probability 1.7%
9. Clinical and Translational Science: 21 papers in training set, Top 0.4%, predicted probability 1.7%
10. Science: 429 papers in training set, Top 14%, predicted probability 1.7%
11. PLOS ONE: 4510 papers in training set, Top 52%, predicted probability 1.7%
12. Scientific Reports: 3102 papers in training set, Top 57%, predicted probability 1.7%
13. Genome Medicine: 154 papers in training set, Top 4%, predicted probability 1.7%
14. npj Genomic Medicine: 33 papers in training set, Top 0.4%, predicted probability 1.5%
15. Cell Genomics: 162 papers in training set, Top 4%, predicted probability 1.5%
16. Nature Biomedical Engineering: 42 papers in training set, Top 1%, predicted probability 1.4%
17. Proceedings of the National Academy of Sciences: 2130 papers in training set, Top 36%, predicted probability 1.4%
18. Science Translational Medicine: 111 papers in training set, Top 3%, predicted probability 1.4%
19. The Lancet Infectious Diseases: 71 papers in training set, Top 2%, predicted probability 1.1%
20. Nature Machine Intelligence: 61 papers in training set, Top 3%, predicted probability 0.9%
21. Molecular Therapy: 71 papers in training set, Top 3%, predicted probability 0.8%
22. Communications Medicine: 85 papers in training set, Top 1%, predicted probability 0.7%
23. npj Systems Biology and Applications: 99 papers in training set, Top 3%, predicted probability 0.7%
24. Molecular Systems Biology: 142 papers in training set, Top 2%, predicted probability 0.7%
25. Nature Cancer: 35 papers in training set, Top 2%, predicted probability 0.7%
26. Communications Biology: 886 papers in training set, Top 28%, predicted probability 0.7%
27. Cancer Cell: 38 papers in training set, Top 2%, predicted probability 0.5%
28. PLOS Genetics: 756 papers in training set, Top 18%, predicted probability 0.5%