
Clinical Advancement Forecasting

Czech, E. A.; Wojdyla, R. S.; Himmelstein, D. S.; Frank, D. H.; Miller, N. A.; Milwid, J. M.; Kolom, A.; Hammerbacher, J.

2024-08-03 genetic and genomic medicine
10.1101/2024.08.02.24311422 medRxiv

Abstract

Choosing which drug targets to pursue for a given disease is one of the most impactful decisions made in the global development of new medicines. This study examines the extent to which the outcomes of clinical trials can be predicted based on a small set of longitudinal (temporally labeled) evidence and properties of drug targets and diseases. We demonstrate a novel statistical learning framework for identifying the top 2% of target-disease pairs that are as much as 4-5x more likely to advance beyond phase 2 trials. This framework is 1.5-2x more effective than an Open Targets composite score based on the same set of evidence. It is also 2x more effective than a common measure for genetic support that has been observed previously, as well as in this study, to confer a 2x higher likelihood of success. Utilizing a subset of our biomedical evidence base, non-negative linear models resulting from this framework can produce simple weighting schemes across various types of human, animal, and cell model genomic, transcriptomic, proteomic, and clinical evidence to identify previously undeveloped target-disease pairs poised for clinical success. In this study we further explore: i) how longitudinal treatment of evidence relates to leakage and reverse causality in biomedical research, and how temporalized evidence can mitigate common forms of potential bias and inflation; ii) the relative impact of different types of features on our predictions; and iii) the space of currently undeveloped, tractable targets predicted with these methods to have the highest likelihood of clinical success. To ease reproduction and deployment, no data from outside Open Targets is used, and the described methods require no expert knowledge and can support the expansion of lines of evidence to further improve performance.
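The phrase "4-5x more likely to advance" in the abstract describes a ratio of success rates: the advancement rate among the highest-scoring target-disease pairs divided by the rate across all pairs. The sketch below shows one plausible way to compute such a ratio. The function name `relative_success`, the rounding rule for the top fraction, and the toy data are all assumptions made for illustration. They are not taken from the paper, and the authors' exact metric may differ.

```python
def relative_success(scores, advanced, top_frac=0.02):
    """Success rate in the top-scoring fraction divided by the overall rate.

    scores   -- hypothetical model scores, one per target-disease pair
    advanced -- 1 if the pair advanced beyond phase 2, else 0
    top_frac -- fraction of pairs to treat as "top" (0.02 = top 2%)
    """
    if not scores or len(scores) != len(advanced):
        raise ValueError("scores and advanced must be non-empty and equal length")
    # Assumption: round to the nearest whole pair, keeping at least one.
    n_top = max(1, round(len(scores) * top_frac))
    # Rank pairs from highest to lowest score.
    ranked = sorted(zip(scores, advanced), key=lambda pair: pair[0], reverse=True)
    top_rate = sum(flag for _, flag in ranked[:n_top]) / n_top
    base_rate = sum(advanced) / len(advanced)
    if base_rate == 0:
        raise ValueError("no successes in the data; the ratio is undefined")
    return top_rate / base_rate


# Toy data (invented, not from the study): 100 pairs, 20 of which advanced,
# including both of the two highest-scoring pairs.
toy_scores = [float(i) for i in range(100)]
toy_advanced = [1 if i >= 98 or i < 18 else 0 for i in range(100)]

ratio = relative_success(toy_scores, toy_advanced, top_frac=0.02)
print(f"relative success of top 2%: {ratio:.1f}x")
```

With this toy data the top 2% has a success rate of 1.0 against a base rate of 0.2, a ratio of 5.0. With a top group this small the ratio is noisy, so in practice it would normally be reported with an uncertainty estimate.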

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. Journal of the American Medical Informatics Association; 61 papers in training set; Top 0.1%; 21.9%
2. Bioinformatics; 1061 papers in training set; Top 2%; 13.9%
3. Proceedings of the National Academy of Sciences; 2130 papers in training set; Top 4%; 12.1%
4. Cell Systems; 167 papers in training set; Top 3%; 4.2%

50% of probability mass above

5. Bioinformatics Advances; 184 papers in training set; Top 0.9%; 4.2%
6. Artificial Intelligence in the Life Sciences; 11 papers in training set; Top 0.1%; 4.2%
7. eLife; 5422 papers in training set; Top 28%; 3.5%
8. Scientific Reports; 3102 papers in training set; Top 42%; 3.0%
9. npj Digital Medicine; 97 papers in training set; Top 2%; 2.0%
10. PLOS ONE; 4510 papers in training set; Top 51%; 1.8%
11. Briefings in Bioinformatics; 326 papers in training set; Top 4%; 1.7%
12. Nature Human Behaviour; 85 papers in training set; Top 2%; 1.6%
13. Nature Communications; 4913 papers in training set; Top 52%; 1.6%
14. Cell Genomics; 162 papers in training set; Top 4%; 1.6%
15. iScience; 1063 papers in training set; Top 16%; 1.6%
16. Frontiers in Genetics; 197 papers in training set; Top 6%; 1.4%
17. BMC Genomics; 328 papers in training set; Top 4%; 1.2%
18. Journal of Biomedical Informatics; 45 papers in training set; Top 1%; 0.9%
19. Heliyon; 146 papers in training set; Top 5%; 0.9%
20. Patterns; 70 papers in training set; Top 2%; 0.9%
21. PLOS Computational Biology; 1633 papers in training set; Top 23%; 0.9%
22. GigaScience; 172 papers in training set; Top 3%; 0.7%
23. Genome Medicine; 154 papers in training set; Top 8%; 0.7%
24. Computers in Biology and Medicine; 120 papers in training set; Top 5%; 0.7%
25. Frontiers in Bioinformatics; 45 papers in training set; Top 1%; 0.7%
26. IEEE/ACM Transactions on Computational Biology and Bioinformatics; 32 papers in training set; Top 0.7%; 0.7%
27. BMC Bioinformatics; 383 papers in training set; Top 8%; 0.6%