Forecasting novel therapeutic development in biomedical research

Arabi, S.; Hutchins, B. I.

2026-06-01 scientific communication and education
10.64898/2026.05.29.728775 bioRxiv
Early identification of promising drug research topics is challenging yet crucial for the scientific community to accelerate the development of novel therapeutics. In this work, we leverage large-scale public data from the biomedical literature to extract predictive features that identify promising therapeutic research topics at an early stage. We divide the global citation graph of biomedical literature into a time series of research topics and extract topic features based on citation activity, publication content, and measurable flocking of scientists into novel research topics. Based on these features, our machine learning model identifies, years before approval, research topics that go on to yield Food and Drug Administration (FDA)-approved drugs (F1-score of 0.84). Of the target drugs, 80% are predicted in advance, with 65% predicted 8 or more years before approval. In the vast majority of positive predictions, the prediction predates the start of phase 2 clinical trials. These results show that this approach can efficiently flag research topics generating approved drugs several years prior to approval, using public data that would have been contemporaneous at the time of prediction. Thus, reliable forecasting can be accomplished with a high-level view of the publication and citation behavior of scientists, without depending on clinical trial data that may only be deposited with a significant lag. This demonstrates that it is possible to detect early signals of future FDA-approved therapies even without any specialized information about these applied research efforts.

Teaser: Large-scale data analysis can use the full set of scientific citations to predict which areas of research will yield new FDA-approved drugs, years in advance.

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1. Patterns: 70 papers in training set, Top 0.1%, 12.1%
2. PLOS ONE: 4510 papers in training set, Top 25%, 6.7%
3. Bioinformatics: 1061 papers in training set, Top 4%, 6.7%
4. Heliyon: 146 papers in training set, Top 0.1%, 6.2%
5. Genomics, Proteomics & Bioinformatics: 171 papers in training set, Top 1%, 4.8%
6. Journal of Cheminformatics: 25 papers in training set, Top 0.1%, 4.1%
7. Scientific Reports: 3102 papers in training set, Top 38%, 3.5%
8. Entropy: 20 papers in training set, Top 0.1%, 3.5%
9. Communications Biology: 886 papers in training set, Top 2%, 3.5%

50% of probability mass above

10. eLife: 5422 papers in training set, Top 27%, 3.5%
11. PLOS Computational Biology: 1633 papers in training set, Top 11%, 3.0%
12. Artificial Intelligence in the Life Sciences: 11 papers in training set, Top 0.1%, 2.8%
13. Journal of the American Medical Informatics Association: 61 papers in training set, Top 0.9%, 2.7%
14. BioData Mining: 15 papers in training set, Top 0.1%, 2.5%
15. Advanced Science: 249 papers in training set, Top 8%, 2.3%
16. Briefings in Bioinformatics: 326 papers in training set, Top 4%, 1.7%
17. Nature Machine Intelligence: 61 papers in training set, Top 2%, 1.6%
18. Nature Biotechnology: 147 papers in training set, Top 5%, 1.6%
19. Cell Systems: 167 papers in training set, Top 8%, 1.5%
20. npj Digital Medicine: 97 papers in training set, Top 3%, 1.3%
21. Journal of Cell Biology: 333 papers in training set, Top 3%, 1.3%
22. Nature Communications: 4913 papers in training set, Top 58%, 1.1%
23. IEEE Transactions on Computational Biology and Bioinformatics: 17 papers in training set, Top 0.4%, 0.9%
24. Nature Human Behaviour: 85 papers in training set, Top 3%, 0.9%
25. IEEE/ACM Transactions on Computational Biology and Bioinformatics: 32 papers in training set, Top 0.5%, 0.9%
26. IEEE Journal of Biomedical and Health Informatics: 34 papers in training set, Top 2%, 0.9%
27. Medical Image Analysis: 33 papers in training set, Top 1.0%, 0.8%
28. PLOS Biology: 408 papers in training set, Top 19%, 0.8%
29. BMC Bioinformatics: 383 papers in training set, Top 7%, 0.7%
30. Nucleic Acids Research: 1128 papers in training set, Top 18%, 0.7%
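
The "top 9 journals account for 50% of the predicted probability mass" statement can be checked with a running total over the ranked list. The sketch below is illustrative only: it assumes the final percentage on each entry is that journal's predicted probability, uses the rounded values as displayed (so the total is approximate), and the function name `journals_to_reach` is hypothetical rather than part of any tool behind this page.

```python
# Displayed predicted probabilities (percent) for ranks 1-12, in rank order.
# Assumption: the last percentage of each entry is the predicted probability.
probabilities = [12.1, 6.7, 6.7, 6.2, 4.8, 4.1, 3.5, 3.5, 3.5, 3.5, 3.0, 2.8]


def journals_to_reach(mass_percent, probs):
    """Return (count, running_total) for the first rank at which the
    cumulative probability reaches mass_percent, or (None, total) if never."""
    total = 0.0
    for count, p in enumerate(probs, start=1):
        total += p
        # Small tolerance guards against floating-point round-off.
        if total >= mass_percent - 1e-9:
            return count, total
    return None, total


count, total = journals_to_reach(50.0, probabilities)
print(count, round(total, 1))
```

With the displayed values, the running total first crosses 50% at the ninth entry, which agrees with the divider placed after Communications Biology.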