
Large-Scale Assessment of Animal-to-Human Drug Translation Using Natural Language Processing

Doneva, S. E.; Ellendorff, T. R.; Schneider, G.; Held, L.; von Wyl, V.; Simpson, I.; Sick, B.; Ineichen, B. V.

2026-05-22 bioinformatics
10.64898/2026.05.20.726540 bioRxiv

Background: Large-scale estimates of animal-to-human drug translation and the study characteristics associated with successful translation remain limited. The expanding preclinical literature also challenges manual evidence synthesis. We developed a natural language processing (NLP) pipeline to structure and link preclinical and clinical evidence at scale.

Methods: In this retrospective meta-research study, we analysed more than 500,000 neuroscience-related animal drug studies from PubMed and linked them to clinical trial and regulatory approval data. NLP methods extracted drug, disease, and experimental design characteristics from abstracts and full texts. Translation was defined as progression to completed phase III/IV trials or regulatory approval. Logistic regression assessed associations between preclinical study characteristics and successful translation.

Findings: Among 291,624 drug entities identified in animal studies, 6·7% entered clinical development and 3·1% reached phase III/IV trials or regulatory approval. At the drug-disease level, 4·4% entered clinical development and 1·9% achieved translation. Restricting analyses to successfully linked ontology entities increased estimates to 11·3% and 4·1%, respectively. Male-only animal studies predominated, whereas reporting of randomisation, blinding, and sample size calculations remained limited. Testing across multiple species and reporting blinding were associated with higher odds of successful translation.

Interpretation: Only a minority of interventions tested in animals progress to advanced clinical development or regulatory approval. Greater species diversity and blinding were associated with improved translational success. NLP-based evidence synthesis may support scalable evaluation of translational research and identification of potentially modifiable research practices.

Funding: Swiss National Science Foundation, UZH Digital Entrepreneurship Fellowship, Universities Federation for Animal Welfare.

Research in context

Evidence before this study: We searched the literature for studies quantifying large-scale animal-to-human translation and factors associated with successful translation. Existing work was mainly limited to specific diseases, interventions, or manually curated datasets, and large-scale linkage of animal and clinical evidence remained limited.

Added value of this study: We developed a natural language processing pipeline linking more than 500,000 animal studies to clinical trial and regulatory approval data. The study provides large-scale estimates of translation and identifies experimental characteristics associated with successful translation.

Implications of all the available evidence: The findings suggest that only a minority of interventions tested in animals progress to advanced clinical development or regulatory approval. Greater species diversity and reporting of blinding were associated with improved translation. Automated evidence synthesis may support more systematic evaluation of translational research practices.

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. Trials; 25 papers in training set; Top 0.1%; 27.2%
2. PLOS ONE; 4510 papers in training set; Top 14%; 13.0%
3. Journal of the American Medical Informatics Association; 61 papers in training set; Top 0.5%; 5.1%
4. Scientific Data; 174 papers in training set; Top 0.4%; 4.1%
5. Bioinformatics; 1061 papers in training set; Top 5%; 3.8%
50% of probability mass above
6. Scientific Reports; 3102 papers in training set; Top 40%; 3.2%
7. Nature Communications; 4913 papers in training set; Top 43%; 2.9%
8. BMC Bioinformatics; 383 papers in training set; Top 4%; 2.2%
9. Journal of Clinical Epidemiology; 28 papers in training set; Top 0.2%; 2.0%
10. F1000Research; 79 papers in training set; Top 1%; 1.8%
11. BMJ Open; 554 papers in training set; Top 10%; 1.4%
12. Research Synthesis Methods; 20 papers in training set; Top 0.1%; 1.4%
13. PLOS Biology; 408 papers in training set; Top 13%; 1.3%
14. PeerJ; 261 papers in training set; Top 9%; 1.3%
15. Clinical and Translational Science; 21 papers in training set; Top 0.7%; 1.0%
16. Clinical Infectious Diseases; 231 papers in training set; Top 4%; 1.0%
17. Neuroscience & Biobehavioral Reviews; 43 papers in training set; Top 0.7%; 1.0%
18. Frontiers in Veterinary Science; 30 papers in training set; Top 0.6%; 0.9%
19. Database; 51 papers in training set; Top 0.8%; 0.8%
20. The Lancet Digital Health; 25 papers in training set; Top 0.9%; 0.8%
21. Peer Community Journal; 254 papers in training set; Top 3%; 0.8%
22. Bioinformatics Advances; 184 papers in training set; Top 4%; 0.8%
23. PLOS Medicine; 98 papers in training set; Top 4%; 0.8%
24. JAMA Network Open; 127 papers in training set; Top 4%; 0.8%
25. FACETS; 11 papers in training set; Top 0.3%; 0.8%
26. British Journal of Clinical Pharmacology; 21 papers in training set; Top 0.6%; 0.8%
27. BMC Medicine; 163 papers in training set; Top 7%; 0.8%
28. Proceedings of the National Academy of Sciences; 2130 papers in training set; Top 44%; 0.8%
29. Journal of Translational Medicine; 46 papers in training set; Top 3%; 0.8%
30. BMC Biology; 248 papers in training set; Top 4%; 0.8%
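
The statement that the top 5 journals account for 50% of the predicted probability mass can be checked by summing the listed probabilities in rank order. The sketch below is only an illustration of that arithmetic: the function name `journals_to_reach` is hypothetical, and the input is the ten highest predicted probabilities copied from the list above. It does not reproduce how the journal predictions themselves were generated, which this page does not describe.

```python
from itertools import accumulate

# Predicted probabilities (%) of the ten top-ranked journals, copied from the list above.
probs = [27.2, 13.0, 5.1, 4.1, 3.8, 3.2, 2.9, 2.2, 2.0, 1.8]

def journals_to_reach(probs, threshold):
    """Return (rank, running total) at the first rank where the running total reaches threshold.

    `probs` must already be sorted from highest to lowest rank.
    Returns None if the listed entries never reach the threshold.
    """
    for rank, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return rank, total
    return None

rank, total = journals_to_reach(probs, 50.0)
print(rank, round(total, 1))
```

With these values the running total after four journals is still below 50%, and the fifth journal carries it past the threshold, which is consistent with where the "50% of probability mass above" marker sits in the list.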