Prediction of transformative breakthroughs in biomedical research

Davis, M. T.; Busse, B. L.; Arabi, S.; Meyer, P.; Hoppe, T. A.; Meseroll, R. A.; Hutchins, B. I.; Willis, K. A.; Santangelo, G. M.

2025-12-17 scientific communication and education
10.64898/2025.12.16.694385 bioRxiv
The ability to predict scientific breakthroughs at scale would accelerate the pace of discovery and improve the efficiency of research investments. Recent advances in artificial intelligence, graph theory, and computing power have provided new ways to pursue this elusive goal. We have identified a common signature within co-citation networks that accurately predicts the occurrence of breakthroughs in medical research, on average more than 5 years in advance of the subsequent publication(s) that announced the discovery. A combination of features produces these diagnostic signals: a burst of papers exploring a novel scientific concept, an unusually high number of very influential papers in specialty journals, and low topical cohesion of the associated content. We analyzed two different periods separated by 20 years to show that the kinetics of breakthrough formation are conserved, suggesting that our approach can be used to predict which topics will produce future transformative discoveries.

Significance statement

Scientific breakthroughs are rare, as is contemporaneous recognition of their initial expression. Faster, more efficient identification of topics likely to produce future breakthroughs would speed scientific and technological progress. We introduce an AI/ML-detected signature in co-citation networks that recognizes such topics up to twelve years before the breakthrough itself occurs. Our findings illustrate how a better understanding of the scientific process may lead to greater scientific returns.

Matching journals

The top 3 journals together account for just over 50% of the predicted probability mass (27.8% + 14.4% + 10.1% = 52.3%).

Rank  Journal                                          Papers in training set  Percentile  Predicted probability
1     PLOS Biology                                     408                     Top 0.1%    27.8%
2     Nature Neuroscience                              216                     Top 0.3%    14.4%
3     eLife                                            5422                    Top 5%      10.1%
----- 50% of probability mass above -----
4     PLOS Computational Biology                       1633                    Top 5%      6.8%
5     Science                                          429                     Top 5%      6.3%
6     Nature Biotechnology                             147                     Top 2%      3.6%
7     Nature Methods                                   336                     Top 3%      3.6%
8     Nature Genetics                                  240                     Top 3%      2.9%
9     Proceedings of the National Academy of Sciences  2130                    Top 29%     1.9%
10    Journal of Cell Biology                          333                     Top 2%      1.7%
11    Cell                                             370                     Top 12%     1.5%
12    Nature                                           575                     Top 12%     1.5%
13    Nature Human Behaviour                           85                      Top 3%      1.2%
14    Communications Biology                           886                     Top 14%     1.2%
15    PLOS ONE                                         4510                    Top 60%     1.2%
16    Neuron                                           282                     Top 7%      1.0%
17    Science Advances                                 1098                    Top 26%     0.9%
18    Patterns                                         70                      Top 2%      0.9%
19    Cell Systems                                     167                     Top 11%     0.8%
20    Nature Communications                            4913                    Top 63%     0.8%
21    Molecular Cell                                   308                     Top 10%     0.7%
22    Genome Biology                                   555                     Top 8%      0.7%
23    Nucleic Acids Research                           1128                    Top 20%     0.6%
24    Molecular Systems Biology                        142                     Top 2%      0.6%
25    Current Biology                                  596                     Top 15%     0.6%
26    Genome Medicine                                  154                     Top 9%      0.6%
27    eNeuro                                           389                     Top 11%     0.5%
28    Brain                                            154                     Top 6%      0.5%
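
The 50% cutoff in the ranking above can be checked by accumulating the listed probabilities in rank order. The following is a minimal sketch, not the site's own method: the function name `journals_to_reach` and the variable names are hypothetical, and the values are copied from the ten highest-ranked journals in the list.

```python
from itertools import accumulate

# Predicted probabilities (percent) for ranks 1-10, copied from the ranking above.
probabilities = [27.8, 14.4, 10.1, 6.8, 6.3, 3.6, 3.6, 2.9, 1.9, 1.7]


def journals_to_reach(threshold, probs):
    """Return (count, cumulative_percent) for the smallest number of
    top-ranked journals whose summed probability reaches `threshold`.

    Hypothetical helper for illustration; returns None if the
    threshold is never reached.
    """
    for count, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return count, round(total, 1)
    return None


top_n, mass = journals_to_reach(50.0, probabilities)
print(top_n, mass)  # number of journals needed, and their cumulative share
```

Under these assumptions, the first three journals are the smallest set whose cumulative share reaches the 50% mark, which matches where the cutoff line sits in the ranking.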