
OpenScientist: evaluating an open agentic AI co-scientist to accelerate biomedical discovery

Roberts, K. F.; Abrams, Z. B.; Cappelletti, L.; Moqri, M.; Heugel, N.; Caufield, J. H.; Bourdenx, M.; Li, Y.; Banerjee, J.; Foschini, L.; Galeano, D.; Harris, N. L.; Li, M.; Ying, K.; Melendez, J. A.; Barthelemy, N. R.; Bollinger, J. G.; He, Y.; Ovod, V.; Benzinger, T. L. S.; Flores, S.; Gordon, B.; Ojewole, A. A.; Phatak, M.; Elbert, D. L.; Biber, S.; Landsness, E. C.; Mungall, C. J.; Bateman, R. J.; Reese, J.

2026-03-18 health informatics
10.64898/2026.03.15.26348338 medRxiv

Background: Advances in medicine depend on analyzing large and complex data sources, but discovery is partly constrained by the limited time and domain expertise of human researchers. Agentic artificial intelligence (agentic AI) can accelerate discovery by automating components of the scientific workflow, including information retrieval, data analysis, and knowledge synthesis.

Aim: OpenScientist, an open-source agentic AI co-scientist, aims to accelerate biomedical discovery by semi-autonomously investigating scientist-defined queries and generating clinically relevant, verifiable scientific insights.

Methods: Domain experts evaluated OpenScientist for novel discoveries in four clinical case studies: (1) a prespecified analysis in a community-based Alzheimer's disease biomarker cohort, (2) unsupervised modeling for plasma proteomic survival prediction, (3) hypothesis investigation in single-cell transcriptomic data from neurons with neurofibrillary tangles, and (4) hypothesis generation with validation in a multiple myeloma dataset with a randomized negative control.

Results: OpenScientist completed analyses in minutes that would otherwise take weeks to months of human time and expertise. It identified %ptau217 as the best predictor of amyloid PET status, generated a plasma proteomic survival model with performance comparable to published models, proposed a mechanism linking tau pathology to altered lysosomal acidification, and generated multiple myeloma hypotheses that were validated in an external cohort while distinguishing true signal from randomized controls.

Conclusion: OpenScientist demonstrates that open, auditable, agentic AI can support real-world clinical research by generating hypotheses, executing analyses, and discovering insights from complex datasets.

Matching journals

The top 11 journals account for 50% of the predicted probability mass.

Rank | Journal | Papers in training set | Top % | Predicted probability
1 | npj Digital Medicine | 97 | Top 0.4% | 12.2%
2 | Bioinformatics | 1061 | Top 4% | 6.7%
3 | Patterns | 70 | Top 0.1% | 6.3%
4 | Scientific Reports | 3102 | Top 25% | 4.8%
5 | Journal of the American Medical Informatics Association | 61 | Top 0.8% | 3.5%
6 | Computers in Biology and Medicine | 120 | Top 0.9% | 3.5%
7 | Artificial Intelligence in the Life Sciences | 11 | Top 0.1% | 3.5%
8 | Alzheimer's & Dementia: Diagnosis, Assessment & Disease Monitoring | 38 | Top 0.5% | 3.0%
9 | Alzheimer's & Dementia | 143 | Top 2% | 2.6%
10 | Journal of Medical Internet Research | 85 | Top 2% | 2.3%
11 | Journal of Biomedical Informatics | 45 | Top 0.6% | 2.3%
50% of probability mass above
12 | PLOS ONE | 4510 | Top 49% | 2.1%
13 | Frontiers in Digital Health | 20 | Top 0.5% | 1.9%
14 | Frontiers in Artificial Intelligence | 18 | Top 0.2% | 1.8%
15 | JMIR Medical Informatics | 17 | Top 0.7% | 1.8%
16 | BMJ Open | 554 | Top 9% | 1.7%
17 | Annals of Neurology | 57 | Top 1% | 1.7%
18 | The Lancet Digital Health | 25 | Top 0.4% | 1.7%
19 | The Journal of Prevention of Alzheimer's Disease | 10 | Top 0.2% | 1.7%
20 | BMC Medical Informatics and Decision Making | 39 | Top 2% | 1.7%
21 | Alzheimer's Research & Therapy | 52 | Top 1% | 1.6%
22 | Frontiers in Aging Neuroscience | 67 | Top 2% | 1.3%
23 | BMC Bioinformatics | 383 | Top 5% | 1.3%
24 | International Journal of Medical Informatics | 25 | Top 1% | 1.2%
25 | Biomedicines | 66 | Top 2% | 0.9%
26 | GeroScience | 97 | Top 1% | 0.9%
27 | Human Brain Mapping | 295 | Top 4% | 0.9%
28 | IEEE Journal of Biomedical and Health Informatics | 34 | Top 2% | 0.9%
29 | Artificial Intelligence in Medicine | 15 | Top 0.6% | 0.9%
30 | Alzheimer's & Dementia: Translational Research & Clinical Interventions | 16 | Top 0.7% | 0.7%
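
The statement that the top 11 journals account for 50% of the predicted probability mass can be checked with a running sum over the listed percentages. The sketch below is only an illustration: the function name `ranks_to_reach` is hypothetical, and the values are the rounded percentages shown in the listing, so the totals are approximate.

```python
from itertools import accumulate

# Predicted probabilities (in percent) for ranks 1-30, copied from the listing.
# These are rounded display values, so sums are approximate.
probs = [
    12.2, 6.7, 6.3, 4.8, 3.5, 3.5, 3.5, 3.0, 2.6, 2.3,
    2.3, 2.1, 1.9, 1.8, 1.8, 1.7, 1.7, 1.7, 1.7, 1.7,
    1.6, 1.3, 1.3, 1.2, 0.9, 0.9, 0.9, 0.9, 0.9, 0.7,
]


def ranks_to_reach(values, threshold=50.0):
    """Return how many top-ranked entries are needed for the running sum
    to reach the threshold, or None if the listed entries never reach it.
    (Hypothetical helper, not part of the source page.)"""
    for rank, total in enumerate(accumulate(values), start=1):
        if total >= threshold:
            return rank
    return None


print(ranks_to_reach(probs))  # 11
```

The first ten entries sum to about 48.4%, and the eleventh brings the running total to about 50.7%, which matches the position of the 50% marker in the listing.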