Structured Schemas for LLM-Modeler Collaboration in Quantitative Systems Pharmacology Model Calibration

Eliason, J.; Popel, A. S.

2026-03-09 systems biology
10.64898/2026.03.05.709623 bioRxiv

Quantitative systems pharmacology (QSP) models require calibration data from published literature, yet manual curation produces inconsistent documentation while large language model (LLM) extraction exhibits hallucination and fabrication errors unacceptable for quantitative modeling. We present MAPLE (Model-Aware Parameterization from Literature Evidence), a framework that uses structured validation schemas as a collaboration interface between LLMs and modelers. Two complementary schemas capture calibration data at different scales: one for isolated experiments that constrain individual parameters through simplified forward models, and one for clinical and in vivo endpoints that constrain the full model through species-level observables. Both schemas separate data extraction from modeling decisions, capturing literature values with full provenance in a machine-verifiable form. Targeted validators catch characteristic LLM errors: value-in-snippet matching detects hallucinated values, DOI resolution flags fabricated citations, and code execution catches malformed forward models. We evaluate MAPLE on 87 calibration targets for a pancreatic ductal adenocarcinoma (PDAC) QSP model, using two collaboration modes: batch LLM extraction followed by interactive curation, and interactive extraction where modeler and LLM collaborate in real time. Both modes required substantial modeler input: the modeler changed forward model types in 65% of SubmodelTargets, adjusted prior parameters in 46%, and revised source relevance assessments in all files. Interactively extracted targets embedded modeler effort in the extraction process, producing near-final output. The schemas ensure completeness and enable reproducible, provenance-rich calibration regardless of workflow.
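The value-in-snippet check named in the abstract can be illustrated with a minimal sketch. The abstract does not show MAPLE's actual validator, so the function names (`numbers_in`, `value_in_snippet`), the number-parsing rules, and the tolerance below are all assumptions made for illustration: an extracted value passes only if the same number appears literally in the quoted source snippet.

```python
import math
import re

# Hypothetical sketch; not the MAPLE implementation.
# A sign is accepted only when it does not directly follow a digit or a dot,
# so a range such as "10-20" yields 10 and 20 rather than 10 and -20.
_NUMBER = re.compile(r"(?<![\d.])[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?")


def numbers_in(text: str) -> list[float]:
    """Return every numeric literal found in text as a float."""
    # Drop thousands separators ("1,200" becomes "1200") before matching.
    cleaned = re.sub(r"(?<=\d),(?=\d{3}\b)", "", text)
    return [float(match) for match in _NUMBER.findall(cleaned)]


def value_in_snippet(value: float, snippet: str, rel_tol: float = 1e-9) -> bool:
    """True if the extracted value appears as a number in the quoted snippet."""
    return any(
        math.isclose(value, found, rel_tol=rel_tol)
        for found in numbers_in(snippet)
    )


if __name__ == "__main__":
    snippet = "the proliferation rate was 0.12 per day (n = 6)"
    print(value_in_snippet(0.12, snippet))  # value is present in the snippet
    print(value_in_snippet(0.21, snippet))  # transposed digits are rejected
```

A check of this kind only confirms that a number appears in the quoted text. It does not confirm that the snippet itself comes from the cited source, which is why the abstract pairs it with DOI resolution and execution of the forward-model code.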

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

Rank  Journal                                          Papers in training set  Percentile  Probability
1     Cell Systems                                     167                     Top 0.8%    12.2%
2     Molecular Systems Biology                        142                     Top 0.1%    9.9%
3     Nature Methods                                   336                     Top 1%      9.9%
4     Nature Communications                            4913                    Top 21%     8.9%
5     Genome Medicine                                  154                     Top 1%      4.7%
6     Bioinformatics                                   1061                    Top 6%      3.5%
7     npj Systems Biology and Applications             99                      Top 0.6%    3.5%
50% of probability mass above
8     npj Digital Medicine                             97                      Top 1%      3.5%
9     Nature                                           575                     Top 8%      3.2%
10    Briefings in Bioinformatics                      326                     Top 2%      3.0%
11    PLOS Computational Biology                       1633                    Top 13%     2.3%
12    eLife                                            5422                    Top 37%     2.0%
13    Nature Genetics                                  240                     Top 4%      1.8%
14    PLOS ONE                                         4510                    Top 52%     1.7%
15    Nucleic Acids Research                           1128                    Top 11%     1.7%
16    Nature Machine Intelligence                      61                      Top 2%      1.7%
17    Nature Medicine                                  117                     Top 2%      1.7%
18    Nature Biotechnology                             147                     Top 5%      1.3%
19    BMC Bioinformatics                               383                     Top 5%      1.3%
20    Proceedings of the National Academy of Sciences  2130                    Top 37%     1.3%
21    Communications Biology                           886                     Top 16%     1.1%
22    Genome Biology                                   555                     Top 6%      0.9%
23    Patterns                                         70                      Top 2%      0.8%
24    Cell Reports                                     1338                    Top 34%     0.7%
25    Cancer Cell                                      38                      Top 2%      0.7%
26    PLOS Biology                                     408                     Top 21%     0.7%
27    Bioinformatics Advances                          184                     Top 5%      0.6%
28    Scientific Data                                  174                     Top 3%      0.6%
29    iScience                                         1063                    Top 38%     0.6%
30    Communications Medicine                          85                      Top 2%      0.6%