Graph-Augmented Retrieval for Digital Evidence-Based Medical Synthesis: A Proof-of-Concept Study on Topology-Aware Mechanistic Narrative Generation

Buscemi, P.; Buscemi, F.

2026-02-19 health systems and quality improvement
10.64898/2026.02.18.26346545 medRxiv

Background: Retrieval-augmented generation (RAG) frameworks such as RAPID [1] have demonstrated that staged planning and retrieval grounding improve long-form text generation. However, most implementations remain similarity-driven and open-domain, lacking the epistemic safeguards required for biomedical synthesis, where mechanistic completeness, temporal governance, traceability, and explicit gap classification are essential.

Objective: To develop and evaluate a topology-aware, graph-augmented retrieval framework for structured biomedical narrative synthesis, and to position it as a domain-constrained evolution of staged RAG aligned with structural principles of digital evidence-based medicine (dEBM).

Methods: We implemented a two-layer architecture operating on a closed, version-controlled corpus of 11,861 peer-reviewed text chunks on iron deficiency. A metadata-constrained vector retriever (RAG01) was extended with a Graph-RAG overlay (RAG02) constructed from chunk-level entity extraction and weighted co-occurrence networks (30 nodes; 118 directed edges). Topic planning was organized through predefined mechanistic axes functioning as structured hypothesis probes. Retrieval was performed under identical deterministic constraints (top-k = 5; cosine threshold = 0.50; publication year ≥ 2023). Graph diagnostics, including local connectivity, induced subgraph density, modular overlap, and multi-hop stability, were used to distinguish retrieval insufficiency from corpus-level evidentiary scarcity.

Results: In a case study of obesity-associated iron deficiency, the entity network exhibited a centralized regulatory topology with hepcidin as a high-connectivity hub. Axis-based retrieval combined with graph auditing consistently reinforced an inflammation-mediated hepcidin pathway linking obesity to iron deficiency, while alternative mechanisms lacked stable multi-hop embedding. Compared with vector-only retrieval, graph augmentation preserved semantic alignment and increased mean cosine similarity from 0.673 to 0.694 while reducing similarity dispersion (SD 0.056 to 0.035) under identical constraints. Graph activity ratio was 1.00 in the temporally filtered corpus.

Conclusions: By integrating mechanistic axis decomposition, topology-aware auditing, causal scaffolding, and expert-driven iterative refinement, the proposed framework implements selected structural constraints inspired by evidence-based medicine within a controlled digital synthesis environment. The approach advances retrieval-augmented generation beyond similarity-based summarization toward a reproducible model of topology-aware biomedical evidence interrogation with implications for AI-assisted systematic reviews.
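The deterministic retrieval constraints named in the Methods (top-k = 5, cosine threshold = 0.50, publication year ≥ 2023) can be illustrated with a minimal sketch. The abstract does not describe the authors' implementation, so everything below is an assumption made for illustration: the `Chunk` record, the `retrieve` and `summarize` helpers, and the tie-breaking rule are hypothetical names and choices, and any embeddings passed in would be toy vectors rather than the study's data.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class Chunk:
    # Hypothetical record; the real corpus schema is not given in the abstract.
    chunk_id: str
    year: int
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, top_k=5, min_cosine=0.50, min_year=2023):
    """Metadata filter first, then similarity threshold, then top-k.

    Ties are broken by chunk_id so repeated runs return the same order
    (an assumed way of making the retrieval deterministic).
    """
    scored = [(cosine(query, c.embedding), c) for c in chunks if c.year >= min_year]
    kept = [(score, c) for score, c in scored if score >= min_cosine]
    kept.sort(key=lambda pair: (-pair[0], pair[1].chunk_id))
    return kept[:top_k]


def summarize(results):
    """Mean and sample SD of the retrieved similarities (SD is 0.0 for fewer than 2 hits)."""
    scores = [score for score, _ in results]
    if not scores:
        return None
    sd = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return statistics.mean(scores), sd
```

Running `retrieve` once per retriever variant and comparing the `summarize` outputs would yield the kind of mean and dispersion comparison the Results report; whether the authors used a sample or population standard deviation is not stated, so the choice here is arbitrary.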

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

Each entry gives the rank, journal, number of papers in the training set, the "Top" figure as displayed, and the predicted probability.

1. Journal of Biomedical Informatics (45 papers in training set; Top 0.1%): 14.5%
2. Journal of the American Medical Informatics Association (61 papers in training set; Top 0.4%): 7.3%
3. Journal of Personalized Medicine (28 papers in training set; Top 0.1%): 6.4%
4. European Heart Journal - Digital Health (15 papers in training set; Top 0.1%): 4.9%
5. BMC Medical Informatics and Decision Making (39 papers in training set; Top 0.7%): 4.0%
6. Nature (575 papers in training set; Top 7%): 3.6%
7. npj Digital Medicine (97 papers in training set; Top 1%): 3.6%
8. PLOS Digital Health (91 papers in training set; Top 0.8%): 3.3%
9. PLOS ONE (4510 papers in training set; Top 42%): 3.1%

50% of probability mass above

10. Scientific Reports (3102 papers in training set; Top 43%): 2.8%
11. Communications Medicine (85 papers in training set; Top 0.1%): 2.1%
12. PLOS Biology (408 papers in training set; Top 7%): 2.1%
13. Journal of Clinical Epidemiology (28 papers in training set; Top 0.2%): 1.9%
14. Nature Human Behaviour (85 papers in training set; Top 2%): 1.7%
15. Scientific Data (174 papers in training set; Top 1%): 1.3%
16. Journal of Medical Internet Research (85 papers in training set; Top 3%): 1.3%
17. Bioinformatics Advances (184 papers in training set; Top 3%): 1.3%
18. Bioinformatics (1061 papers in training set; Top 8%): 1.2%
19. JMIRx Med (31 papers in training set; Top 1%): 1.2%
20. Computer Methods and Programs in Biomedicine (27 papers in training set; Top 0.6%): 1.1%
21. eBioMedicine (130 papers in training set; Top 3%): 1.0%
22. Computers in Biology and Medicine (120 papers in training set; Top 3%): 1.0%
23. The Lancet Digital Health (25 papers in training set; Top 0.8%): 0.9%
24. Nature Medicine (117 papers in training set; Top 4%): 0.9%
25. Advanced Science (249 papers in training set; Top 17%): 0.8%
26. Nucleic Acids Research (1128 papers in training set; Top 16%): 0.8%
27. Nature Communications (4913 papers in training set; Top 61%): 0.8%
28. Database (51 papers in training set; Top 0.8%): 0.8%
29. BMC Bioinformatics (383 papers in training set; Top 6%): 0.8%
30. Medical Decision Making (10 papers in training set; Top 0.3%): 0.8%
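
The statement that the top 9 journals account for 50% of the predicted probability mass can be checked by accumulating the listed probabilities in rank order. The sketch below is only a worked arithmetic check on the displayed, rounded percentages; the helper name `journals_for_mass` is hypothetical, and the page's own calculation presumably uses unrounded values.

```python
def journals_for_mass(probabilities, target=50.0):
    """Return (rank, cumulative mass) at which the running total first reaches target.

    Returns (None, total) if the target is never reached.
    """
    total = 0.0
    for rank, p in enumerate(probabilities, start=1):
        total += p
        if total >= target:
            return rank, total
    return None, total


# Predicted probabilities (%) for ranks 1-10, copied from the list above.
top_probs = [14.5, 7.3, 6.4, 4.9, 4.0, 3.6, 3.6, 3.3, 3.1, 2.8]

rank, mass = journals_for_mass(top_probs)
print(f"Rank {rank} reaches {mass:.1f}% cumulative probability")
```

With the rounded figures, the running total after rank 8 is 47.6% and after rank 9 is 50.7%, which agrees with the cutoff shown in the list.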