Knowledge augmented causal discovery through large language models and knowledge graphs: application in chronic low back pain

Lin, D.; Mussavi Rizi, M.; O'Neill, C.; Lotz, J. C.; Anderson, P.; Torres Espin, A.

2026-02-18 neurology
10.64898/2026.02.13.26346255 medRxiv
Causal discovery algorithms are often used to infer causal relationships and recover a causal model from data. However, causal discovery from data alone is limited by the structural constraints of the dataset used, the lack of causal logic, and the lack of external knowledge. Thus, data-driven causal discovery can at best suggest possible causal relationships. To overcome these limitations, Large Language Models (LLMs) and knowledge systems, such as Retrieval-Augmented Generation (RAG), have been proposed both as alternatives to data-driven causal discovery and as a method to augment causal discovery algorithms. Using an expert-defined causal graph of chronic lower back pain, we further propose knowledge-graph-based RAG systems, such as GraphRAG, as an improvement over RAG systems for augmenting causal discovery (F1 0.745), benchmarking its performance against augmenting causal discovery with an LLM (F1 0.636), augmenting causal discovery with RAG (F1 0.714), and causal discovery alone (F1 0.396). We also explore the impact of different prompting methods for causality, such as querying for the plausibility of causal relationships, the presence of statistical associations, and the existence of temporal causal relationships, as inspired by the methodology of the domain experts who constructed our ground truth. Lastly, we discuss how applications of LLMs, RAG, and graph-based RAG systems can impact and accelerate the causal modeling of chronic lower back pain by bridging the gap between domain knowledge and data-driven approaches to causal modeling.

[Graphical Abstract: Figure 1]
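The F1 scores quoted in the abstract compare a recovered causal graph against the expert-defined one. The abstract does not say exactly how the score is computed, so the following is only a minimal sketch under the assumption that F1 is taken over directed edges, with each edge written as a (cause, effect) pair. The function name `edge_f1` and the variable names in the toy graphs are hypothetical and are not taken from the paper.

```python
def edge_f1(predicted_edges, true_edges):
    """F1 score over directed edges, each given as a (cause, effect) tuple.

    Assumption: an edge counts as correct only if both endpoints and the
    direction match the reference graph.
    """
    predicted = set(predicted_edges)
    truth = set(true_edges)

    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0

    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy reference graph (hypothetical variables, for illustration only).
expert_graph = {
    ("disc_degeneration", "pain"),
    ("pain", "disability"),
    ("depression", "pain"),
}

# Toy recovered graph: two correct edges and one reversed edge.
recovered_graph = {
    ("disc_degeneration", "pain"),
    ("pain", "disability"),
    ("disability", "pain"),
}

score = edge_f1(recovered_graph, expert_graph)
print(f"edge-level F1: {score:.3f}")
```

In this toy case two of the three recovered edges match the reference, so precision and recall are both 2/3. Scoring on undirected adjacencies instead would give a different number, which is why the directed-edge choice is flagged as an assumption.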

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. PLOS Digital Health (91 papers in training set; Top 0.1%; 23.0%)
2. Scientific Reports (3102 papers in training set; Top 13%; 6.9%)
3. npj Digital Medicine (97 papers in training set; Top 0.7%; 6.9%)
4. PLOS ONE (4510 papers in training set; Top 30%; 4.9%)
5. BMC Medical Informatics and Decision Making (39 papers in training set; Top 0.5%; 4.9%)
6. Computers in Biology and Medicine (120 papers in training set; Top 0.5%; 4.4%)

50% of probability mass above

7. BMC Neurology (12 papers in training set; Top 0.1%; 3.7%)
8. IEEE Access (31 papers in training set; Top 0.1%; 3.7%)
9. Artificial Intelligence in Medicine (15 papers in training set; Top 0.2%; 2.8%)
10. Journal of the American Medical Informatics Association (61 papers in training set; Top 1%; 2.1%)
11. Nature Communications (4913 papers in training set; Top 49%; 1.8%)
12. Journal of Medical Internet Research (85 papers in training set; Top 2%; 1.8%)
13. Frontiers in Neuroinformatics (38 papers in training set; Top 0.3%; 1.7%)
14. iScience (1063 papers in training set; Top 15%; 1.7%)
15. Computational and Structural Biotechnology Journal (216 papers in training set; Top 5%; 1.5%)
16. PLOS Computational Biology (1633 papers in training set; Top 18%; 1.4%)
17. International Journal of Medical Informatics (25 papers in training set; Top 1%; 1.2%)
18. Cureus (67 papers in training set; Top 4%; 1.0%)
19. eLife (5422 papers in training set; Top 53%; 0.9%)
20. GigaScience (172 papers in training set; Top 3%; 0.8%)
21. Computer Methods and Programs in Biomedicine (27 papers in training set; Top 0.9%; 0.8%)
22. BMC Medical Research Methodology (43 papers in training set; Top 1%; 0.8%)
23. BMC Bioinformatics (383 papers in training set; Top 7%; 0.8%)
24. JMIR Formative Research (32 papers in training set; Top 2%; 0.7%)
25. Frontiers in Digital Health (20 papers in training set; Top 1%; 0.7%)
26. Frontiers in Aging Neuroscience (67 papers in training set; Top 4%; 0.7%)
27. Frontiers in Neuroscience (223 papers in training set; Top 8%; 0.7%)
28. Journal of Personalized Medicine (28 papers in training set; Top 2%; 0.7%)
29. Proceedings of the National Academy of Sciences (2130 papers in training set; Top 47%; 0.7%)
30. Frontiers in Psychiatry (83 papers in training set; Top 4%; 0.7%)