Knowledge augmented causal discovery through large language models and knowledge graphs: application in chronic low back pain
Lin, D.; Mussavi Rizi, M.; O'Neill, C.; Lotz, J. C.; Anderson, P.; Torres Espin, A.
Causal discovery algorithms are often used to infer causal relationships and recover a causal model from data. However, causal discovery from data alone is limited by the structural constraints of the dataset used, the lack of causal logic, and the lack of external knowledge. Thus, data-driven causal discovery can at best suggest possible causal relationships. To overcome these limitations, Large Language Models (LLMs) and knowledge systems, such as Retrieval-Augmented Generation (RAG), have been proposed both as alternatives to data-driven causal discovery and as a means of augmenting causal discovery algorithms. Using an expert-defined causal graph of chronic lower back pain, we further propose knowledge-graph-based RAG systems, such as GraphRAG, as an improvement over RAG systems for augmenting causal discovery (F1 0.745), benchmarking their performance against augmenting causal discovery with an LLM (F1 0.636), augmenting causal discovery with RAG (F1 0.714), and causal discovery alone (F1 0.396). We also explore the impact of different prompting methods for causality, such as querying for the plausibility of causal relationships, the presence of statistical associations, and the existence of temporal causal relationships, inspired by the methodology of the domain experts who constructed our ground truth. Lastly, we discuss how applications of LLMs, RAG, and graph-based RAG systems can impact and accelerate the causal modeling of chronic lower back pain by bridging the gap between domain knowledge and data-driven approaches to causal modeling.
Graphical Abstract: Figure 1
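The abstract reports F1 scores against an expert-defined causal graph but does not state how they were computed. As a minimal sketch, assuming F1 is taken over directed edges (a predicted edge counts as correct only if the expert graph contains the same cause-effect pair), the comparison could look like the following. The function name `edge_f1` and the toy variable names are hypothetical and are not taken from the paper.

```python
def edge_f1(predicted, truth):
    """Return (precision, recall, F1) over directed edges.

    Each edge is a (cause, effect) tuple; direction matters, so
    ("a", "b") and ("b", "a") are different edges.
    """
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy graphs, for illustration only (not the paper's data).
expert_edges = {
    ("disc_degeneration", "pain"),
    ("pain", "disability"),
    ("depression", "pain"),
}
predicted_edges = {
    ("disc_degeneration", "pain"),
    ("pain", "disability"),
    ("disability", "pain"),  # wrong edge: not in the expert graph
}

precision, recall, f1 = edge_f1(predicted_edges, expert_edges)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Under this edge-level assumption, a reversed edge is penalized twice: once as a false positive and once as a missed true edge. Other conventions, such as scoring the undirected skeleton, would give different numbers.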
Matching journals
The top 6 journals account for 50% of the predicted probability mass.