
An AI Agent for Automated Causal Inference in Epidemiology

Liu, H.; Shi, K.; Li, A.; Li, X.; Chu, J.; Xue, Y.; Cen, S.; Wang, Y.; Zhang, T.

2026-02-06 epidemiology
10.64898/2026.02.06.26345723 medRxiv

Objective: To address the inefficiency, subjectivity, and high expertise barrier of traditional epidemiological causal inference, this study designed, developed, and validated an AI-powered agent (EpiCausalX Agent) that automates the end-to-end workflow. It integrates cross-database literature retrieval, intelligent causal reasoning, and Directed Acyclic Graph (DAG) visualization to provide a reliable, accessible tool for researchers.

Materials and Methods: Built on the LangChain 1.0 framework with a layered design (Agent/Tool/Storage/Utility Layers), the agent uses the DeepSeek V3.2 LLM and the ReAct paradigm for dynamic task orchestration. Four specialized tools were integrated: multi-database retrieval across 7 databases, causal inference based on Hill's criteria and DAG logic, automated DAG drawing using NetworkX and Matplotlib, and clinical standard query. Performance was validated via unit tests, workflow verification, and usability testing.

Results: The agent achieved full-process automation. It efficiently retrieves and synthesizes literature, automatically identifies confounders and mediators, and generates standardized interactive DAGs. It produces evidence-based, traceable conclusions aligned with established epidemiological knowledge. Its natural language interface lets non-technical researchers initiate tasks quickly and without operational confusion. The agent is publicly available as a WeChat Mini Program.

Conclusion: EpiCausalX Agent advances intelligent, automated epidemiological research. By integrating domain expertise with AI agent technology, it overcomes limitations of manual methods and general LLMs to provide a specialized, verifiable, efficient solution. It has broad applications in observational research, clinical study design, and education, where it can enhance productivity and lower barriers to rigorous causal analysis.
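As an illustration of what "identifying confounders and mediators" from DAG logic can mean, here is a minimal sketch. It is not the authors' implementation: it uses plain Python rather than NetworkX to stay dependency-free, the function names and the smoking example are hypothetical, and the rules are deliberately simplified (a mediator lies on a directed path from exposure to outcome; a confounder is a cause of the exposure that also reaches the outcome by a path not passing through the exposure). A full analysis would apply the backdoor criterion.

```python
from collections import defaultdict


def descendants(edges, node):
    """Return every node reachable from `node` along directed edges."""
    children = defaultdict(set)
    for src, dst in edges:
        children[src].add(dst)
    seen, stack = set(), [node]
    while stack:
        for nxt in children[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def ancestors(edges, node):
    """Return every node with a directed path into `node` (reverse the edges)."""
    return descendants([(dst, src) for src, dst in edges], node)


def classify(edges, exposure, outcome):
    """Simplified (assumed) rules for labelling covariates in a causal DAG."""
    # Mediators: caused by the exposure and in turn causing the outcome.
    mediators = descendants(edges, exposure) & ancestors(edges, outcome)
    # Confounders: causes of the exposure that still reach the outcome
    # after the exposure's outgoing edges are removed.
    edges_without_exposure = [(s, d) for s, d in edges if s != exposure]
    confounders = ancestors(edges, exposure) & ancestors(edges_without_exposure, outcome)
    return {"confounders": sorted(confounders), "mediators": sorted(mediators)}


# Hypothetical toy DAG, not taken from the paper.
edges = [
    ("age", "smoking"),
    ("age", "lung_cancer"),
    ("smoking", "tar_deposits"),
    ("tar_deposits", "lung_cancer"),
    ("smoking", "lung_cancer"),
]
result = classify(edges, "smoking", "lung_cancer")
print(result)
```

In this toy graph, `age` is reported as a confounder and `tar_deposits` as a mediator.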

Matching journals

The top 9 journals account for 50% of the predicted probability mass.

1. BMC Medical Research Methodology; 43 papers in training set; Top 0.1%; 9.9%
2. PLOS ONE; 4510 papers in training set; Top 24%; 7.0%
3. Journal of the American Medical Informatics Association; 61 papers in training set; Top 0.5%; 6.2%
4. European Journal of Epidemiology; 40 papers in training set; Top 0.1%; 6.2%
5. International Journal of Epidemiology; 74 papers in training set; Top 0.3%; 6.2%
6. Journal of Biomedical Informatics; 45 papers in training set; Top 0.3%; 4.8%
7. Database; 51 papers in training set; Top 0.1%; 4.1%
8. Nature Communications; 4913 papers in training set; Top 38%; 3.8%
9. PLOS Biology; 408 papers in training set; Top 4%; 3.5%
50% of probability mass above
10. International Journal of Medical Informatics; 25 papers in training set; Top 0.4%; 3.5%
11. npj Digital Medicine; 97 papers in training set; Top 1%; 3.0%
12. Research Synthesis Methods; 20 papers in training set; Top 0.1%; 3.0%
13. PLOS Computational Biology; 1633 papers in training set; Top 12%; 2.6%
14. BMC Medicine; 163 papers in training set; Top 3%; 2.1%
15. Scientific Reports; 3102 papers in training set; Top 51%; 2.0%
16. Journal of Medical Internet Research; 85 papers in training set; Top 2%; 1.9%
17. BMC Medical Informatics and Decision Making; 39 papers in training set; Top 2%; 1.7%
18. American Journal of Epidemiology; 57 papers in training set; Top 0.8%; 1.7%
19. Nature Human Behaviour; 85 papers in training set; Top 2%; 1.7%
20. BMC Infectious Diseases; 118 papers in training set; Top 4%; 1.2%
21. BMJ Open; 554 papers in training set; Top 12%; 0.9%
22. Healthcare; 16 papers in training set; Top 1%; 0.9%
23. Bioinformatics; 1061 papers in training set; Top 9%; 0.8%
24. BMC Bioinformatics; 383 papers in training set; Top 7%; 0.7%
25. Patterns; 70 papers in training set; Top 3%; 0.7%
26. Epidemiology; 26 papers in training set; Top 0.6%; 0.7%
27. Journal of Clinical Epidemiology; 28 papers in training set; Top 0.6%; 0.7%
28. NAR Genomics and Bioinformatics; 214 papers in training set; Top 4%; 0.7%
29. IEEE Journal of Biomedical and Health Informatics; 34 papers in training set; Top 0.6%; 0.6%
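
The statement that the top 9 journals account for 50% of the predicted probability mass can be checked from the percentages listed above. The sketch below is only an arithmetic check on the rounded values shown on this page. The function name is hypothetical, and the underlying model's unrounded probabilities may differ slightly.

```python
# Predicted probabilities (in percent) for ranks 1 to 29, as listed on this page.
probabilities = [
    9.9, 7.0, 6.2, 6.2, 6.2, 4.8, 4.1, 3.8, 3.5, 3.5,
    3.0, 3.0, 2.6, 2.1, 2.0, 1.9, 1.7, 1.7, 1.7, 1.2,
    0.9, 0.9, 0.8, 0.7, 0.7, 0.7, 0.7, 0.7, 0.6,
]


def journals_to_reach(probs, threshold):
    """Return how many top-ranked entries are needed for the running sum
    to reach `threshold`, or None if the listed entries never reach it."""
    running = 0.0
    for count, p in enumerate(probs, start=1):
        running += p
        if running >= threshold:
            return count
    return None


needed = journals_to_reach(probabilities, 50.0)
print(needed)  # 9
```

With the rounded values, the running total is about 48.2% after eight journals and about 51.7% after nine, which matches the stated cutoff.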