
Artificial Intelligence Agents in Mental Health: A Systematic Review and Meta Analysis

Zhu, L.; Wang, W.; Liang, Z.; Tan, W.; Chen, B.; Lin, X.; Wu, Z.; Yu, H.; Li, X.; Jiao, J.; He, S.; Dai, G.; Niu, J.; Zhong, Y.; Hua, W.; Chan, N. Y.; Lu, L.; Wing, Y. K.; Ma, X.; Fan, L.

2026-04-22 psychiatry and clinical psychology
10.64898/2026.04.21.26351365 medRxiv

The rapid rise of large language models (LLMs) and foundation models has accelerated efforts to build artificial intelligence (AI) agents for mental health assessment, triage, psychotherapy support and clinical decision assistance. Yet a gap persists between healthcare and AI-focused work: while both communities use the language of "agents," clinical research largely describes monolithic chatbots, whereas AI studies emphasize agentic properties such as autonomous planning, multi-agent coordination, tool and database use, and integration with multimodal mental health data streams. In this Review, we conduct a systematic analysis of mental health AI agent systems from 2023 to 2025 using a six-dimensional audit framework: (i) system type (base model lineage, interface modality and workflow composition, from rule-based tools to role-aware multi-agent foundation-model systems), (ii) data scope (modalities and provenance, from elicited self-report and chatbot dialogues to electronic health records, biosensing and synthetic corpora), (iii) mental health focus (mapped to ICD-11 diagnostic groupings), (iv) demographics (age strata, geography and sex representation), (v) downstream tasks (screening/triage, clinical decision support, therapeutic interventions, documentation, ethical-legal support and education/simulation) and (vi) evaluation types (automated metrics, language quality benchmarks, safety stress tests, expert review and clinician or patient involvement).
Across this corpus, we find that most systems (1) concentrate on depression, anxiety and suicidality, with sparse coverage of severe mental illness, neurocognitive disorders, substance use and complex comorbidity; (2) rely heavily on text-based self-report rather than clinically verified longitudinal data or genuinely multimodal inputs; (3) are implemented as single-agent chatbots powered by general-purpose LLMs rather than role-structured, workflow-integrated pipelines; and (4) are evaluated primarily via offline metrics or vignette-based scenarios, with few prospective, clinician- or patient-in-the-loop studies. At the same time, an emerging class of agentic systems assigns foundation models explicit roles as planners, retrieval agents, safety auditors or supervisors coordinating other models and tools. These multi-agent, tool-augmented workflows promise personalization, safety monitoring and greater transparency, but they also introduce new risks around reliability, bias amplification, privacy, regulatory accountability and the blurring of clinical versus non-clinical roles. We conclude by outlining priorities for the next generation of mental health AI agents: clinically grounded, role-aware multi-agent architectures; transparent and privacy-preserving use of clinical and elicited data; demographic and cultural broadening beyond predominantly Western adult samples; and evaluation pipelines that progress from offline benchmarks to longitudinal, real-world studies with routine safety auditing and clear governance of responsibilities between agents and human clinicians.

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Predicted probability |
|------|---------|------------------------|------------|-----------------------|
| 1 | npj Digital Medicine | 97 | Top 0.1% | 27.4% |
| 2 | Nature Medicine | 117 | Top 0.1% | 27.4% |
| 3 | PLOS ONE | 4510 | Top 33% | 4.6% |
| 4 | Acta Psychiatrica Scandinavica | 10 | Top 0.1% | 4.6% |
| 5 | Frontiers in Digital Health | 20 | Top 0.4% | 2.5% |
| 6 | Frontiers in Psychiatry | 83 | Top 1% | 2.2% |
| 7 | Nature | 575 | Top 9% | 2.2% |
| 8 | Scientific Reports | 3102 | Top 48% | 2.2% |
| 9 | eLife | 5422 | Top 37% | 2.0% |
| 10 | PLOS Digital Health | 91 | Top 1% | 2.0% |
| 11 | Communications Medicine | 85 | Top 0.3% | 1.6% |
| 12 | PLOS Computational Biology | 1633 | Top 17% | 1.6% |
| 13 | Journal of the American Medical Informatics Association | 61 | Top 1% | 1.4% |
| 14 | Communications Psychology | 20 | Top 0.2% | 1.3% |
| 15 | Nature Communications | 4913 | Top 58% | 1.0% |
| 16 | Computational Psychiatry | 12 | Top 0.1% | 0.9% |
| 17 | Nature Neuroscience | 216 | Top 7% | 0.7% |
| 18 | Journal of Medical Internet Research | 85 | Top 5% | 0.7% |
| 19 | Science | 429 | Top 22% | 0.5% |
| 20 | Proceedings of the National Academy of Sciences | 2130 | Top 48% | 0.5% |
| 21 | The Lancet Digital Health | 25 | Top 1% | 0.5% |
| 22 | Frontiers in Artificial Intelligence | 18 | Top 1.0% | 0.5% |
| 23 | Nature Human Behaviour | 85 | Top 5% | 0.5% |
| 24 | Healthcare | 16 | Top 2% | 0.5% |
| 25 | BioData Mining | 15 | Top 1% | 0.5% |