How Agent Role Structure Alters Operating Characteristics of Large Language Model Clinical Classifiers: A Comparative Study of Specialist and Deliberative Multi-Agent Protocols

Anderson, C. G.

2026-02-24 · health informatics
medRxiv · doi:10.64898/2026.02.22.26346818
Large language models (LLMs) are increasingly deployed in structured clinical decision support, yet the architectural effects of internal role decomposition within multi-agent systems remain poorly isolated. Prior comparisons of single-agent and multi-agent prompting frequently confound workflow structure with changes in model configuration, training, or decoding. We present a controlled architectural study of role-structured inference under fixed model parameters, isolating internal role decomposition as the sole manipulated variable. Two deterministic multi-agent protocols, Generic Deliberative (GD) and Feature-Specialist (FS), are evaluated under identical base weights, decoding settings, computational budget, and adjudication logic. Across two tabular clinical benchmarks (UCI Cleveland Heart Disease and Pima Indians Diabetes), altering role structure alone systematically reshapes operating characteristics. On Cleveland, FS improves accuracy by 0.07 and macro-F1 by 0.06 relative to GD, while shifting the operating point toward higher specificity (+0.22) and lower sensitivity (-0.13), substantially reducing false positives. On Pima, architectural effects reverse direction: GD achieves the strongest macro performance (accuracy 0.68, macro-F1 0.64), whereas FS induces pronounced class asymmetry (recall 0.95 for the positive class and 0.27 for the negative class). These findings demonstrate that internal role decomposition functions as a structured inductive bias that can materially alter error distributions without modifying model parameters. Multi-agent prompt architecture should therefore be treated as an explicit mechanism for controlling sensitivity-specificity trade-offs in safety-sensitive LLM decision systems.
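The abstract does not include implementation details, but the architectural contrast it describes can be illustrated with a minimal sketch. Everything below is hypothetical: `stub_agent` stands in for a call to the fixed-weight, fixed-decoding base model, and its toy threshold rule replaces actual model reasoning. Only the structural contrast mirrors the described design: Generic Deliberative (GD) gives every agent the full feature set under an identical role, Feature-Specialist (FS) assigns each agent a feature subset, and both share the same majority-vote adjudication.

```python
from typing import Callable, Dict, List

def stub_agent(features: List[str], threshold: float) -> Callable[[Dict[str, float]], int]:
    """Deterministic stand-in for an LLM agent (hypothetical: a real agent
    would query the same base model with fixed decoding settings).
    Votes positive when the mean of its assigned features exceeds threshold."""
    def vote(case: Dict[str, float]) -> int:
        vals = [case[f] for f in features]
        return int(sum(vals) / len(vals) > threshold)
    return vote

def generic_deliberative(case: Dict[str, float], all_features: List[str],
                         n_agents: int = 3, threshold: float = 0.5) -> int:
    # GD: every agent receives the full feature set under an identical role.
    votes = [stub_agent(all_features, threshold)(case) for _ in range(n_agents)]
    return int(sum(votes) * 2 > len(votes))  # shared majority adjudication

def feature_specialist(case: Dict[str, float],
                       feature_groups: List[List[str]],
                       threshold: float = 0.5) -> int:
    # FS: one agent per feature group; the adjudication logic is unchanged.
    votes = [stub_agent(group, threshold)(case) for group in feature_groups]
    return int(sum(votes) * 2 > len(votes))

# A case where one dominant feature drives a different decision under
# each role structure, with model, threshold, and adjudication held fixed.
case = {"a": 0.8, "b": 0.4, "c": 0.4}
print(generic_deliberative(case, ["a", "b", "c"]))      # 1: pooled mean 0.53 > 0.5
print(feature_specialist(case, [["a"], ["b"], ["c"]]))  # 0: specialist votes 1-0-0
```

Even in this toy setting, the two protocols classify the same case differently, which is the paper's central point: role decomposition alone acts as an inductive bias on the operating point.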

Matching journals

The top 5 journals account for 50% of the predicted probability mass.

1. npj Digital Medicine: 22.7% (97 papers in training set; top 0.2%)
2. Scientific Reports: 10.2% (3102 papers in training set; top 6%)
3. JCO Clinical Cancer Informatics: 8.5% (18 papers in training set; top 0.1%)
4. Journal of the American Medical Informatics Association: 6.9% (61 papers in training set; top 0.4%)
5. Frontiers in Artificial Intelligence: 4.2% (18 papers in training set; top 0.1%)
(50% of probability mass above)
6. Journal of Biomedical Informatics: 3.7% (45 papers in training set; top 0.4%)
7. PLOS Digital Health: 2.9% (91 papers in training set; top 0.9%)
8. PLOS ONE: 2.5% (4510 papers in training set; top 46%)
9. Frontiers in Digital Health: 2.5% (20 papers in training set; top 0.4%)
10. Journal of Medical Internet Research: 2.1% (85 papers in training set; top 2%)
11. Artificial Intelligence in Medicine: 1.9% (15 papers in training set; top 0.2%)
12. BMC Medical Informatics and Decision Making: 1.7% (39 papers in training set; top 1%)
13. Communications Medicine: 1.7% (85 papers in training set; top 0.2%)
14. The Lancet Digital Health: 1.7% (25 papers in training set; top 0.4%)
15. IEEE Journal of Biomedical and Health Informatics: 1.3% (34 papers in training set; top 1%)
16. Nature Communications: 1.3% (4913 papers in training set; top 55%)
17. Nature Medicine: 1.3% (117 papers in training set; top 3%)
18. Nature Machine Intelligence: 1.3% (61 papers in training set; top 2%)
19. JMIR Medical Informatics: 1.2% (17 papers in training set; top 1%)
20. JAMIA Open: 1.2% (37 papers in training set; top 1%)
21. Computers in Biology and Medicine: 0.9% (120 papers in training set; top 4%)
22. International Journal of Medical Informatics: 0.9% (25 papers in training set; top 1%)
23. PLOS Computational Biology: 0.9% (1633 papers in training set; top 22%)
24. BMC Bioinformatics: 0.8% (383 papers in training set; top 7%)
25. Patterns: 0.8% (70 papers in training set; top 2%)
26. Frontiers in Bioinformatics: 0.6% (45 papers in training set; top 1%)
27. BMJ Health & Care Informatics: 0.6% (13 papers in training set; top 1%)
28. Proceedings of the National Academy of Sciences: 0.5% (2130 papers in training set; top 49%)
29. Bioinformatics: 0.5% (1061 papers in training set; top 11%)
30. Philosophical Transactions of the Royal Society B: 0.5% (51 papers in training set; top 7%)