
Comparing Physicians' Assessments of Context-specific AI-powered clinical reasoning assistant with General-Purpose AI agent: A Prospective Multi-Site Physician Evaluation of VITA versus ChatGPT in India and Bangladesh

Mandke, C.; Agrawal, H. K.; Bharti, B.; Chansoria, M.; Gupta, G.; Rawat, S. K.; Sarkar, N. K.; Singh, A.; PS, S.; Walia, S.; VALID (Validation of AI in Low-resource and Indian Domains) Consortium

2026-04-30 health systems and quality improvement
10.64898/2026.04.30.26351194 medRxiv

Background: Healthcare providers in low- and middle-income countries (LMICs) are increasingly relying on Artificial Intelligence (AI) tools, yet most available AI assistants are general-purpose systems not designed for the specific clinical, epidemiological, and resource contexts of these settings. There is no evidence, from physicians' assessments, on whether clinical reasoning support from purpose-built, context-specific and retrieval-augmented AI tools can outperform general-purpose AI agents.

Methods: We conducted a prospective multi-site validation study enrolling 37 physicians across India and Bangladesh. Each physician evaluated two AI tools on six hypothetical clinical case vignettes (three predefined, three physician-selected): (a) VITA (Validated Intelligence for Treatment and Assessment), a purpose-built (context-specific and retrieval-augmented) clinical reasoning AI assistant trained on India-specific guidelines, antimicrobial resistance patterns, and formulary constraints, and (b) ChatGPT Plus (version 5.2), a leading general-purpose AI assistant. Evaluations were scored across six dimensions (differential diagnosis, clinical workup, treatment recommendation, dosing, clinical decision-making, and evidence quality) on a 1-5 Likert scale, yielding 444 observations. Analyses included paired t-tests, Wilcoxon signed-rank tests, and multivariate regressions with robust standard errors.

Results: VITA scored significantly higher than ChatGPT across all six evaluation dimensions. The mean composite score (sum of all dimensions, maximum = 30) was 25.4 for VITA versus 22.3 for ChatGPT (difference = +3.1 points, t = 8.31, p < 0.001). The largest advantage was in evidence quality (VITA: 4.46 vs. ChatGPT: 3.14, a 42% relative gap). VITA's advantage was consistent across both predefined and physician-selected hypothetical cases and was robust to controls for physician demographics, case type, and evaluation order in multivariate regression (coefficient = +3.08, p < 0.001).

Conclusions: In this first systematic head-to-head physician evaluation of a purpose-built clinical reasoning AI assistant versus general-purpose AI in an LMIC setting, physicians consistently rated the context-specific tool as superior. These findings suggest that contextual relevance, including local guidelines, formulary constraints, and resistance patterns, matters for clinical AI adoption and quality in resource-limited settings.
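The headline figures in the abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes that one observation corresponds to one physician rating one tool on one case, which is consistent with the reported total but is not stated explicitly in the abstract. All variable names are hypothetical.

```python
# Figures quoted in the abstract.
physicians = 37
cases_per_physician = 6   # three predefined + three physician-selected
tools = 2                 # VITA and ChatGPT

# Assumption: one observation = one physician rating one tool on one case.
observations = physicians * cases_per_physician * tools

# Evidence-quality means on the 1-5 Likert scale.
vita_evidence, chatgpt_evidence = 4.46, 3.14
relative_gap_pct = (vita_evidence - chatgpt_evidence) / chatgpt_evidence * 100

# Composite scores (sum of six dimensions, maximum 30).
vita_composite, chatgpt_composite = 25.4, 22.3
composite_diff = vita_composite - chatgpt_composite

print(observations)              # total rated (physician, case, tool) triples
print(round(relative_gap_pct))   # relative gap in evidence quality, percent
print(round(composite_diff, 1))  # composite score difference, points
```

Under that assumption the computed values agree with the 444 observations, the 42% relative gap, and the +3.1 point difference reported above.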

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1. PLOS Digital Health (91 papers in training set; Top 0.1%): 14.4%
2. PLOS ONE (4510 papers in training set; Top 17%): 10.4%
3. Journal of the American Medical Informatics Association (61 papers in training set; Top 0.4%): 7.2%
4. BMC Medical Informatics and Decision Making (39 papers in training set; Top 0.4%): 6.4%
5. BMJ Open (554 papers in training set; Top 3%): 6.3%
6. BMJ Open Quality (15 papers in training set; Top 0.2%): 4.9%
7. BMC Health Services Research (42 papers in training set; Top 0.4%): 4.9%

50% of probability mass above

8. BMJ Health & Care Informatics (13 papers in training set; Top 0.2%): 3.7%
9. Frontiers in Public Health (140 papers in training set; Top 2%): 3.6%
10. Healthcare (16 papers in training set; Top 0.1%): 3.6%
11. Frontiers in Digital Health (20 papers in training set; Top 0.4%): 2.6%
12. Medical Decision Making (10 papers in training set; Top 0.1%): 2.1%
13. JAMA Network Open (127 papers in training set; Top 2%): 2.1%
14. BMC Infectious Diseases (118 papers in training set; Top 2%): 2.1%
15. JMIRx Med (31 papers in training set; Top 0.5%): 1.9%
16. Journal of Medical Internet Research (85 papers in training set; Top 2%): 1.9%
17. Journal of Clinical Epidemiology (28 papers in training set; Top 0.3%): 1.8%
18. PLOS Global Public Health (293 papers in training set; Top 4%): 1.5%
19. npj Digital Medicine (97 papers in training set; Top 3%): 1.1%
20. Scientific Reports (3102 papers in training set; Top 69%): 0.9%
21. BMJ Global Health (98 papers in training set; Top 2%): 0.9%
22. Canadian Medical Association Journal (15 papers in training set; Top 0.3%): 0.9%
23. Journal of General Internal Medicine (20 papers in training set; Top 0.8%): 0.9%
24. JAC-Antimicrobial Resistance (13 papers in training set; Top 0.4%): 0.9%
25. BJPsych Open (25 papers in training set; Top 0.8%): 0.7%
26. British Journal of General Practice (22 papers in training set; Top 0.7%): 0.6%
27. Cancer Medicine (24 papers in training set; Top 2%): 0.6%
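
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked by accumulating the listed percentages in rank order. The following is a minimal sketch, assuming the displayed values are rounded predicted probabilities; the helper name `ranks_to_reach` is hypothetical and not part of the page.

```python
from itertools import accumulate

# Predicted probabilities (percent) for ranks 1-8, copied from the list.
top_probs = [14.4, 10.4, 7.2, 6.4, 6.3, 4.9, 4.9, 3.7]

def ranks_to_reach(probs, threshold):
    """Return (rank, cumulative total) at which the running sum first
    reaches the threshold, or None if it never does."""
    for rank, total in enumerate(accumulate(probs), start=1):
        if total >= threshold:
            return rank, total
    return None

rank, total = ranks_to_reach(top_probs, 50.0)
print(rank, round(total, 1))
```

With the rounded values shown, the first six journals sum to just under 50%, so the seventh is the one that crosses the threshold.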