Unverified Vendor Claims and Preventable Harms: A Mixed-Methods Longitudinal Independent Audit of Health AI System Performance in Nigeria

Uzochukwu, B. S. C.; Cherima, Y. J.; Enebeli, U. U.; Hassan, B.; Okeke, C. C.; Uzochukwu, A. C.; Omoha, A.; Uzochukwu, K. A.; Kalu, E. I.; Victor, D.; Alih, H. E.; Matinja, L. S.; Rindap, I. T.

2026-03-24 health informatics
10.64898/2026.03.21.26348981 medRxiv

Objective: To independently audit vendor-reported performance claims of health AI systems deployed in Nigeria and assess discrepancies, clinical consequences, equity impacts, and implications for safe AI deployment in low- and middle-income countries.

Methods and analysis: We conducted a mixed-methods longitudinal audit (October 2024-March 2026) of six health AI systems (chest X-ray interpretation, TB screening, symptom triage, maternal health risk prediction, patient history intake, and health chatbots) across 73 diverse health facilities in six Nigerian states, involving 52,000 patients and 45 key informant interviews conducted with stakeholders. All data were sourced from integrated facility-level records, and no database linkage was performed. Vendor claims were abstracted from documentation, white papers, and validation studies. Independent performance was verified by an independent third party through system logs, patient records, clinical outcomes, and stakeholder interviews. Performance gaps were quantified as absolute percentage-point differences; clinical harms were estimated using patient volume and bootstrap confidence intervals; equity impacts were assessed across vulnerability dimensions (geography, age, income, comorbidities, infrastructure) using interaction terms in mixed-effects models and an Equity Harm Index (EHI).

Results: Vendor-reported accuracy averaged 91.5%, while independently measured real-world accuracy averaged 67.3%, yielding a mean performance gap of 24.2 percentage points (95% CI: 21.5 to 26.9; p<0.001) across systems. Gaps ranged from 17 to 35 percentage points and were statistically significant for all systems. These discrepancies translated to substantial preventable harm, including an estimated 1,247 undetected TB cases (186 preventable deaths) and 342 misclassified high-risk pregnancies annually. Performance gaps were 28-38% larger among vulnerable groups (e.g., rural patients showed 38% higher EHI). Gaps were classified as systematic, context-dependent, or population-dependent.

Conclusion: Vendor-reported performance metrics substantially overstated the real-world effectiveness of health AI in Nigeria, leading to preventable patient harm and widening inequities. Mandatory independent post-deployment verification, analogous to pharmaceutical Phase IV surveillance, is essential to ensure safe, equitable AI use in resource-constrained settings. Donors and regulators should prioritize verification over trust-based deployment.
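
As a reading aid, the sketch below shows how an absolute percentage-point gap is computed from the two averages reported in the abstract, and how a generic percentile bootstrap interval for a mean can be obtained. It is a minimal illustration, not the authors' procedure: the function names and the per-system values in `hypothetical_gaps` are invented for the example, and the abstract does not state how its confidence intervals were constructed.

```python
import random
from statistics import mean


def percentage_point_gap(claimed_pct, measured_pct):
    """Absolute difference, in percentage points, between two percentages."""
    return claimed_pct - measured_pct


def bootstrap_ci_of_mean(values, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement, record the mean of each resample, then sort.
    boot_means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_boot))
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = int((1 - alpha / 2) * n_boot) - 1
    return boot_means[lo_idx], boot_means[hi_idx]


# Averages taken from the abstract: vendor-reported vs. independently measured.
overall_gap = percentage_point_gap(91.5, 67.3)
print(round(overall_gap, 1))  # 24.2

# HYPOTHETICAL values, made up purely to exercise the function;
# they are not the per-system gaps from the study.
hypothetical_gaps = [10.0, 12.0, 15.0, 18.0, 20.0, 22.0]
lo, hi = bootstrap_ci_of_mean(hypothetical_gaps)
print(lo, hi)
```

A percentage-point gap is a plain subtraction of two percentages, which is why 91.5% against 67.3% gives 24.2 points rather than a relative change.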

Matching journals

The top 6 journals account for 50% of the predicted probability mass.

1. PLOS Digital Health | 91 papers in training set | Top 0.1% | 14.8%
2. PLOS Global Public Health | 293 papers in training set | Top 0.6% | 14.5%
3. PLOS ONE | 4510 papers in training set | Top 18% | 10.2%
4. Frontiers in Public Health | 140 papers in training set | Top 1% | 4.9%
5. BMC Health Services Research | 42 papers in training set | Top 0.4% | 4.4%
6. BMJ Open | 554 papers in training set | Top 5% | 4.3%

50% of probability mass above

7. Philosophical Transactions of the Royal Society B | 51 papers in training set | Top 1% | 4.0%
8. The Lancet Global Health | 24 papers in training set | Top 0.3% | 3.6%
9. eClinicalMedicine | 55 papers in training set | Top 0.1% | 3.6%
10. BMJ Global Health | 98 papers in training set | Top 1% | 2.9%
11. JMIR Public Health and Surveillance | 45 papers in training set | Top 1% | 2.1%
12. BMJ Health & Care Informatics | 13 papers in training set | Top 0.3% | 2.1%
13. BMC Medicine | 163 papers in training set | Top 3% | 1.9%
14. Scientific Reports | 3102 papers in training set | Top 53% | 1.9%
15. BMC Infectious Diseases | 118 papers in training set | Top 3% | 1.3%
16. DIGITAL HEALTH | 12 papers in training set | Top 0.4% | 1.3%
17. International Journal of Drug Policy | 11 papers in training set | Top 0.2% | 1.2%
18. The Lancet Digital Health | 25 papers in training set | Top 0.6% | 1.2%
19. Wellcome Open Research | 57 papers in training set | Top 2% | 1.0%
20. BMJ Paediatrics Open | 21 papers in training set | Top 0.7% | 0.8%
21. Frontiers in Digital Health | 20 papers in training set | Top 1% | 0.8%
22. Journal of the American Medical Informatics Association | 61 papers in training set | Top 2% | 0.8%
23. BMC Public Health | 147 papers in training set | Top 6% | 0.7%
24. International Journal of Medical Informatics | 25 papers in training set | Top 2% | 0.7%
25. Nature Communications | 4913 papers in training set | Top 65% | 0.6%
26. BMJ | 49 papers in training set | Top 1% | 0.6%
27. Malaria Journal | 48 papers in training set | Top 2% | 0.6%
28. European Respiratory Journal | 54 papers in training set | Top 2% | 0.6%
29. The American Journal of Tropical Medicine and Hygiene | 60 papers in training set | Top 5% | 0.5%
30. BMC Medical Research Methodology | 43 papers in training set | Top 2% | 0.5%
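
The statement that the top 6 journals account for 50% of the predicted probability mass can be checked directly from the listed probabilities. The snippet below is a small sketch of that check; the helper name `journals_needed` is invented here, and the input is just the first eight probabilities from the list in rank order.

```python
from itertools import accumulate

# Predicted probabilities (%) for ranks 1 to 8, copied from the list above.
top_probs = [14.8, 14.5, 10.2, 4.9, 4.4, 4.3, 4.0, 3.6]


def journals_needed(probabilities, threshold=50.0):
    """Smallest k whose first k probabilities sum to at least `threshold`.

    Returns None if the threshold is never reached.
    """
    for k, running_total in enumerate(accumulate(probabilities), start=1):
        if running_total >= threshold:
            return k
    return None


print(journals_needed(top_probs))  # 6
```

The running total is about 48.8% after five journals and about 53.1% after six, so the 50% mark falls at rank 6.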