
Technical Acquisition Parameters Dominate Demographic Factors in Chest X-ray AI Performance Disparities: A Multi-Dataset External Validation Study

Farquhar, H. L.

2026-01-22 radiology and imaging
10.64898/2026.01.20.26344495 medRxiv

Artificial intelligence systems for chest radiograph interpretation are increasingly deployed in clinical practice, yet current fairness frameworks emphasize demographic subgroup analysis, while the relative contribution of technical acquisition parameters to performance disparities remains poorly characterized. We conducted a multi-dataset validation study of 138,804 chest radiographs from the RSNA Pneumonia Detection Challenge (n=26,684; 22.5% pneumonia prevalence) and NIH ChestX-ray14 (n=112,120; 1.3% prevalence) using five pre-trained DenseNet-121 models. We calculated sensitivity, specificity, and area under the receiver operating characteristic curve stratified by view type (anteroposterior versus posteroanterior), age group, and sex, and used a performance disparity analysis to quantify each factor's contribution to performance variation. View type accounted for the largest share of the total observed performance range in both datasets: 87% in RSNA and 69% in NIH. All five models systematically underdiagnosed pneumonia on posteroanterior views, with miss rates of 30-78%. The odds ratio for a missed diagnosis on posteroanterior versus anteroposterior views was 6.69 (95% CI: 5.79-7.72) in RSNA and 13.02 (95% CI: 11.62-14.59) in NIH. Analysis of 131,361 disease-free images showed that view-type effects persist strongly even in the absence of disease (Cohen's d = 1.19-1.33), providing compelling evidence against the hypothesis that the observed disparities reflect confounding by disease severity rather than learned image characteristics. Age explained 5-30% of the total observed performance range depending on the dataset, while sex consistently explained less than 2%. Technical acquisition parameters, specifically radiograph view type, dominate performance disparities in chest X-ray AI, substantially exceeding the contributions of demographic factors. These findings have immediate implications for regulatory frameworks: future FDA and EU AI Act guidance should explicitly mandate acquisition parameter auditing alongside demographic subgroup analysis.

Author Summary

Artificial intelligence systems that interpret chest X-rays are being used in hospitals worldwide. There has been important work examining whether these systems perform fairly across different patient groups, for example whether they work equally well for men and women, or for patients of different ages and races. We asked a different question: does the way the X-ray was taken affect how well AI systems perform? We found that the technical method used to acquire the image explained 69-87% of the variation in AI performance. What mattered was whether the X-ray beam passed from back to front (posteroanterior view, typical in outpatient settings) or from front to back (anteroposterior view, typical in emergency and inpatient settings). In contrast, age explained only 5-30% and sex less than 2%. Most concerning, across all five systems we tested, the AI missed 30-78% of pneumonia cases in standing patients. This matters because current regulations focus on checking AI performance across demographic groups but do not require checking performance across technical acquisition parameters. Our findings suggest that regulators and hospitals should audit how AI systems perform on different types of X-ray images, not just on different types of patients.
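The odds ratios in the abstract compare the odds of a missed diagnosis on posteroanterior versus anteroposterior views. A minimal Python sketch of that calculation from a 2x2 table follows, assuming a "missed diagnosis" means a false negative among pneumonia-positive images. The abstract does not state which interval method was used, so the Wald (Woolf) interval on the log-odds scale is an assumption, as are the function name `odds_ratio_wald` and the counts.

```python
import math

def odds_ratio_wald(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio with a Wald (Woolf) confidence interval for a 2x2 table.

    Hypothetical cell layout (assumed, not taken from the preprint):
      a = missed positives on PA views    b = detected positives on PA views
      c = missed positives on AP views    d = detected positives on AP views
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: apply a continuity correction first")
    ratio = (a * d) / (b * c)
    # Standard error of log(OR) is sqrt of the summed reciprocal cell counts.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper

# Hypothetical counts for illustration only (not study data).
print(odds_ratio_wald(a=60, b=40, c=20, d=80))
```

The interval is symmetric on the log scale, not on the odds-ratio scale, which is why reported intervals such as 5.79-7.72 around 6.69 sit slightly off-center. A zero cell makes the Wald interval undefined; adding 0.5 to every cell is one common workaround.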
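Cohen's d is a standardized mean difference: the gap between two group means divided by their pooled standard deviation. The abstract does not say which quantity the reported d = 1.19-1.33 was computed on. This Python sketch assumes it compares model output scores on disease-free anteroposterior versus posteroanterior images; the function name and the scores are hypothetical.

```python
import math
import statistics

def cohens_d(group1, group2):
    """Cohen's d: difference of means divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    # statistics.variance uses the sample (n - 1) denominator.
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd

# Hypothetical model output scores on disease-free images (not study data).
ap_scores = [0.42, 0.38, 0.51, 0.47, 0.44]
pa_scores = [0.21, 0.30, 0.25, 0.18, 0.27]
print(round(cohens_d(ap_scores, pa_scores), 2))
```

By Cohen's conventional rule of thumb, |d| of about 0.8 or more is a large effect, so values of 1.19-1.33 indicate that the view-type groups are well separated even without disease.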
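The abstract does not spell out how each factor's contribution to performance variation was computed. One plausible reading, sketched here in Python purely as an assumption, is to compute a metric per subgroup (sensitivity in this example), take each factor's max-min spread across its subgroups, and express that spread as a share of the summed spreads over all factors. The function names, the normalization, and the numbers are hypothetical; whether the preprint normalizes this way is not stated.

```python
from collections import defaultdict

def stratified_sensitivity(y_true, y_pred, groups):
    """Sensitivity, TP / (TP + FN), computed separately for each subgroup label."""
    tp, positives = defaultdict(int), defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            positives[group] += 1
            tp[group] += int(pred == 1)
    return {g: tp[g] / positives[g] for g in positives}

def range_shares(metric_by_factor):
    """Fraction of the summed subgroup spreads attributable to each factor.

    metric_by_factor maps a factor name to {subgroup: metric value}.
    ASSUMPTION (hypothetical normalization): a factor's spread is max - min of
    the metric across its subgroups, divided by the sum of all factors' spreads.
    """
    spreads = {f: max(v.values()) - min(v.values()) for f, v in metric_by_factor.items()}
    total = sum(spreads.values())
    return {f: s / total for f, s in spreads.items()}

# Hypothetical per-subgroup sensitivities (illustrative, not study results).
sens = {
    "view": {"AP": 0.80, "PA": 0.35},
    "age": {"<40": 0.70, "40-65": 0.66, ">65": 0.62},
    "sex": {"F": 0.67, "M": 0.66},
}
for factor, share in range_shares(sens).items():
    print(f"{factor}: {share:.0%}")
```

Under this normalization the shares always sum to 100%, so a factor with a wide subgroup spread, such as view type here, dominates whenever the other factors' spreads are small.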

Matching journals

The top 4 journals account for just over 50% of the predicted probability mass.

1. PLOS Digital Health (91 papers in training set; Top 0.1%; 18.4%)
2. The Lancet Digital Health (25 papers in training set; Top 0.1%; 14.2%)
3. Scientific Reports (3102 papers in training set; Top 3%; 14.2%)
4. PLOS ONE (4510 papers in training set; Top 29%; 6.2%)

50% of probability mass above

5. PLOS Computational Biology (1633 papers in training set; Top 9%; 3.6%)
6. Nature Communications (4913 papers in training set; Top 40%; 3.5%)
7. JCO Clinical Cancer Informatics (18 papers in training set; Top 0.2%; 3.5%)
8. npj Digital Medicine (97 papers in training set; Top 1%; 3.0%)
9. GigaScience (172 papers in training set; Top 0.7%; 2.8%)
10. Nature Medicine (117 papers in training set; Top 2%; 2.1%)
11. Patterns (70 papers in training set; Top 0.7%; 1.9%)
12. eLife (5422 papers in training set; Top 40%; 1.8%)
13. Nature Machine Intelligence (61 papers in training set; Top 2%; 1.8%)
14. JAMA Network Open (127 papers in training set; Top 2%; 1.7%)
15. European Radiology (14 papers in training set; Top 0.4%; 1.5%)
16. Computers in Biology and Medicine (120 papers in training set; Top 3%; 1.3%)
17. Proceedings of the National Academy of Sciences (2130 papers in training set; Top 37%; 1.3%)
18. eBioMedicine (130 papers in training set; Top 3%; 0.9%)
19. Medical Physics (14 papers in training set; Top 0.6%; 0.8%)
20. Journal of Medical Imaging (11 papers in training set; Top 0.3%; 0.8%)
21. iScience (1063 papers in training set; Top 30%; 0.8%)
22. BMJ Open (554 papers in training set; Top 13%; 0.7%)
23. Communications Medicine (85 papers in training set; Top 1%; 0.7%)
24. Expert Systems with Applications (11 papers in training set; Top 0.5%; 0.7%)
25. IEEE Access (31 papers in training set; Top 1%; 0.6%)
26. Diagnostics (48 papers in training set; Top 3%; 0.6%)
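Reading the final percentage in each entry as that journal's predicted probability, the statement that the top 4 journals account for 50% of the predicted probability mass is a running-sum check: 18.4 + 14.2 + 14.2 + 6.2 = 53.0%, so rank 4 is the first rank at which the cumulative total reaches 50%. A small Python sketch of that check follows; the helper name `ranks_to_reach` is hypothetical.

```python
from itertools import accumulate

def ranks_to_reach(probabilities, threshold):
    """Smallest k such that the k largest probabilities sum to at least `threshold`."""
    ordered = sorted(probabilities, reverse=True)
    for k, cumulative in enumerate(accumulate(ordered), start=1):
        if cumulative >= threshold:
            return k
    return None  # threshold never reached by the listed entries

# Predicted probabilities (%) for ranks 1-8 in the list above.
top = [18.4, 14.2, 14.2, 6.2, 3.6, 3.5, 3.5, 3.0]
print(ranks_to_reach(top, 50.0))  # prints 4
```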