Technical Acquisition Parameters Dominate Demographic Factors in Chest X-ray AI Performance Disparities: A Multi-Dataset External Validation Study

Farquhar, H. L.

2026-01-22 radiology and imaging

10.64898/2026.01.20.26344495 medRxiv


Artificial intelligence systems for chest radiograph interpretation are increasingly deployed in clinical practice. Current fairness frameworks emphasize demographic subgroup analysis, yet the relative contribution of technical acquisition parameters to performance disparities remains poorly characterized. We conducted a multi-dataset validation study of 138,804 chest radiographs from the RSNA Pneumonia Detection Challenge (n=26,684; 22.5% pneumonia prevalence) and NIH ChestX-ray14 (n=112,120; 1.3% prevalence) using five pre-trained DenseNet-121 models. We calculated sensitivity, specificity, and area under the receiver operating characteristic curve, stratified by view type (anteroposterior versus posteroanterior), age group, and sex. A performance disparity analysis then quantified each factor's contribution to performance variation. View type accounted for the largest share of the total observed performance range in both datasets: 87% in RSNA and 69% in NIH. All five models systematically underdiagnosed pneumonia on posteroanterior views, with miss rates of 30-78%. The odds ratio for a missed diagnosis on posteroanterior versus anteroposterior views was 6.69 (95% CI: 5.79-7.72) in RSNA and 13.02 (95% CI: 11.62-14.59) in NIH. Analysis of 131,361 disease-free images showed that view-type effects persist strongly even without disease (Cohen's d = 1.19-1.33). This provides compelling evidence against the hypothesis that the observed disparities reflect confounding by disease severity rather than learned image characteristics. Age explained 5-30% of the total observed performance range depending on the dataset, while sex consistently explained less than 2%. Technical acquisition parameters, specifically radiograph view type, dominate performance disparities in chest X-ray AI, substantially exceeding the contributions of demographic factors.
These findings have immediate implications for regulatory frameworks: future FDA and EU AI Act guidance should explicitly mandate acquisition parameter auditing alongside demographic subgroup analysis.

Author Summary

Artificial intelligence systems that interpret chest X-rays are being used in hospitals worldwide. Important work has examined whether these systems perform fairly across different patient groups. For example, researchers have asked whether they work equally well for men and women, or for patients of different ages and races. We asked a different question: does the way the X-ray was taken affect how well AI systems perform?

We found that the technical method used to acquire the image explained 69-87% of the variation in AI performance. The key factor was whether the X-ray beam passed from back to front (posteroanterior view, typical in outpatient settings) or from front to back (anteroposterior view, typical in emergency and inpatient settings). In contrast, age explained only 5-30% and sex less than 2%. Most concerning, the five AI systems we tested missed between 30% and 78% of pneumonia cases in standing patients (posteroanterior views).

This matters because current regulations focus on checking AI performance across demographic groups but do not require checking performance across technical acquisition parameters. Our findings suggest that regulators and hospitals should audit how AI systems perform on different types of X-ray images, not just on different types of patients.
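The abstract reports three statistics without giving their formulas: view-stratified miss rates, an odds ratio for missed diagnosis with a 95% CI, and Cohen's d on disease-free images. The sketch below shows the standard textbook forms of these statistics. It is a hedged illustration, not the author's analysis code, and rests on several assumptions:

- The function names are hypothetical.
- The 2x2 layout (missed versus detected pneumonia cases on posteroanterior versus anteroposterior views) is assumed.
- The Woolf log-scale interval is assumed.
- Cohen's d is assumed to compare per-image model scores between the two views.
- The counts in the usage lines are invented.

```python
import math
from statistics import mean, stdev


def miss_rate(y_true, y_pred):
    """Fraction of positive cases the model failed to flag (1 - sensitivity)."""
    flagged = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not flagged:
        raise ValueError("no positive cases in this stratum")
    return sum(1 for p in flagged if p == 0) / len(flagged)


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio for a 2x2 table with a Woolf (log-scale Wald) confidence interval.

    Assumed layout (hypothetical):
      a = missed positives on PA views,  b = detected positives on PA views
      c = missed positives on AP views,  d = detected positives on AP views
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell; apply a continuity correction first")
    or_ = (a * d) / (b * c)
    # Standard error of log(OR) is the square root of the summed reciprocal cell counts.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi


def cohens_d(x, y):
    """Cohen's d: difference in means divided by the pooled sample standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


if __name__ == "__main__":
    # Invented counts, for illustration only: 60/100 PA positives missed, 20/100 AP missed.
    or_, lo, hi = odds_ratio_wald(60, 40, 20, 80)
    print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

An odds ratio above 1 under this layout means misses are more likely on posteroanterior images. The sign of d depends only on which group is passed first, so its magnitude is what corresponds to the effect sizes quoted in the abstract.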

