
Supervised Domain Adaptation Mitigates Cross-Ethnicity Prediction Error in Neuroimaging Based Cognitive Prediction

Lal Khakpoor, F.; van der Vliet, W.; Deng, J.; Wang, Y.; Pat, N.

2026-05-28 neuroscience
10.64898/2026.05.25.727742 bioRxiv

Machine-learning models are increasingly used to predict cognitive and clinical outcomes from neuroimaging data, yet challenges in fairness and generalizability remain. Large-scale datasets are often racially and ethnically imbalanced, which leads to systematic performance disparities: models typically achieve higher accuracy for the majority populations represented in the training data. In this study, we evaluated whether supervised domain adaptation methods (balanced weighting, two-stage TrAdaBoost, feature augmentation with SrcOnly prediction, and linear interpolation) can mitigate these biases. Using the ABCD dataset, we assessed whether models trained on 80 MRI measures from White American participants could generalize more effectively to African American participants. All domain adaptation methods reduced prediction error for African American participants, particularly for MRI modalities with large baseline disparities (e.g., structural MRI), while offering limited improvements where initial gaps were smaller (e.g., functional connectivity). Among the approaches, balanced weighting performed best and remained stable and beneficial even when only 10 African American participants were used to adapt the original model trained exclusively on White American participants. These findings suggest that simple, low-cost strategies can effectively reduce cross-ethnic performance gaps and improve equity in predictive neuroimaging, offering a practical path forward for future neuroimaging predictive biomarkers.

Significance Statement

Large-scale neuroimaging datasets increasingly enable machine-learning models to predict cognitive and clinical outcomes; however, these datasets are often ethnically/racially imbalanced. As a result, predictive models tend to generalize poorly to underrepresented populations. We demonstrate that, across 80 MRI phenotypes, a class of machine-learning approaches collectively known as supervised domain adaptation can substantially reduce cross-ethnicity disparities in neuroimaging-based cognitive prediction, even when only limited data from underrepresented groups are available. Among the methods evaluated, balanced weighting achieved the best performance while maintaining low computational cost. Together, these findings provide a practical and scalable framework for improving fairness and generalizability in neuroimaging-based machine learning under realistic conditions of ethnic/racial imbalance.
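
The abstract does not spell out how balanced weighting is implemented, so the following is only a minimal sketch of the general idea, under stated assumptions: source (majority-group) and target (minority-group) samples are pooled, and each sample is weighted so that the small target group carries a chosen fraction of the total weight. The function names, the `gamma` parameter, the ridge penalty `alpha`, and the synthetic data are all hypothetical and are not taken from the study.

```python
import numpy as np

def balanced_weights(n_source, n_target, gamma=0.5):
    """Per-sample weights giving the target group a fraction `gamma` of total weight.

    `gamma` is an illustrative parameter, not a value reported in the study.
    """
    w_src = np.full(n_source, (1.0 - gamma) / n_source)
    w_tgt = np.full(n_target, gamma / n_target)
    return np.concatenate([w_src, w_tgt])

def fit_weighted_ridge(X, y, sample_weight, alpha=1.0):
    """Closed-form weighted ridge regression with an unpenalized intercept."""
    w = sample_weight / sample_weight.sum()   # normalize weights to sum to 1
    x_mean = w @ X                            # weighted feature means
    y_mean = w @ y                            # weighted outcome mean
    Xc, yc = X - x_mean, y - y_mean           # weighted centering
    A = Xc.T @ (Xc * w[:, None]) + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(A, Xc.T @ (w * yc))
    intercept = y_mean - x_mean @ coef
    return coef, intercept

# Synthetic stand-in data (NOT ABCD data): a large source group and a
# 10-sample target group whose outcome has a shifted intercept.
rng = np.random.default_rng(0)
n_src, n_tgt, n_feat = 500, 10, 5
beta = rng.normal(size=n_feat)
X_src = rng.normal(size=(n_src, n_feat))
X_tgt = rng.normal(size=(n_tgt, n_feat))
y_src = X_src @ beta + rng.normal(scale=0.5, size=n_src)
y_tgt = X_tgt @ beta + 1.0 + rng.normal(scale=0.5, size=n_tgt)

# Pool both groups and fit one model with balanced sample weights.
X = np.vstack([X_src, X_tgt])
y = np.concatenate([y_src, y_tgt])
weights = balanced_weights(n_src, n_tgt, gamma=0.5)
coef, intercept = fit_weighted_ridge(X, y, weights, alpha=1e-3)

print("weight per source sample:", weights[0])
print("weight per target sample:", weights[-1])
print("fitted intercept:", intercept)
```

With `gamma=0.5`, each of the 10 target samples receives 50 times the weight of a source sample, so the fitted model is pulled toward the target group without discarding the source data. In practice `gamma` and `alpha` would be tuned, for example by cross-validation on the target group.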

Matching journals

The top 2 journals account for 50% of the predicted probability mass.

1. NeuroImage (813 papers in training set; Top 0.2%; 33.3%)
2. Human Brain Mapping (295 papers in training set; Top 0.1%; 22.8%)
(50% of probability mass above)
3. Medical Image Analysis (33 papers in training set; Top 0.2%; 6.4%)
4. Imaging Neuroscience (242 papers in training set; Top 0.9%; 4.0%)
5. Aperture Neuro (18 papers in training set; Top 0.1%; 3.7%)
6. Scientific Reports (3102 papers in training set; Top 46%; 2.5%)
7. PLOS Computational Biology (1633 papers in training set; Top 13%; 2.4%)
8. eLife (5422 papers in training set; Top 38%; 1.9%)
9. Proceedings of the National Academy of Sciences (2130 papers in training set; Top 35%; 1.5%)
10. Cerebral Cortex (357 papers in training set; Top 1%; 1.5%)
11. PLOS ONE (4510 papers in training set; Top 57%; 1.3%)
12. NeuroImage: Clinical (132 papers in training set; Top 3%; 1.3%)
13. Developmental Cognitive Neuroscience (81 papers in training set; Top 0.4%; 1.3%)
14. Nature Communications (4913 papers in training set; Top 60%; 0.9%)
15. Magnetic Resonance in Medicine (72 papers in training set; Top 0.5%; 0.8%)
16. Scientific Data (174 papers in training set; Top 2%; 0.8%)
17. eNeuro (389 papers in training set; Top 9%; 0.7%)
18. Frontiers in Neuroimaging (11 papers in training set; Top 0.4%; 0.7%)
19. Cell Reports (1338 papers in training set; Top 34%; 0.7%)
20. Nature Neuroscience (216 papers in training set; Top 6%; 0.7%)
21. IEEE Transactions on Medical Imaging (18 papers in training set; Top 0.6%; 0.7%)
22. Biological Psychiatry (119 papers in training set; Top 3%; 0.7%)
23. Communications Biology (886 papers in training set; Top 32%; 0.5%)
24. Network Neuroscience (116 papers in training set; Top 2%; 0.5%)