
Toward trustworthy clinical AI for obsessive-compulsive disorder: reliability, generalizability, and interpretability of a transformer model across the ENIGMA-OCD consortium

Pak, M.; Ryu, Y.; Bae, S.; Anticevic, A.; Costa, A. D.; Thorsen, A. L.; van der Straten, A. L.; Couto, B.; Vai, B.; Hansen, B.; Soriano-Mas, C.; Li, C.-s. R.; Vriend, C.; Lochner, C.; Pittenger, C.; Moreau, C. A.; Rodriguez-Manrique, D.; Vecchio, D.; Shimizu, E.; Stern, E. R.; Munoz-Moreno, E.; Nurmi, E. L.; Piras, F.; Colombo, F.; Piras, F.; Jaspers-Fayer, F.; Benedetti, F.; Venkatasubramanian, G.; Eng, G. K.; Simpson, H. B.; Ruan, H.; Hu, H.; van Marle, H. J. F.; Tomiyama, H.; Martinez-Zalacain, I.; Feusner, J.; Narayanaswamy, J. C.; Yun, J.-Y.; Sato, J. R.; Ipser, J.; Pariente, J. C.; Mench

2026-04-27 · psychiatry and clinical psychology
medRxiv · DOI: 10.64898/2026.04.24.26351711
Abstract

Background. Studies applying machine learning to obsessive-compulsive disorder (OCD) typically report accuracy in homogeneous samples but rarely assess the model reliability, generalizability, and interpretability needed for clinical use.

Methods. We applied a transformer-based deep learning model, the Multi-Band Brain Net, to the ENIGMA-OCD cohort, the largest available resting-state functional magnetic resonance imaging (rs-fMRI) dataset in OCD, comprising 1,706 participants (869 cases with OCD, 837 controls) across 23 sites worldwide. We evaluated model reliability by measuring calibration, the model's ability to "know what it doesn't know". We assessed generalizability using leave-one-site-out validation, testing performance on unseen sites with different scanners, acquisition protocols, and patient populations. Finally, we examined interpretability by analyzing the model's attention weights to identify the neural connectivity patterns that drive its predictions.

Results. The model achieved modest but competitive classification performance (AUROC = .653, SD = .039). Crucially, while large-scale pretraining on the UK Biobank (N = 40,783) did not boost accuracy, it significantly enhanced model calibration by reducing overconfident predictions. Leave-one-site-out validation revealed a generalization gap across sites (AUROC = .427–.819); pretraining did not close this gap but removed scanner-manufacturer bias. Finally, attention-based mapping identified biologically plausible patterns of widespread hypoconnectivity in OCD relative to healthy controls, particularly in low-frequency bands involving the default mode, salience, and somatomotor networks. These findings aligned with known OCD neurobiology.

Conclusions. This study provides a framework for developing more reliable and trustworthy clinical artificial intelligence for OCD.
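The two evaluation ideas in the Methods, leave-one-site-out validation and calibration assessment, can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the synthetic data, the logistic-regression stand-in for the transformer, and the choice of expected calibration error (ECE) as the calibration metric are all assumptions for demonstration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_sites, per_site, n_feat = 5, 60, 20
site = np.repeat(np.arange(n_sites), per_site)          # site label per scan
y = np.tile(np.repeat([0, 1], per_site // 2), n_sites)  # 30 controls, 30 cases per site
X = rng.normal(size=(len(y), n_feat))
X[y == 1, 0] += 0.8                                     # weak synthetic group signal

def ece(y_true, p, n_bins=10):
    """Expected calibration error: bin-weighted |accuracy - mean confidence|."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (p > lo) & (p <= hi) if lo > 0 else (p >= lo) & (p <= hi)
        if mask.any():
            total += mask.mean() * abs((y_true[mask] == 1).mean() - p[mask].mean())
    return total

for s in range(n_sites):                                # leave one site out at a time
    train, test = site != s, site == s
    clf = LogisticRegression(max_iter=1000).fit(X[train], y[train])
    p = clf.predict_proba(X[test])[:, 1]
    print(f"site {s}: AUROC={roc_auc_score(y[test], p):.3f}  ECE={ece(y[test], p):.3f}")
```

A low ECE means the model's predicted probabilities match its empirical accuracy; in the study's framing, pretraining improved this metric even where AUROC was unchanged.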

Matching journals

The top 3 journals account for 50% of the predicted probability mass.

| Rank | Journal | Papers in training set | Percentile | Probability |
| ---: | --- | ---: | --- | ---: |
| 1 | NeuroImage: Clinical | 132 | Top 0.1% | 32.7% |
| 2 | Translational Psychiatry | 219 | Top 0.3% | 14.2% |
| 3 | Biological Psychiatry: Cognitive Neuroscience and Neuroimaging | 62 | Top 0.2% | 6.7% |
| 4 | Frontiers in Psychiatry | 83 | Top 1% | 3.6% |
| 5 | Scientific Reports | 3102 | Top 47% | 2.4% |
| 6 | Journal of Affective Disorders | 81 | Top 0.8% | 2.1% |
| 7 | Biological Psychiatry | 119 | Top 1% | 1.9% |
| 8 | Human Brain Mapping | 295 | Top 3% | 1.7% |
| 9 | NeuroImage | 813 | Top 4% | 1.7% |
| 10 | Psychological Medicine | 74 | Top 1% | 1.7% |
| 11 | PLOS ONE | 4510 | Top 55% | 1.6% |
| 12 | Nature Mental Health | 18 | Top 0.1% | 1.5% |
| 13 | PLOS Computational Biology | 1633 | Top 18% | 1.5% |
| 14 | JAMA Network Open | 127 | Top 3% | 1.3% |
| 15 | Communications Biology | 886 | Top 13% | 1.3% |
| 16 | Computational Psychiatry | 12 | Top 0.1% | 1.2% |
| 17 | Brain Communications | 147 | Top 2% | 1.2% |
| 18 | Acta Psychiatrica Scandinavica | 10 | Top 0.2% | 1.2% |
| 19 | Brain Stimulation | 112 | Top 1% | 1.1% |
| 20 | Psychiatry and Clinical Neurosciences | 11 | Top 0.2% | 1.1% |
| 21 | European Psychiatry | 10 | Top 0.6% | 0.9% |
| 22 | Biological Psychiatry Global Open Science | 54 | Top 1% | 0.9% |
| 23 | Journal of Neurology, Neurosurgery & Psychiatry | 29 | Top 1% | 0.8% |
| 24 | Cortex | 102 | Top 0.6% | 0.7% |
| 25 | Imaging Neuroscience | 242 | Top 4% | 0.6% |
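The "top 3 journals account for 50% of the probability mass" statement can be reproduced from the listed percentages. A minimal sketch, with the values transcribed from the table:

```python
# Predicted probabilities (%) of the top-ranked journals, read off the table above.
probs = [32.7, 14.2, 6.7, 3.6, 2.4, 2.1, 1.9, 1.7, 1.7, 1.7]

cum = 0.0
for rank, p in enumerate(probs, start=1):
    cum += p
    if cum >= 50.0:
        break

print(f"rank {rank}: cumulative probability mass {cum:.1f}%")  # rank 3: 53.6%
```

The running sum crosses 50% at rank 3 (32.7 + 14.2 + 6.7 = 53.6%), matching the note above.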