Data Diversity vs. Model Complexity in the Prediction of Pediatric Bipolar Disorder: Evidence from Academic and Community Clinical Samples

Shi, Z.; Youngstrom, E. A.; Liu, Y.; Youngstrom, J. K.; Findling, R. L.

2026-03-27 psychiatry and clinical psychology
10.64898/2026.03.26.26349447 medRxiv

Pediatric bipolar disorder is challenging to diagnose accurately due to symptom heterogeneity. More standardized and data-driven approaches are needed to enhance diagnostic reliability. We evaluated a clinical decision tool (nomogram), statistical methods (logistic regression, LASSO), machine learning models (support vector machine, random forest, k-nearest neighbors, extreme gradient boosting), and a deep learning model (multilayer perceptron) for pediatric bipolar disorder prediction across two datasets collected in academic (N=550) and community (N=511) clinical settings. We compared three modeling strategies: cross-dataset validation, cross-dataset validation with interaction terms, and mixed-dataset training. We assessed model performance using discrimination ability, calibration, and predictor importance ranking. In the baseline cross-dataset approach, all models showed good internal discrimination in the academic dataset, but external discrimination in the community dataset declined substantially. Interaction-enhanced models slightly improved internal discrimination but not external performance or calibration. Recalibration markedly improved cross-dataset calibration without compromising discrimination, indicating that transportability problems were largely driven by probability scaling. Models trained on mixed datasets exhibited much stronger external discrimination and calibration. Across models and training strategies, family risk and PGBI-10M were consistently ranked as the most important predictors. Predictive models for pediatric bipolar disorder showed strong internal performance but limited cross-setting generalizability due to dataset shift and miscalibration. Increasing model complexity did not improve external performance, whereas training on pooled data substantially improved both discrimination and calibration.
Findings suggest that sampling diversity is more valuable than model complexity for developing clinically useful and generalizable psychiatric prediction models, underscoring the importance of open and collaborative datasets.
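
The abstract's observation that recalibration can repair calibration without changing discrimination can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic, the "transported model" is a made-up overconfident scorer, and logistic recalibration (refitting an intercept and slope on the logit of the predicted probability) is one common technique, not necessarily the method the authors used. Because the recalibration map is monotone when the fitted slope is positive, the ranking of patients, and therefore the AUC, stays the same while the probability scale is corrected.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(p):
    return math.log(p / (1.0 - p))


def auc(scores, labels):
    """Discrimination: chance that a random case outranks a random non-case."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier(probs, labels):
    """Mean squared error between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def fit_recalibration(probs, labels, steps=25):
    """Fit P(y=1) = sigmoid(a + b * logit(p)) by Newton's method."""
    x = [logit(p) for p in probs]
    a, b = 0.0, 0.0  # standard zero start for logistic regression
    for _ in range(steps):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, labels):
            mu = sigmoid(a + b * xi)
            w = mu * (1.0 - mu)
            g0 += yi - mu          # gradient w.r.t. intercept
            g1 += (yi - mu) * xi   # gradient w.r.t. slope
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


random.seed(0)
# Synthetic "new setting": the true risk is sigmoid(z), but the hypothetical
# transported model reports sigmoid(2 + 2z), i.e. shifted and overconfident.
z = [random.gauss(-1.5, 1.0) for _ in range(2000)]
labels = [1 if random.random() < sigmoid(zi) else 0 for zi in z]
raw = [sigmoid(2.0 + 2.0 * zi) for zi in z]

a, b = fit_recalibration(raw, labels)
recal = [sigmoid(a + b * logit(p)) for p in raw]

auc_before, auc_after = auc(raw, labels), auc(recal, labels)
brier_before, brier_after = brier(raw, labels), brier(recal, labels)
print(f"intercept={a:.2f} slope={b:.2f}")
print(f"AUC   before={auc_before:.3f} after={auc_after:.3f}")
print(f"Brier before={brier_before:.3f} after={brier_after:.3f}")
```

In this toy setup the recalibration is fitted and evaluated on the same synthetic sample to keep the sketch short; in practice the intercept and slope would be estimated on a separate calibration subset of the target setting.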

Matching journals

The top 7 journals account for 50% of the predicted probability mass.

1 | Journal of Affective Disorders | 81 papers in training set | Top 0.1% | 14.4%
2 | Translational Psychiatry | 219 papers in training set | Top 0.3% | 14.1%
3 | Frontiers in Psychiatry | 83 papers in training set | Top 0.6% | 6.2%
4 | Biological Psychiatry Global Open Science | 54 papers in training set | Top 0.1% | 4.8%
5 | Acta Psychiatrica Scandinavica | 10 papers in training set | Top 0.1% | 4.8%
6 | Psychiatry Research | 35 papers in training set | Top 0.4% | 3.9%
7 | Acta Neuropsychiatrica | 12 papers in training set | Top 0.2% | 3.5%
50% of probability mass above
8 | American Journal of Medical Genetics Part B: Neuropsychiatric Genetics | 22 papers in training set | Top 0.1% | 3.5%
9 | Schizophrenia Bulletin | 29 papers in training set | Top 0.3% | 3.5%
10 | Computational Psychiatry | 12 papers in training set | Top 0.1% | 2.7%
11 | Psychological Medicine | 74 papers in training set | Top 0.8% | 2.3%
12 | European Psychiatry | 10 papers in training set | Top 0.3% | 2.0%
13 | PLOS ONE | 4510 papers in training set | Top 51% | 1.9%
14 | Biological Psychiatry | 119 papers in training set | Top 2% | 1.8%
15 | American Journal of Psychiatry | 20 papers in training set | Top 0.2% | 1.7%
16 | JAMA Psychiatry | 13 papers in training set | Top 0.3% | 1.7%
17 | Biological Psychiatry: Cognitive Neuroscience and Neuroimaging | 62 papers in training set | Top 0.9% | 1.6%
18 | The British Journal of Psychiatry | 21 papers in training set | Top 0.6% | 1.5%
19 | Journal of Psychiatric Research | 28 papers in training set | Top 0.5% | 1.5%
20 | Drug and Alcohol Dependence | 37 papers in training set | Top 0.5% | 1.2%
21 | Scientific Reports | 3102 papers in training set | Top 67% | 1.2%
22 | IEEE Journal of Biomedical and Health Informatics | 34 papers in training set | Top 2% | 0.9%
23 | Molecular Psychiatry | 242 papers in training set | Top 3% | 0.9%
24 | NeuroImage: Clinical | 132 papers in training set | Top 3% | 0.8%
25 | BJPsych Open | 25 papers in training set | Top 0.7% | 0.8%
26 | BMC Psychiatry | 22 papers in training set | Top 0.7% | 0.8%
27 | Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 45% | 0.7%
28 | Journal of Medical Internet Research | 85 papers in training set | Top 5% | 0.7%
29 | Frontiers in Public Health | 140 papers in training set | Top 9% | 0.7%
30 | BMJ Mental Health | 15 papers in training set | Top 0.4% | 0.7%
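
The statement that the top 7 journals account for 50% of the predicted probability mass can be checked from the listed percentages with a short running-total sketch. The values below are the displayed, rounded figures from the list above, so the totals are approximate; the variable names are illustrative only.

```python
from itertools import accumulate

# Displayed (rounded) probabilities for the 30 listed journals, in rank order.
probs = [14.4, 14.1, 6.2, 4.8, 4.8, 3.9, 3.5, 3.5, 3.5, 2.7,
         2.3, 2.0, 1.9, 1.8, 1.7, 1.7, 1.6, 1.5, 1.5, 1.2,
         1.2, 0.9, 0.9, 0.8, 0.8, 0.8, 0.7, 0.7, 0.7, 0.7]

cumulative = list(accumulate(probs))

# Smallest rank whose running total reaches 50% of the probability mass.
cutoff_rank = next(i for i, total in enumerate(cumulative, start=1) if total >= 50.0)
print(cutoff_rank, round(cumulative[cutoff_rank - 1], 1))
```

With these rounded figures the running total first crosses 50% at rank 7, which agrees with the position of the "50% of probability mass above" marker.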