Data Diversity vs. Model Complexity in the Prediction of Pediatric Bipolar Disorder: Evidence from Academic and Community Clinical Samples
Shi, Z.; Youngstrom, E. A.; Liu, Y.; Youngstrom, J. K.; Findling, R. L.
Pediatric bipolar disorder is challenging to diagnose accurately because of symptom heterogeneity, and more standardized, data-driven approaches are needed to improve diagnostic reliability. We evaluated a clinical decision tool (nomogram), statistical methods (logistic regression, LASSO), machine learning methods (support vector machine, random forest, k-nearest neighbors, extreme gradient boosting), and a deep learning model (multilayer perceptron) for pediatric bipolar disorder prediction across two datasets collected in academic (N=550) and community (N=511) clinical settings. We compared three modeling strategies: cross-dataset validation, cross-dataset validation with interaction terms, and mixed-dataset training. Model performance was assessed using discrimination, calibration, and predictor importance ranking. In the baseline cross-dataset approach, all models showed good internal discrimination in the academic dataset, but external discrimination in the community dataset declined substantially. Interaction-enhanced models slightly improved internal discrimination but did not improve external performance or calibration. Recalibration markedly improved cross-dataset calibration without compromising discrimination, indicating that transportability problems were largely driven by probability scaling. Models trained on mixed datasets exhibited much stronger external discrimination and calibration. Across models and training strategies, family risk and PGBI-10M were consistently ranked as the most important predictors. Predictive models for pediatric bipolar disorder showed strong internal performance but limited cross-setting generalizability due to dataset shift and miscalibration. Increasing model complexity did not improve external performance, whereas training on pooled data substantially improved both discrimination and calibration.
Findings suggest that sampling diversity, rather than model complexity, is more valuable for developing clinically useful and generalizable psychiatric prediction models, underscoring the importance of open and collaborative datasets.
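The finding that recalibration fixes transportability while leaving discrimination untouched can be illustrated with a small simulation. The sketch below is an assumption: the abstract does not specify the recalibration method, so we use standard logistic recalibration (refitting an intercept and slope on the logit of the external predictions), with simulated stand-ins for the academic and community samples. Because this transform is monotone, ranking-based discrimination (AUC) is unchanged while calibration-in-the-large is repaired.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 600

# Simulated "academic" training sample: two generic predictors
# (hypothetical stand-ins, e.g. a risk flag and a screening score).
X_train = rng.normal(size=(n, 2))
p_train = 1 / (1 + np.exp(-(0.5 + 1.2 * X_train[:, 0] + 0.8 * X_train[:, 1])))
y_train = rng.binomial(1, p_train)

# Simulated "community" external sample with a shifted intercept
# (lower base rate), mimicking dataset shift in probability scaling.
X_ext = rng.normal(size=(n, 2))
p_ext = 1 / (1 + np.exp(-(-1.0 + 1.2 * X_ext[:, 0] + 0.8 * X_ext[:, 1])))
y_ext = rng.binomial(1, p_ext)

# Train internally, then apply externally: raw probabilities inherit
# the academic base rate and are miscalibrated for the community data.
model = LogisticRegression().fit(X_train, y_train)
raw = model.predict_proba(X_ext)[:, 1]

# Logistic recalibration: refit intercept and slope on the logit of
# the raw predictions against the external outcomes.
logit = np.log(raw / (1 - raw)).reshape(-1, 1)
recal = LogisticRegression().fit(logit, y_ext)
cal = recal.predict_proba(logit)[:, 1]

# Calibration-in-the-large: mean predicted vs. observed event rate.
print(f"observed external rate:   {y_ext.mean():.3f}")
print(f"raw mean prediction:      {raw.mean():.3f}")
print(f"recalibrated prediction:  {cal.mean():.3f}")
```

In this toy setup the raw predictions overshoot the community event rate, while the recalibrated mean matches it; since the recalibration is a monotone map of the scores, the AUC in the external sample is identical before and after.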