A generalized synthetic control algorithm for sparse functional data

Shao, L.; Pohl, K. M.; Thompson, W. K.

2026-02-25 neuroscience

10.64898/2026.02.23.707582 bioRxiv

The Synthetic Control Method (SCM) and its interactive factor model generalizations (GSC) are powerful tools for estimating causal effects from panel data, but they are not easily applied when follow-up is irregular or sparse, both common features of biomedical cohorts. We develop a Bayesian functional extension of GSC that treats each unit's outcome path as a smooth latent trajectory and accommodates unequally spaced measurements. Trajectories are approximated using Functional Principal Components Analysis (FPCA), providing a data-driven basis that captures dominant patterns with minimal shape assumptions while borrowing strength across individuals. Within this representation, we learn unit and time latent factors jointly with FPCA scores from the control data, construct counterfactual trajectories for treated units, and quantify uncertainty via the posterior. Identification relies on a latent-factor/weak-trend condition and on overlap of controls and treated units in the functional score space. Simulation studies varying donor-pool and treated-group sizes and sampling density show that the proposed approach (GSC-FPCA) yields low bias when sampling is irregular or sparse, with well-calibrated interval coverage across a broad range of scenarios. We apply the method to longitudinal neuroimaging data from the National Consortium on Alcohol and Neurodevelopment in Adolescence - Adulthood (NCANDA-A) study to estimate the effect of adolescent binge drinking on subsequent brain volumes. Using between 1 and 9 observed time points per participant, GSC-FPCA produces stable counterfactuals and detects a negative effect of sustained high levels of binge drinking on gray-matter volumes. Our results demonstrate that embedding GSC within a functional framework enables robust causal inference in biomedical applications characterized by irregularly spaced visits, limited observations, and complex outcome dynamics.
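
To make the pipeline in the abstract concrete, the following is a minimal sketch and not the authors' algorithm. It swaps the Bayesian joint model for a simple frequentist stand-in: eigenfunctions are estimated from pairwise-complete moments of sparsely observed control trajectories, and a treated unit's scores are obtained by ridge-type shrinkage on its pre-treatment observations only. All function names (`simulate`, `fit_fpca`, `counterfactual`), the simulated data, the common time grid, and the value of `sigma2` are assumptions made for illustration, and no posterior uncertainty is produced.

```python
import numpy as np

def simulate(n_units, grid, rng, obs_prob=0.4, noise_sd=0.1):
    """Hypothetical data: smooth two-component trajectories, sparsely observed."""
    phi = np.sqrt(2) * np.vstack([np.sin(np.pi * grid), np.cos(np.pi * grid)])
    scores = rng.normal(size=(n_units, 2)) * np.array([2.0, 1.0])
    latent = 1.0 + grid + scores @ phi
    noisy = latent + rng.normal(scale=noise_sd, size=latent.shape)
    observed = rng.random(latent.shape) < obs_prob
    return latent, np.where(observed, noisy, np.nan)  # NaN marks a missed visit

def fit_fpca(y, n_comp=2):
    """Mean, eigenvalues, and eigenvectors from pairwise-complete moments."""
    mu = np.nanmean(y, axis=0)
    seen = ~np.isnan(y)
    resid = np.where(seen, y - mu, 0.0)        # unobserved entries contribute 0
    counts = seen.astype(float).T @ seen.astype(float)
    cov = (resid.T @ resid) / np.maximum(counts, 1.0)
    evals, evecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:n_comp]
    return mu, evals[top], evecs[:, top]

def counterfactual(y_treated, pre, mu, evals, evecs, sigma2=0.01):
    """Predict the untreated path from pre-treatment observations only."""
    use = pre & ~np.isnan(y_treated)
    phi_o = evecs[use]
    # Shrinkage estimate of the scores (ridge form of the conditional mean).
    lhs = phi_o.T @ phi_o + sigma2 * np.diag(1.0 / evals)
    scores = np.linalg.solve(lhs, phi_o.T @ (y_treated[use] - mu[use]))
    return mu + evecs @ scores

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 50)
_, y_controls = simulate(200, grid, rng)       # donor pool (controls only)
mu, evals, evecs = fit_fpca(y_controls)

pre = grid < 0.6                               # hypothetical treatment onset
_, y_treated = simulate(1, grid, rng)
y_treated = y_treated[0] - np.where(pre, 0.0, 0.5)  # hypothetical effect of -0.5

y0_hat = counterfactual(y_treated, pre, mu, evals, evecs)
effect_hat = np.nanmean((y_treated - y0_hat)[~pre])
print(f"estimated average post-treatment effect: {effect_hat:.2f}")
```

The design choice illustrated here is that the basis is learned from controls alone, so post-treatment outcomes of the treated unit never influence the counterfactual; the treated unit enters only through its pre-treatment scores. Raw pairwise moments are used for brevity, whereas a practical sparse-FPCA fit would typically smooth the mean and covariance and estimate the noise variance rather than fix it.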
