A generalized synthetic control algorithm for sparse functional data

Shao, L.; Pohl, K. M.; Thompson, W. K.

2026-02-25 neuroscience
10.64898/2026.02.23.707582 bioRxiv

The Synthetic Control Method (SCM) and its interactive factor model generalizations (GSC) are powerful tools for estimating causal effects from panel data, but they are not easily applied when follow-up is irregular or sparse, both common features of biomedical cohorts. We develop a Bayesian functional extension of GSC that treats each unit's outcome path as a smooth latent trajectory and accommodates unequally spaced measurements. Trajectories are approximated using Functional Principal Components Analysis (FPCA), providing a data-driven basis that captures dominant patterns with minimal shape assumptions while borrowing strength across individuals. Within this representation, we learn unit and time latent factors jointly with FPCA scores from the control data, construct counterfactual trajectories for treated units, and quantify uncertainty via the posterior. Identification relies on a latent-factor/weak-trend condition and on overlap of controls and treated units in the functional score space. Simulation studies varying the sizes of the donor pool and treated group and the sampling density show that the proposed approach (GSC-FPCA) yields low bias when sampling is irregular or sparse, with well-calibrated interval coverage across a broad range of scenarios. We apply the method to longitudinal neuroimaging data from the National Consortium on Alcohol and Neurodevelopment in Adolescence - Adulthood (NCANDA-A) study to estimate the effect of adolescent binge drinking on subsequent brain volumes. Using between 1 and 9 observed time points per participant, GSC-FPCA produces stable counterfactuals and detects a negative effect of sustained high levels of binge drinking on gray-matter volumes. Our results demonstrate that embedding GSC within a functional framework enables robust causal inference in biomedical applications characterized by irregularly spaced visits, limited observations, and complex outcome dynamics.
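To make the general idea concrete, the following is a minimal, non-Bayesian sketch and not the authors' GSC-FPCA implementation. It assumes simulated data, a cubic polynomial basis, per-unit ridge smoothing as a crude stand-in for pooled estimation, a discretized FPCA via SVD, and plug-in ridge estimates of the scores in place of a posterior. All function names, the simulated two-factor model, the treatment time of 0.5, and the effect size of -0.5 are hypothetical choices made for illustration. Controls here have 3 to 9 visits so that the per-unit smoothing stays reasonable.

```python
import numpy as np

rng = np.random.default_rng(0)

def poly_basis(t, degree=3):
    """Cubic polynomial basis on [0, 1] (assumed; a spline basis would be more usual)."""
    return np.vander(np.asarray(t, dtype=float), degree + 1, increasing=True)

def smooth_unit(t, y, grid, lam=1e-2):
    """Ridge-smooth one unit's sparse observations and evaluate the fit on a dense grid."""
    B = poly_basis(t)
    coef = np.linalg.solve(B.T @ B + lam * np.eye(B.shape[1]), B.T @ y)
    return poly_basis(grid) @ coef

def fit_fpca(controls, grid, n_components=2):
    """Discretized FPCA on smoothed control curves: mean function and leading eigenfunctions."""
    curves = np.array([smooth_unit(t, y, grid) for t, y in controls])
    mean = curves.mean(axis=0)
    _, _, vt = np.linalg.svd(curves - mean, full_matrices=False)
    # Rescale so eigenfunctions are orthonormal under the grid-average inner product.
    return mean, vt[:n_components] * np.sqrt(len(grid))

def counterfactual(t_pre, y_pre, t_post, grid, mean, phi, lam=1e-2):
    """Estimate scores from pre-treatment visits only, then project to post-treatment times."""
    at = lambda f, t: np.interp(t, grid, f)
    P_pre = np.column_stack([at(p, t_pre) for p in phi])
    P_post = np.column_stack([at(p, t_post) for p in phi])
    resid = y_pre - at(mean, t_pre)
    scores = np.linalg.solve(P_pre.T @ P_pre + lam * np.eye(len(phi)), P_pre.T @ resid)
    return at(mean, t_post) + P_post @ scores

def simulate(n_obs, effect=0.0, t0=0.5):
    """Hypothetical two-factor trajectory observed at irregular visit times."""
    t = np.sort(rng.uniform(0.0, 1.0, n_obs))
    s = rng.normal(size=2)
    y = 1.0 + s[0] * t + 0.5 * s[1] * np.sin(2 * np.pi * t) + rng.normal(0.0, 0.1, n_obs)
    return t, y + effect * (t > t0)

grid = np.linspace(0.0, 1.0, 101)
controls = [simulate(int(rng.integers(3, 10))) for _ in range(200)]
mean, phi = fit_fpca(controls, grid)

# One treated unit with a hypothetical effect of -0.5 after time 0.5.
t, y = simulate(9, effect=-0.5)
pre = t <= 0.5
y0_hat = counterfactual(t[pre], y[pre], t[~pre], grid, mean, phi)
effect_hat = y[~pre] - y0_hat  # observed minus estimated untreated outcome
```

The sketch omits the latent time factors, the joint estimation, and the posterior uncertainty described in the abstract; it only shows how a functional basis learned from controls lets a treated unit's few pre-treatment visits determine a counterfactual at arbitrary later times.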

Matching journals

The top 8 journals account for 50% of the predicted probability mass.

1 | NeuroImage | 813 papers in training set | Top 0.8% | 14.7%
2 | Proceedings of the National Academy of Sciences | 2130 papers in training set | Top 9% | 7.2%
3 | Human Brain Mapping | 295 papers in training set | Top 1% | 6.4%
4 | Nature Communications | 4913 papers in training set | Top 28% | 6.4%
5 | Nature Computational Science | 50 papers in training set | Top 0.1% | 4.9%
6 | Imaging Neuroscience | 242 papers in training set | Top 0.7% | 4.9%
7 | Medical Image Analysis | 33 papers in training set | Top 0.2% | 4.9%
8 | Nature Neuroscience | 216 papers in training set | Top 2% | 4.3%
50% of probability mass above
9 | Nature Methods | 336 papers in training set | Top 2% | 4.3%
10 | PLOS Computational Biology | 1633 papers in training set | Top 9% | 4.0%
11 | Nature Biotechnology | 147 papers in training set | Top 3% | 2.6%
12 | Communications Biology | 886 papers in training set | Top 5% | 2.1%
13 | eLife | 5422 papers in training set | Top 38% | 1.9%
14 | Neuron | 282 papers in training set | Top 6% | 1.7%
15 | Science Advances | 1098 papers in training set | Top 17% | 1.7%
16 | The American Journal of Human Genetics | 206 papers in training set | Top 2% | 1.7%
17 | PLOS ONE | 4510 papers in training set | Top 57% | 1.5%
18 | Cell Systems | 167 papers in training set | Top 8% | 1.3%
19 | Network Neuroscience | 116 papers in training set | Top 0.7% | 1.3%
20 | Biological Psychiatry | 119 papers in training set | Top 2% | 1.2%
21 | Cell Reports | 1338 papers in training set | Top 29% | 1.0%
22 | Nature Biomedical Engineering | 42 papers in training set | Top 2% | 0.9%
23 | Nature Human Behaviour | 85 papers in training set | Top 4% | 0.8%
24 | Scientific Reports | 3102 papers in training set | Top 73% | 0.8%
25 | Nature | 575 papers in training set | Top 15% | 0.7%
26 | Advanced Science | 249 papers in training set | Top 19% | 0.7%
27 | eNeuro | 389 papers in training set | Top 9% | 0.7%
28 | Biometrics | 22 papers in training set | Top 0.3% | 0.6%
29 | Communications Psychology | 20 papers in training set | Top 0.4% | 0.6%
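
The 50% cutoff in the list above can be checked by accumulating the listed probabilities in rank order. The short sketch below uses the first ten percentages exactly as shown; the variable names are illustrative, and the page's own computation may use unrounded values.

```python
from itertools import accumulate

# Predicted probabilities (percent) for ranks 1 to 10, copied from the list above.
probs = [14.7, 7.2, 6.4, 6.4, 4.9, 4.9, 4.9, 4.3, 4.3, 4.0]

# Running total of probability mass by rank.
cumulative = list(accumulate(probs))

# First rank at which the running total reaches at least 50%.
cutoff_rank = next(rank for rank, total in enumerate(cumulative, start=1) if total >= 50.0)
```

With these rounded values the running total is about 49.4% after rank 7 and about 53.7% after rank 8, which agrees with the statement that the top 8 journals account for 50% of the predicted probability mass.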