Causal analyses using education-health linked data for England: a case study

De Stavola, B. L. L.; Aparicio Castro, A.; Nguyen, V. G.; Lewis, K. M.; Dearden, L.; Harron, K.; Zylbersztejn, A.; Shumway, J.; Gilbert, R.

2026-03-19 health policy
10.64898/2026.03.13.26348340 medRxiv
Introduction: This article summarises lessons learnt from the Health Outcomes for young People throughout Education (HOPE) Study and serves as a real-world, transferable application for addressing causal questions using administrative data. The HOPE study applied causal methods to analyses of administrative data in Education and Child Health Insights from Linked Data (ECHILD) to study the effectiveness of provision for special educational needs and disability (SEND) on health and education outcomes.
Methods: Defining causal questions regarding the impact of SEND provision required judicious mapping of the question onto the data, leading to the selection of appropriate measures of effect, transparent handling of the data and control of confounding factors to estimate effects. We adopted the target trial emulation framework to guide these steps. Having encountered specific computational challenges in estimating the effects of interest, we simulated data that resembled the HOPE study and used them to practise the implementation of alternative estimation methods and to study the impact of some of their assumptions.
Results: The creation and analysis of the simulated data provided valuable insights. First, we learned the importance of aligning the target of estimation with the causal question at hand. Second, we observed how deviations from assumptions specific to each estimation method can affect results. Third, we highlighted the benefits of employing alternative estimation methods as sensitivity tools that can aid the interpretation of the resulting estimates. Finally, we offer user-friendly code in two programming languages (R and Stata) and accompanying simulated data to facilitate the implementation of these methods for similar causal questions.
Conclusion: We recommend that users of administrative data fully specify (and possibly revise) the causal questions they wish to address, and carefully examine and compare the assumptions, implementation and results obtained using alternative estimation methods.
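
The idea of using simulated data to compare alternative estimation methods can be illustrated with a minimal sketch. This is not the authors' R or Stata code and does not use HOPE or ECHILD data: the data-generating model, the single confounder `c`, the true effect of 1.0 and the function names `simulate`, `fit_logistic` and `estimate_effects` are all assumptions made for illustration. It contrasts an unadjusted comparison with two standard estimators of the average treatment effect, regression adjustment (g-computation) and inverse probability weighting.

```python
import numpy as np

def simulate(n=20000, seed=0):
    """Simulate a confounded binary exposure (hypothetical data, true ATE = 1.0)."""
    rng = np.random.default_rng(seed)
    c = rng.normal(size=n)                      # confounder
    p = 1 / (1 + np.exp(-0.8 * c))              # exposure probability depends on c
    a = rng.binomial(1, p)                      # binary exposure
    y = 1.0 * a + 2.0 * c + rng.normal(size=n)  # outcome
    return c, a, y

def fit_logistic(X, a, iters=25):
    """Logistic regression fitted by Newton-Raphson iterations."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = 1 / (1 + np.exp(-X @ beta))
        grad = X.T @ (a - mu)
        hess = X.T @ (X * (mu * (1 - mu))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta

def estimate_effects(c, a, y):
    ones = np.ones_like(c)
    # Unadjusted difference in means (ignores confounding by c).
    naive = y[a == 1].mean() - y[a == 0].mean()
    # Regression adjustment: fit y ~ 1 + a + c, then average the predicted
    # outcomes with everyone exposed versus everyone unexposed.
    coef, *_ = np.linalg.lstsq(np.column_stack([ones, a, c]), y, rcond=None)
    y1 = np.column_stack([ones, ones, c]) @ coef
    y0 = np.column_stack([ones, 0 * ones, c]) @ coef
    gcomp = (y1 - y0).mean()
    # Inverse probability weighting with normalised weights.
    Z = np.column_stack([ones, c])
    ps = 1 / (1 + np.exp(-Z @ fit_logistic(Z, a)))
    w1, w0 = a / ps, (1 - a) / (1 - ps)
    ipw = np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)
    return {"naive": naive, "gcomp": gcomp, "ipw": ipw}

if __name__ == "__main__":
    print(estimate_effects(*simulate()))
```

In this toy setting both adjusted estimators should land close to the simulated effect while the unadjusted contrast does not. The two adjusted estimators rely on different modelling assumptions (an outcome model versus an exposure model), which is why comparing them can serve as a sensitivity check.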

Matching journals

The top 4 journals account for 50% of the predicted probability mass.

1. BMC Medical Research Methodology: 43 papers in training set, Top 0.1%, 18.2%
2. BMJ Open: 554 papers in training set, Top 1.0%, 17.1%
3. PLOS ONE: 4510 papers in training set, Top 13%, 14.4%
4. Social Science & Medicine: 15 papers in training set, Top 0.1%, 6.6%
50% of probability mass above
5. European Journal of Epidemiology: 40 papers in training set, Top 0.1%, 4.7%
6. Journal of Public Health: 23 papers in training set, Top 0.1%, 4.1%
7. BMC Public Health: 147 papers in training set, Top 1%, 3.9%
8. F1000Research: 79 papers in training set, Top 0.7%, 2.8%
9. BMC Medical Informatics and Decision Making: 39 papers in training set, Top 1%, 2.7%
10. Trials: 25 papers in training set, Top 0.5%, 2.5%
11. Journal of the American Medical Informatics Association: 61 papers in training set, Top 1%, 1.7%
12. BMC Medicine: 163 papers in training set, Top 5%, 1.3%
13. Scientific Reports: 3102 papers in training set, Top 67%, 1.2%
14. International Journal of Behavioral Nutrition and Physical Activity: 15 papers in training set, Top 0.4%, 1.2%
15. European Journal of Public Health: 20 papers in training set, Top 0.7%, 1.2%
16. Medical Decision Making: 10 papers in training set, Top 0.2%, 1.1%
17. Statistics in Medicine: 34 papers in training set, Top 0.3%, 0.9%
18. BMJ Open Quality: 15 papers in training set, Top 0.8%, 0.8%
19. BMJ Health & Care Informatics: 13 papers in training set, Top 0.8%, 0.8%
20. International Journal of Epidemiology: 74 papers in training set, Top 3%, 0.7%
21. BMJ Global Health: 98 papers in training set, Top 3%, 0.7%
22. International Journal of Public Health: 17 papers in training set, Top 0.5%, 0.6%
23. The Lancet Global Health: 24 papers in training set, Top 1%, 0.6%
24. BMC Health Services Research: 42 papers in training set, Top 3%, 0.6%
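
The statement that the top 4 journals account for 50% of the predicted probability mass can be checked with a short cumulative sum. The sketch below assumes the statement means "the smallest number of top-ranked journals whose probabilities sum to at least 50%"; the function name `smallest_top_k` is hypothetical, and the input values are the first six predicted probabilities listed above.

```python
def smallest_top_k(probabilities, threshold=50.0):
    """Return (k, cumulative mass) for the smallest top-k set reaching the threshold."""
    total = 0.0
    for k, p in enumerate(probabilities, start=1):
        total += p
        if total >= threshold:
            return k, total
    return None, total  # threshold never reached

# Predicted probabilities (in %) of the six highest-ranked journals.
top_probs = [18.2, 17.1, 14.4, 6.6, 4.7, 4.1]
k, mass = smallest_top_k(top_probs)
print(k, round(mass, 1))
```

Under this reading, the first three journals fall just short of 50%, and adding the fourth takes the cumulative mass past the threshold.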