
Highly adaptive LASSO: Machine learning that provides valid nonparametric inference in realistic models

Butzin-Dozier, Z.; Qiu, S.; Hubbard, A. E.; Shi, J.; van der Laan, M.

2024-10-19 epidemiology
10.1101/2024.10.18.24315778 medRxiv

ABSTRACT: Understanding treatment effects on health-related outcomes using real-world data requires defining a causal parameter and imposing relevant identification assumptions to translate it into a statistical estimand. Semiparametric methods, like the targeted maximum likelihood estimator (TMLE), have been developed to construct asymptotically linear estimators of these parameters. To further establish the asymptotic efficiency of these estimators, two conditions must be met: 1) the relevant components of the data likelihood must fall within a Donsker class, and 2) the estimates of nuisance parameters must converge to their true values at a rate faster than $n^{-1/4}$. The Highly Adaptive LASSO (HAL) satisfies these criteria by acting as an empirical risk minimizer within a class of cadlag functions with a bounded sectional variation norm, which is known to be Donsker. HAL achieves the desired rate of convergence, thereby guaranteeing the estimators' asymptotic efficiency. The function class over which HAL minimizes its risk is flexible enough to capture realistic functions while maintaining the conditions for establishing efficiency. Additionally, HAL enables robust inference for non-pathwise differentiable parameters, such as the conditional average treatment effect (CATE) and causal dose-response curve, which are important in precision health. While these parameters are often considered in machine learning literature, these applications typically lack proper statistical inference. HAL addresses this gap by providing reliable statistical uncertainty quantification that is essential for informed decision-making in health research.
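To make the "empirical risk minimizer over cadlag functions with bounded sectional variation norm" concrete, the following is a minimal sketch of one common way HAL is computed: expand the covariates into zero-order spline (indicator) basis functions with knots at the observed data points, then fit an L1-penalized least-squares regression. It is an illustration under stated assumptions, not the authors' implementation. The names `hal_basis`, `lasso_cd`, and `hal_predict`, the simulated data, and the fixed penalty `lam` are all hypothetical; in practice the penalty is usually chosen by cross-validation, and a dedicated LASSO solver would replace the simple coordinate descent used here.

```python
import itertools
import numpy as np

def hal_basis(X, knots):
    """Indicator basis: for every non-empty subset s of coordinates and every
    knot row k, the column is prod_{j in s} 1(X[:, j] >= k[j])."""
    d = X.shape[1]
    cols = []
    for r in range(1, d + 1):
        for s in itertools.combinations(range(d), r):
            s = list(s)
            # (n, m) matrix: True where all coordinates in s are at or above the knot
            ind = np.all(X[:, None, s] >= knots[None, :, s], axis=2)
            cols.append(ind.astype(float))
    return np.hstack(cols)

def lasso_cd(H, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)||y - b0 - H b||^2 + lam * ||b||_1.
    The intercept b0 is left unpenalized."""
    n, p = H.shape
    beta = np.zeros(p)
    intercept = y.mean()
    resid = y - intercept
    col_sq = (H ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        shift = resid.mean()          # re-center: intercept update
        intercept += shift
        resid -= shift
        for j in range(p):
            if col_sq[j] == 0.0:      # skip all-zero columns
                continue
            rho = H[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft threshold
            if new != beta[j]:
                resid -= H[:, j] * (new - beta[j])
                beta[j] = new
    return intercept, beta

def hal_predict(X_new, knots, intercept, beta):
    """Evaluate the fitted step function at new points."""
    return intercept + hal_basis(X_new, knots) @ beta

# Simulated example (assumption: this data-generating process is purely illustrative).
rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 2))
y = np.sin(4 * X[:, 0]) + (X[:, 1] > 0.5) + 0.1 * rng.normal(size=100)

H = hal_basis(X, X)                   # knots at the observed points
b0, beta = lasso_cd(H, y, lam=0.005)  # lam fixed here; normally cross-validated
fitted = hal_predict(X, X, b0, beta)

# The L1 norm of the coefficients (plus |intercept|) is the sectional
# variation norm of the fitted step function.
variation_norm = abs(b0) + np.abs(beta).sum()
```

In this discretized form, bounding the L1 norm of the coefficients is what bounds the sectional variation norm of the fit, which is why a LASSO penalty is the natural regularizer for this function class.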

Matching journals

The top 1 journal accounts for 50% of the predicted probability mass.

1. Statistics in Medicine (34 papers in training set): Top 0.1%, 52.4%
(50% of probability mass above)
2. Epidemiology (26 papers in training set): Top 0.1%, 6.9%
3. BMC Medical Research Methodology (43 papers in training set): Top 0.3%, 3.6%
4. International Journal of Epidemiology (74 papers in training set): Top 0.6%, 3.6%
5. American Journal of Epidemiology (57 papers in training set): Top 0.4%, 3.3%
6. PLOS ONE (4510 papers in training set): Top 42%, 3.1%
7. Scientific Reports (3102 papers in training set): Top 45%, 2.6%
8. Genetic Epidemiology (46 papers in training set): Top 0.3%, 2.6%
9. Nature Communications (4913 papers in training set): Top 45%, 2.5%
10. Biometrics (22 papers in training set): Top 0.1%, 1.9%
11. The Annals of Applied Statistics (15 papers in training set): Top 0.1%, 1.7%
12. Journal of The Royal Society Interface (189 papers in training set): Top 3%, 1.3%
13. PLOS Computational Biology (1633 papers in training set): Top 20%, 1.2%
14. Medical Decision Making (10 papers in training set): Top 0.2%, 1.1%
15. Journal of Biomedical Informatics (45 papers in training set): Top 1%, 0.8%
16. Proceedings of the National Academy of Sciences (2130 papers in training set): Top 42%, 0.8%
17. eLife (5422 papers in training set): Top 57%, 0.8%
18. Trials (25 papers in training set): Top 2%, 0.8%
19. Science Advances (1098 papers in training set): Top 29%, 0.8%
20. Bioinformatics (1061 papers in training set): Top 10%, 0.7%
21. Journal of the American Medical Informatics Association (61 papers in training set): Top 2%, 0.6%