
Shrinkage Parameter Estimation in Penalized Logistic Regression Analysis of Case-Control Data

Yu, Y.; Chen, S.; McNeney, B.

2021-02-14 genetics
10.1101/2021.02.12.430986 bioRxiv

Introduction
Increasingly, logistic regression methods for genetic association studies of binary phenotypes must be able to accommodate data sparsity, which arises from unbalanced case-control ratios and/or rare genetic variants. Sparseness leads to maximum likelihood estimators (MLEs) of log-OR parameters that are biased away from their null value of zero, and to tests with inflated type I error rates. Several penalized-likelihood methods have been developed to mitigate sparse-data bias. We study penalized logistic regression using a class of log-F priors indexed by a shrinkage parameter m to shrink the biased MLE towards zero.

Methods
We propose a two-step approach to the analysis of a genetic association study: first, a set of variants that show evidence of association with the trait is used to estimate m; second, the estimated m is used for log-F-penalized logistic regression analyses of all variants, using data augmentation with standard software. Our estimate of m is the maximizer of a marginal likelihood obtained by integrating the latent log-ORs out of the joint distribution of the parameters and observed data. We consider two approaches to approximately maximizing the marginal likelihood: (i) a Monte Carlo EM (MCEM) algorithm and (ii) a Laplace approximation (LA) to each integral, followed by derivative-free optimization of the approximation.

Results
We evaluated the statistical properties of our proposed two-step method and compared its performance with that of other shrinkage methods in a simulation study. Our simulations suggest that the proposed log-F-penalized approach has lower bias and mean squared error than the other methods considered. We also illustrate the approach on data from a study of genetic associations with "super senior" cases and middle-aged controls.

Discussion/Conclusion
We have proposed a method for the analysis of single rare variants with binary phenotypes by logistic regression penalized by log-F priors. Our method has the advantage of being easily extended to correct for confounding due to population structure and genetic relatedness through a data augmentation approach.
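The first step of the approach, estimating m by maximizing a Laplace-approximated marginal likelihood, can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions of ours: the symmetric log-F(m, m) member of the prior class, with density exp(mβ/2) / (B(m/2, m/2) (1 + exp(β))^m); one coded variant per model, with data grouped as hypothetical (x, cases, total) records; a per-variant intercept alpha held fixed at a user-supplied value rather than estimated (the abstract does not say how intercepts or covariates are handled); and golden-section search standing in for whichever derivative-free optimizer the authors used. All function names are hypothetical. For each variant the log-OR β is integrated out with a Laplace approximation around the posterior mode, and the summed approximate log marginal likelihoods are maximized over m.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def softplus(t):
    """log(1 + exp(t)) without overflow."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def log_f_log_density(beta, m):
    """Log density of the (assumed) log-F(m, m) prior at beta."""
    log_beta_fn = 2.0 * math.lgamma(m / 2.0) - math.lgamma(m)  # log B(m/2, m/2)
    return 0.5 * m * beta - m * softplus(beta) - log_beta_fn


def log_posterior_terms(beta, records, alpha, m):
    """h(beta) = log-likelihood + log prior, with its first two derivatives.

    records: hypothetical (x, cases, total) tuples; alpha: fixed intercept.
    """
    h = log_f_log_density(beta, m)
    s = sigmoid(beta)
    grad = 0.5 * m - m * s
    hess = -m * s * (1.0 - s)
    for x, cases, total in records:
        eta = alpha + beta * x
        p = sigmoid(eta)
        h += cases * eta - total * softplus(eta)
        grad += x * (cases - total * p)
        hess -= total * p * (1.0 - p) * x * x
    return h, grad, hess


def laplace_log_marginal(records, alpha, m, tol=1e-10, max_iter=100):
    """Laplace approximation to log of the integral of exp(h(beta)) over beta."""
    beta = 0.0
    h, grad, hess = log_posterior_terms(beta, records, alpha, m)
    for _ in range(max_iter):
        step = -grad / hess  # Newton step; hess < 0 since h is strictly concave
        while True:  # halve the step if it would decrease h
            cand = beta + step
            h_new, g_new, hs_new = log_posterior_terms(cand, records, alpha, m)
            if h_new >= h - 1e-12 or abs(step) < 1e-12:
                break
            step /= 2.0
        beta, h, grad, hess = cand, h_new, g_new, hs_new
        if abs(step) < tol:
            break
    # log marginal ~= h(mode) + 0.5*log(2*pi) - 0.5*log(-h''(mode))
    return h + 0.5 * math.log(2.0 * math.pi) - 0.5 * math.log(-hess)


def estimate_m(variants, lo=0.1, hi=50.0, tol=1e-4):
    """Golden-section search for the m maximizing the summed Laplace log marginal.

    variants: list of (records, alpha) pairs for the variants used to estimate m.
    """
    def objective(m):
        return sum(laplace_log_marginal(recs, alpha, m) for recs, alpha in variants)

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = objective(c), objective(d)
    while b - a > tol:
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = objective(c)
        else:  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = objective(d)
    return 0.5 * (a + b)
```

A call such as `estimate_m([([(0, 50, 150), (1, 10, 15)], math.log(0.5))])` returns an m inside the search interval. Golden-section search assumes the summed log marginal is unimodal on [lo, hi]; if that is in doubt, scanning a coarse grid of m values before refining could be a safer choice.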

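The second step, log-F-penalized fitting by data augmentation, rests on an identity worth making explicit, assuming the log-F(m, m) prior: the log prior on β equals, up to a constant, (m/2)β - m log(1 + exp(β)), which is exactly the binomial log-likelihood of one pseudo-record with m/2 "cases" out of m "trials", a covariate value of 1 for the variant and 0 for the intercept. Appending that record to the data and running an ordinary logistic fit therefore maximizes the penalized likelihood. The sketch below assumes the same hypothetical (x, cases, total) grouping and uses a hand-written Newton-Raphson fit in place of standard GLM software only so that it is self-contained; the function names are hypothetical.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def softplus(t):
    """log(1 + exp(t)) without overflow."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def augment(records, m):
    """Append the log-F(m, m) pseudo-record to grouped data.

    Real records (x, cases, total) get design row (intercept=1, x).
    The pseudo-record has design row (0, 1) with m/2 'cases' out of m
    'trials', so it penalizes the log-OR beta but not the intercept.
    """
    rows = [(1.0, float(x), float(c), float(n)) for x, c, n in records]
    if m > 0:
        rows.append((0.0, 1.0, m / 2.0, float(m)))
    return rows


def loglik(rows, a, b):
    """Binomial log-likelihood (up to a constant) at intercept a, log-OR b."""
    return sum(c * (z0 * a + z1 * b) - n * softplus(z0 * a + z1 * b)
               for z0, z1, c, n in rows)


def fit_logistic(rows, tol=1e-10, max_iter=100):
    """Damped Newton-Raphson for a two-parameter logistic model; returns (a, b)."""
    a = b = 0.0
    ll = loglik(rows, a, b)
    for _ in range(max_iter):
        u0 = u1 = i00 = i01 = i11 = 0.0
        for z0, z1, c, n in rows:
            p = sigmoid(z0 * a + z1 * b)
            r = c - n * p          # residual: observed minus expected cases
            w = n * p * (1.0 - p)  # Fisher information weight
            u0, u1 = u0 + z0 * r, u1 + z1 * r
            i00, i01, i11 = i00 + w * z0 * z0, i01 + w * z0 * z1, i11 + w * z1 * z1
        det = i00 * i11 - i01 * i01
        d0 = (i11 * u0 - i01 * u1) / det  # solve the 2x2 system I @ d = U
        d1 = (i00 * u1 - i01 * u0) / det
        step = 1.0
        while True:  # halve the step if the log-likelihood would drop
            new_a, new_b = a + step * d0, b + step * d1
            new_ll = loglik(rows, new_a, new_b)
            if new_ll >= ll - 1e-12 or step < 1e-8:
                break
            step /= 2.0
        a, b, ll = new_a, new_b, new_ll
        if max(abs(step * d0), abs(step * d1)) < tol:
            break
    return a, b
```

With m = 0 no pseudo-record is added and the fit reduces to the ordinary MLE; larger m pulls β further toward zero while leaving the intercept unpenalized.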
Matching journals

The top 3 journals together account for more than 50% of the predicted probability mass.

1. Genetic Epidemiology: 46 papers in training set, Top 0.1%, 23.4%
2. BMC Bioinformatics: 383 papers in training set, Top 0.1%, 23.4%
3. Bioinformatics: 1061 papers in training set, Top 3%, 10.8%
50% of probability mass above
4. PLOS ONE: 4510 papers in training set, Top 24%, 7.1%
5. Frontiers in Genetics: 197 papers in training set, Top 2%, 3.8%
6. International Journal of Epidemiology: 74 papers in training set, Top 0.7%, 3.2%
7. Scientific Reports: 3102 papers in training set, Top 54%, 1.9%
8. Statistics in Medicine: 34 papers in training set, Top 0.1%, 1.9%
9. G3 Genes|Genomes|Genetics: 351 papers in training set, Top 1%, 1.8%
10. BMC Research Notes: 29 papers in training set, Top 0.2%, 1.4%
11. PLOS Genetics: 756 papers in training set, Top 10%, 1.4%
12. BMC Medical Research Methodology: 43 papers in training set, Top 0.8%, 1.3%
13. BMC Genomics: 328 papers in training set, Top 4%, 1.1%
14. Briefings in Bioinformatics: 326 papers in training set, Top 5%, 1.0%
15. Human Brain Mapping: 295 papers in training set, Top 4%, 0.9%
16. PLOS Computational Biology: 1633 papers in training set, Top 24%, 0.8%
17. PeerJ: 261 papers in training set, Top 14%, 0.8%
18. BioData Mining: 15 papers in training set, Top 0.8%, 0.8%
19. American Journal of Epidemiology: 57 papers in training set, Top 1%, 0.7%
20. European Journal of Human Genetics: 49 papers in training set, Top 1%, 0.7%
21. Mathematics: 11 papers in training set, Top 0.5%, 0.7%
22. Journal of the American Medical Informatics Association: 61 papers in training set, Top 2%, 0.7%
23. Human Molecular Genetics: 130 papers in training set, Top 4%, 0.5%
24. Gene: 41 papers in training set, Top 3%, 0.5%
25. Forensic Science International: Genetics: 24 papers in training set, Top 0.2%, 0.5%