Shrinkage Parameter Estimation in Penalized Logistic Regression Analysis of Case-Control Data

Yu, Y.; Chen, S.; McNeney, B.

2021-02-14 genetics

10.1101/2021.02.12.430986 bioRxiv


Introduction: Increasingly, logistic regression methods for genetic association studies of binary phenotypes must be able to accommodate data sparsity, which arises from unbalanced case-control ratios and/or rare genetic variants. Sparseness leads to maximum likelihood estimators (MLEs) of log-OR parameters that are biased away from their null value of zero and to tests with inflated type 1 error rates. Different penalized-likelihood methods have been developed to mitigate sparse-data bias. We study penalized logistic regression using a class of log-F priors indexed by a shrinkage parameter m to shrink the biased MLE towards zero.

Methods: We propose a two-step approach to the analysis of a genetic association study: first, a set of variants that show evidence of association with the trait is used to estimate m; and second, the estimated m is used for log-F-penalized logistic regression analyses of all variants using data augmentation with standard software. Our estimate of m is the maximizer of a marginal likelihood obtained by integrating the latent log-ORs out of the joint distribution of the parameters and observed data. We consider two approximate approaches to maximizing the marginal likelihood: (i) a Monte Carlo EM algorithm (MCEM) and (ii) a Laplace approximation (LA) to each integral, followed by derivative-free optimization of the approximation.

Results: We evaluate the statistical properties of our proposed two-step method and compare its performance to that of other shrinkage methods in a simulation study. Our simulation studies suggest that the proposed log-F-penalized approach has lower bias and mean squared error than the other methods considered. We also illustrate the approach on data from a study of genetic associations with "super senior" cases and middle-aged controls.

Discussion/Conclusion: We have proposed a method for single rare variant analysis with binary phenotypes by logistic regression penalized by log-F priors. Our method has the advantage of being easily extended to correct for confounding due to population structure and genetic relatedness through a data augmentation approach.
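
The second step of the approach, log-F-penalized logistic regression by data augmentation, can be illustrated with a minimal sketch. The abstract does not spell out the augmentation, so the scheme below is an assumption: a log-F(m, m) prior on a single variant's log-OR is encoded as two pseudo-records with genotype 1, intercept term 0, one labelled as a case and one as a control, each with weight m/2. The function names, the toy genotype data, and the value m = 2 are hypothetical and chosen only for illustration; in the proposed method m would be estimated in the first step rather than fixed.

```python
import numpy as np

def fit_weighted_logistic(X, y, w, n_iter=50, tol=1e-10):
    """Weighted logistic regression fitted by Newton-Raphson; returns coefficients."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (w * (y - p))                      # weighted score vector
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X  # weighted information matrix
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def logf_augment(g, y, m):
    """Append two pseudo-records that encode a log-F(m, m) prior on the log-OR.

    Assumed scheme: each pseudo-record has intercept 0 and genotype 1, so it
    informs only the variant coefficient. Together the two records add
    (m/2)*beta - m*log(1 + exp(beta)) to the log-likelihood.
    """
    X = np.column_stack([np.ones(len(g)), np.asarray(g, dtype=float)])
    X_aug = np.vstack([X, [0.0, 1.0], [0.0, 1.0]])
    y_aug = np.concatenate([np.asarray(y, dtype=float), [1.0, 0.0]])
    w_aug = np.concatenate([np.ones(len(y)), [m / 2.0, m / 2.0]])
    return X_aug, y_aug, w_aug

def logf_penalized_logor(g, y, m):
    """Penalized estimate of the variant log-OR for a given shrinkage parameter m."""
    X_aug, y_aug, w_aug = logf_augment(g, y, m)
    return fit_weighted_logistic(X_aug, y_aug, w_aug)[1]

# Hypothetical sparse toy data: 50 cases, 150 controls, a rare variant
# carried by 3 cases and 1 control.
y = np.concatenate([np.ones(50), np.zeros(150)])
g = np.zeros(200)
g[[0, 1, 2, 50]] = 1.0

beta_mle = logf_penalized_logor(g, y, m=0.0)  # m = 0 gives zero-weight pseudo-records, i.e. the MLE
beta_pen = logf_penalized_logor(g, y, m=2.0)  # illustrative m, not an estimated value
print(f"MLE log-OR: {beta_mle:.3f}, log-F(2,2)-penalized log-OR: {beta_pen:.3f}")
```

Because the penalty enters as ordinary weighted observations, the same augmented data set could be passed to any logistic regression routine that accepts observation weights and a design matrix with an explicit intercept column, which is what makes the approach usable with standard software.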

