Yuan Zhang
Detecting Rare Haplotype-environment Interaction Under Uncertainty of Gene-environment Independence Assumption with an Extension to Complex Sampling Data

Yuan Zhang, 2016
Genome-wide association studies have identified thousands of common variants associated with common diseases; however, these variants explain only a small proportion of disease heritability, raising the question of how to find the "missing heritability." Two critical factors in the quest for missing heritability are believed to be rare variants and gene-environment interactions (GXE). Recently, a method called Logistic Bayesian Lasso (LBL) was proposed for detecting GXE where G is a rare haplotype variant (rHTV). It is a powerful method for detecting rHTVs and their interactions. However, it is computationally intensive and assumes G-E independence, which may not hold in some situations. At the same time, complex sampling designs such as stratified random sampling are becoming increasingly popular for case-control studies; one example is the US kidney cancer study (KCS). There is currently no rHTV association method that can accommodate such a complex sampling design. First, we propose an improved version of LBL that is computationally faster and can accommodate multiple covariates. Simulation studies show that it is equivalent to the original version in terms of accuracy of estimates and inference. We apply this improved version to a lung cancer dataset and find an rHTV with a protective effect for current smokers. Next, we propose an extension that allows for G-E dependence and show that, unlike the earlier version, it controls type I error rates in the presence of G-E dependence. However, the extension has reduced power when G-E independence holds. Therefore, we unify the two models by employing a reversible jump Markov chain Monte Carlo method. Our simulations show that the unified approach performs well under both G-E independence and dependence. We analyze a lung cancer dataset and find several significant interactions, including one between a specific rHTV and smoking. Finally, we adapt LBL to accommodate complex sampling. We show that the adapted method performs well when data are collected using stratified random sampling with matching between cases and controls, whereas the original LBL method leads to inflated type I error rates. We then analyze the KCS data and find a significant interaction between current smoking and a specific rHTV in the N-acetyltransferase 2 gene.