Statistical Methods for Genetic Association Studies under the Extreme Phenotype Sampling Design: Modelling the Effects of both Common and Rare Genetic Variants
In this thesis we investigate a concept in genetic association studies known as extreme phenotype sampling (EPS), where phenotype refers to an observable characteristic of an individual, such as physical appearance in humans. In EPS studies, only individuals with extreme phenotypes are genotyped. Extreme phenotypes are typically defined as both ends of the distribution of a continuously measured trait, such as weight or body mass index (BMI). We introduce and develop statistical methods for this design.
We investigate extreme phenotype sampling in both common and rare variant association analysis. For common variant association studies, we present methods that use the conditional model and the missing genotype model to test for genetic associations with disease. We extend both methods to include any number of genetic and non-genetic covariates, and we develop score test statistics for both methods to test whether there is an association between genetic variables and a phenotype. To evaluate these methods, we apply them to a dataset from the HUNT study (the Nord-Trøndelag Health Study), in which we investigate the association between selected SNPs and waist-hip ratio.
For rare variant association studies, we present five relevant methods for the cross-sectional design: (1) the collapsing method, (2) the CMC method, (3) the SKAT method, (4) the SKAT-O method, and (5) the B-SO (beta-smooth only) method. We adapt the CMC method and the B-SO method to the EPS design using the conditional model and the corresponding score test; the collapsing method and the kernel-based methods have already been adapted to the extreme phenotype design based on the conditional model. We compare all five methods in an extensive simulation study, using the software COSI to simulate rare variant genotype data.
In both common and rare variant studies, we compare the cross-sectional design with the extreme phenotype sampling design.
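To illustrate what a score test under the conditional model involves, here is a minimal sketch in Python for the simplest case. It assumes a normally distributed trait, known sampling cutoffs `lower` and `upper`, a single genetic variable and no other covariates (the methods described above allow any number of genetic and non-genetic covariates). The genetic variable is a rare-variant indicator built by simple collapsing (carrier of at least one minor allele among variants with MAF below a threshold), the kind of burden variable used by collapsing approaches such as CMC. The function names `collapse_rare` and `eps_score_test`, the 1% MAF threshold and the simulated data are hypothetical; this is not the thesis's implementation.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2, norm


def collapse_rare(genotypes, maf, threshold=0.01):
    """Hypothetical collapsing step: 1.0 if an individual carries at least one
    minor allele at any variant with MAF < threshold, else 0.0."""
    rare = maf < threshold
    return (genotypes[:, rare] > 0).any(axis=1).astype(float)


def eps_score_test(y, g, lower, upper):
    """Score test of beta = 0 in the conditional model
    y ~ N(alpha + beta * g, sigma^2), observed only if y < lower or y > upper."""
    y = np.asarray(y, dtype=float)
    g = np.asarray(g, dtype=float)

    def log_p_extreme(alpha, sigma):
        # log P(Y < lower or Y > upper) under the null model, computed stably.
        return np.logaddexp(norm.logcdf((lower - alpha) / sigma),
                            norm.logsf((upper - alpha) / sigma))

    def neg_loglik(theta):
        alpha, sigma = theta[0], np.exp(theta[1])
        ll = norm.logpdf(y, loc=alpha, scale=sigma) - log_p_extreme(alpha, sigma)
        return -np.sum(ll)

    # Fit the null model (beta = 0); sigma is optimised on the log scale.
    fit = minimize(neg_loglik, x0=[y.mean(), np.log(y.std())], method="Nelder-Mead")
    alpha, sigma = fit.x[0], np.exp(fit.x[1])

    a, b = (lower - alpha) / sigma, (upper - alpha) / sigma
    p_ext = np.exp(log_p_extreme(alpha, sigma))
    # Derivative of each individual's conditional log-likelihood w.r.t. its mean.
    r = (y - alpha) / sigma**2 - (norm.pdf(b) - norm.pdf(a)) / (sigma * p_ext)

    g_c = g - g.mean()
    score = np.sum(g_c * r)
    # Efficient information for beta; per-individual information estimated by mean(r^2).
    info = np.mean(r**2) * np.sum(g_c**2)
    stat = score**2 / info
    return stat, chi2.sf(stat, df=1)


# Simulated example: a population, then genotype only the 10% tails of the trait.
rng = np.random.default_rng(1)
n_pop, n_var = 20_000, 30
maf = rng.uniform(0.001, 0.02, size=n_var)
geno = rng.binomial(2, maf, size=(n_pop, n_var))
burden = collapse_rare(geno, maf)
y = 0.5 * burden + rng.normal(size=n_pop)
lower, upper = np.quantile(y, [0.1, 0.9])
keep = (y < lower) | (y > upper)
stat, p_value = eps_score_test(y[keep], burden[keep], lower, upper)
print(f"score statistic = {stat:.1f}, p = {p_value:.2e}")
```

The score is the derivative of the conditional log-likelihood with respect to β at β = 0, with α and σ estimated by maximum likelihood under the null. Estimating the per-individual information by the mean squared score is a choice made for brevity; an expected-information formula would be an equally valid alternative.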
Extreme samples can theoretically be more powerful for detecting an association between a genetic variant and a phenotype, because causal variants are enriched in extreme samples. This can be especially important for rare variant association studies. However, in this thesis we show that estimates based on the conditional model are more sensitive to violations of model assumptions than estimates based on a multiple linear regression model. Additionally, we show that if sample sizes are small or the proportion of causal variants included in the model is low, random sampling can be as powerful as extreme phenotype sampling for detecting the associations.
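The theoretical power gain of extreme sampling can be made concrete with a standard back-of-the-envelope approximation. The sketch below rests on several assumptions: a normally distributed trait, a single variant with a small effect, and genotyping of the lowest and highest `tail_fraction` of the population. Under these conditions, the noncentrality of the association test is roughly proportional to the trait variance in the genotyped sample. The per-individual gain relative to random sampling is then the variance of a standard normal restricted to its two tails, 1 + zφ(z)/p, where z is the upper cutoff and p the tail fraction. The function name `eps_relative_efficiency` is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal trait (assumption)


def eps_relative_efficiency(tail_fraction):
    """Approximate information per genotyped individual under symmetric extreme
    sampling, relative to random sampling, for a small effect of one variant."""
    if not 0 < tail_fraction <= 0.5:
        raise ValueError("tail_fraction must be in (0, 0.5]")
    # Upper cutoff z with P(Y > z) = tail_fraction; the lower cutoff is -z.
    z = STD_NORMAL.inv_cdf(1 - tail_fraction)
    # Variance of Y restricted to |Y| > z (its mean is 0 by symmetry):
    # E[Y^2 | Y > z] = 1 + z * phi(z) / (1 - Phi(z)).
    return 1 + z * STD_NORMAL.pdf(z) / tail_fraction


for p in (0.05, 0.10, 0.25, 0.50):
    print(f"tails of {p:.0%} each: relative efficiency {eps_relative_efficiency(p):.2f}")
```

Under these assumptions, genotyping the top and bottom 10% gives each genotyped individual about 3.2 times the information of a randomly sampled one. The approximation ignores small-sample behaviour, model misspecification and the proportion of causal variants in a tested region, which are the situations where, as noted above, the advantage of extreme sampling can disappear.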