Computationally efficient familywise error rate control in genome‐wide association studies using score tests for generalized linear models
Peer reviewed, Journal article
Original version: Scandinavian Journal of Statistics. 2020, 47 (4). doi:10.1111/sjos.12451
In genetic association studies, detecting phenotype–genotype association is a primary goal. We assume that the relationship among the phenotype, genetic markers, and environmental covariates can be modeled by a generalized linear model. The number of markers is allowed to be far greater than the number of individuals in the study. A multivariate score statistic is used to test each marker for association with a phenotype. We assume that the test statistics asymptotically follow a multivariate normal distribution under the complete null hypothesis of no phenotype–genotype association. We present the familywise error rate order k approximation method to find a local significance level (alternatively, an adjusted p‐value) for each test such that the familywise error rate is controlled. The special case k=1 gives the Šidák method. As a by‐product, an effective number of independent tests can be defined. Furthermore, if environmental covariates and genetic markers are uncorrelated, or no environmental covariates are present, we show that covariances between score statistics depend on genetic markers alone. This leads not only to more efficient calculations but also to a local significance level that is determined solely by the collection of markers used, independent of the phenotypes and environmental covariates of the experiment at hand.
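As a rough illustration of the k=1 special case, the sketch below computes the Šidák local significance level for m tests, which treats the tests as independent. It also shows one plausible way to read off an "effective number of independent tests" from a given local level, by inverting the same formula. That inversion is an assumption made here for illustration, since the abstract does not state the definition. The function names are hypothetical, and the order k approximation for k > 1 is not implemented.

```python
import math


def sidak_local_level(alpha: float, m: int) -> float:
    """Local significance level for m tests under the Sidak (k=1) formula.

    Solves 1 - (1 - alpha_loc)**m = alpha for alpha_loc.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if m < 1:
        raise ValueError("m must be a positive integer")
    # 1 - (1 - alpha)**(1/m), written with log1p/expm1 so that it stays
    # accurate when m is very large (as in genome-wide studies).
    return -math.expm1(math.log1p(-alpha) / m)


def effective_number_of_tests(alpha: float, alpha_loc: float) -> float:
    """Assumed definition: the M_eff for which the Sidak formula with
    M_eff independent tests reproduces the given local level alpha_loc.
    """
    if not (0.0 < alpha < 1.0 and 0.0 < alpha_loc < 1.0):
        raise ValueError("alpha and alpha_loc must lie strictly between 0 and 1")
    return math.log1p(-alpha) / math.log1p(-alpha_loc)


if __name__ == "__main__":
    alpha = 0.05
    m = 1_000_000  # hypothetical number of markers
    alpha_loc = sidak_local_level(alpha, m)
    print(f"Sidak local level for m={m}: {alpha_loc:.3e}")
    print(f"Bonferroni local level:      {alpha / m:.3e}")
    # With the Sidak level itself, the effective number recovers m.
    print(f"Effective number of tests:   {effective_number_of_tests(alpha, alpha_loc):.1f}")
```

Under this reading, a local level that accounts for correlation between markers would be larger than the Šidák level, so the resulting effective number of tests would be smaller than m.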