HUNTing for genes: Studies of low-frequency and rare variation in human height, myocardial infarction and serum lipid levels utilizing population-based biobanks
Background An important aspect of maintaining or improving health is to understand the mechanisms of why we get ill in the first place. Understanding the genetic basis for diseases can identify targets for new, improved therapies and strategies for disease prevention. Genome-wide association studies have successfully identified associations between thousands of genetic regions and diseases, represented primarily by one or more common genetic variants (minor allele frequency >5%) with small effect sizes. However, these genome-wide association variants together explain only a fraction of the trait variation. It has been speculated that this missing heritability could be due to contributions from low-frequency (minor allele frequency 1-5%) and rare variation (minor allele frequency <1%) that were not well tested by genome-wide association studies. In addition, a majority of the genetic variants associated with diseases are located outside genes. Thus, it has proved difficult to identify the functional genes for many of these associated regions. It has been speculated that assessing the coding variation within genes could guide this discovery. Aims The primary aim of this thesis was to improve our understanding of the contribution of low-frequency and coding variation with moderate to high effect on complex traits. More specifically, the aims were to: assess whether common variants influence height in individuals at the extreme tails of the height distribution (Paper 1); assess the contribution of low-frequency and rare coding variants with moderate to high effect sizes to the risk of myocardial infarction and blood lipid levels, and assess whether this variation can highlight candidate genes (Papers 2 and 3). Material and methods We utilized information from three population-based health studies: the Nord-Trøndelag Health Study (HUNT; Papers 1-3), the Tromsø Study (Papers 2-3) and FINRISK (Paper 1).
We studied individuals from the tallest and shortest ~1.5% of the HUNT and FINRISK cohorts (N = 1,214) for Paper 1, investigating 160 SNPs previously robustly associated with height at the population level. We genotyped 2,906 individuals with hospital-diagnosed myocardial infarction as cases and 6,738 controls without cardiovascular disease from the HUNT and Tromsø cohorts for Papers 2 and 3. We first investigated ~80,000 coding variants in 5,643 individuals from the HUNT cohort and followed up 18 variants in 4,666 individuals from the Tromsø cohort. Follow-up experiments were conducted in C57BL/6J strain mice. Results We found that common variants influence height at the extremes of the distribution. Our results also indicated that this model starts to break down in extremely short individuals (near the 0.25th percentile cut-off). The existence of low-frequency and rare variation could explain this finding among the extremely short individuals, although many other explanations exist; for example, these individuals could be short for non-genetic reasons that prevent them from reaching their genetic height potential. We did not identify any low-frequency or rare coding variants with high effect sizes for myocardial infarction, although we did identify two such variants for lipids. This finding suggested that such variants exist. Our inability to identify a large number of novel low-frequency variants indicated that variants with very large effects would probably not account for a large proportion of the missing heritability. However, our identification of an excess of results at thresholds below study-wide significance suggested that low-frequency and rare disease-associated coding variants with smaller effect sizes might be found in (much) larger samples. We identified one variant (encoding p.Glu167Lys in TM6SF2) that was associated with both lower total cholesterol levels and decreased risk of myocardial infarction.
After further analysis, this variant suggested a causal gene in a previously described genome-wide association study locus. Follow-up experiments in mice showed that overexpression of human TM6SF2 raised total cholesterol compared to a control construct and that knockdown of endogenous Tm6sf2 decreased total cholesterol, consistent with this gene being involved in regulating blood lipid levels. Conclusion This thesis contributes new insight into the contribution of low-frequency and coding variation to human height, myocardial infarction and human lipid levels. Arguably more important, our work illuminates how high-quality population-based cohorts and biobank collections can be used as a starting point for knowledge generation in genetic epidemiology.