Spatial Dependency in Methylation Data - A Bayesian Approach with R-INLA
Master thesis
Permanent link: http://hdl.handle.net/11250/2390950
Publication date: 2016
Abstract
DNA methylation is a chemical process that regulates gene transcription and is known to interact with development and differentiation of the DNA. It affects almost exclusively CpG sites, and with the Illumina HumanMethylation450k BeadChip we are able to measure the methylation level for more than $450\,000$ CpG sites in the human DNA. The locations of these CpG sites are known to base-pair resolution, making it possible to investigate spatial dependencies.
In this paper, we investigate differences in mean methylation level between two groups of people while taking the spatial dependency into account. The analysis is carried out on a data set containing methylation data from $62$ persons classified as having schizophrenia and $33$ healthy persons. An exploratory analysis has been carried out to investigate which assumptions should be made when analyzing methylation data. Through autocorrelation analysis, correlation estimates and regression evaluations, we have seen that the data are influenced by spatial dependencies. Using Bayesian regression with Integrated Nested Laplace Approximations (INLA), we have investigated different models in order to quantify the spatial dependency structure and, more generally, the underlying structure of the methylation data in a part of chromosome $6$. The model that obtained the best fit included a spatial dependency term and an independent and identically distributed random effect in the linear predictor. The model was optimized using a likelihood that assumed a location-independent precision parameter $\phi$.
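As a rough illustration of the kind of model described above, one possible form can be written out as follows. The beta likelihood, the logit link, the group indicator $g_i$ and the symbols $\beta_0$, $\beta_1$, $u_j$, $v_j$ and $\tau_v$ are assumptions made for this sketch and are not taken from the thesis. Only the spatial term, the independent and identically distributed random effect and the location-independent $\phi$ are stated in the abstract.

$$
\begin{aligned}
y_{ij} \mid \mu_{ij}, \phi &\sim \mathrm{Beta}\bigl(\mu_{ij}\phi,\ (1-\mu_{ij})\phi\bigr), \\
\operatorname{logit}(\mu_{ij}) &= \eta_{ij} = \beta_0 + \beta_1 g_i + u_j + v_j, \\
v_j &\sim \mathcal{N}\bigl(0, \tau_v^{-1}\bigr) \quad \text{independently for each site } j .
\end{aligned}
$$

Here $y_{ij}$ is the methylation level of person $i$ at CpG site $j$, $g_i$ is $1$ for one group and $0$ for the other, and $u_j$ is a spatially structured effect whose prior allows neighbouring CpG sites to be correlated. The precision $\phi$ carries no site index, which is what a location-independent precision means in this sketch.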
Through simulations, we have seen that a test for differentially methylated positions that builds on a model utilizing the spatial dependency might lead to better results than a t-test. Still, further studies are required. Some of the results obtained from the simulations deviate from those obtained in the case study, which might indicate the presence of an underlying structure in the methylation data that has not yet been quantified.
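To make the t-test baseline concrete, the following minimal Python sketch applies a site-by-site two-sample test to simulated methylation levels. It is not the simulation design used in the thesis: the beta-distributed data, the precision value, the three pairs of group means and the choice of Welch's variant of the t-test are all assumptions made for illustration. Only the group sizes ($62$ and $33$) come from the data set described above.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    va, vb = statistics.variance(a), statistics.variance(b)
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb  # squared standard error of the mean difference
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


def simulate_site(rng, n, mean, phi):
    """Draw n beta-distributed levels with the given mean and precision phi."""
    return [rng.betavariate(mean * phi, (1 - mean) * phi) for _ in range(n)]


rng = random.Random(1)
n_cases, n_controls = 62, 33  # group sizes from the data set
phi = 50.0                    # hypothetical precision, illustration only
# Hypothetical (case mean, control mean) per site; only the second site differs.
site_means = [(0.30, 0.30), (0.30, 0.40), (0.70, 0.70)]

results = []
for mu_case, mu_ctrl in site_means:
    cases = simulate_site(rng, n_cases, mu_case, phi)
    controls = simulate_site(rng, n_controls, mu_ctrl, phi)
    results.append(welch_t(cases, controls))

for (mu_case, mu_ctrl), (t, df) in zip(site_means, results):
    print(f"means {mu_case:.2f} vs {mu_ctrl:.2f}: t = {t:.2f}, df = {df:.1f}")
```

Each site is tested in isolation here, so information from neighbouring CpG sites is ignored. That is the limitation a spatially informed test aims to address.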