Statistical Modelling and Inference for Long Gene Expression Time Series
Master thesis
Permanent link
http://hdl.handle.net/11250/247210
Publication date
2014
Collections
- Institutt for fysikk [2646]
Abstract
The main objective of this thesis is to model and analyse long gene expression time series from a microarray study using the linear mixed effects model. This regression model, which combines fixed and random effects on a linear scale, is widely used in biology, ecology and medicine. We use data from a microarray study conducted by Astrid Lægreid and Torunn Bruland and collaborators at the Department of Cancer Research and Molecular Medicine (IKM) at the Norwegian University of Science and Technology (NTNU) in 2009. The data set consists of paired time series, one gastrin-stimulated treatment and one unstimulated control, for 8956 genes. The response is a logarithmic measure of gene expression, measured for two biological replicates. A linear mixed effects model can be fitted to each gene in the data set. We examine whether the area under the estimated time series curve can be used as a measure of the strength of gene expression activation over time, and whether this area can be used to rank the genes by effect size over time. With the area as the measure of activation strength, we propose a hypothesis test for assessing the significance of each gene. The analyses are performed both under parametric assumptions and by permutation, and test statistics for both are suggested. Our permutation strategy is validated through a small-scale simulation study. Multiple hypothesis testing is conducted, and the parametric and permutation approaches are compared and evaluated using statistical inference.
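
The idea of using an area as a test statistic and assessing it by permutation can be illustrated with a minimal Python sketch for a single gene. This is not the thesis's implementation, and it rests on several assumptions: it uses the replicate-averaged raw difference between treatment and control instead of a curve estimated from a fitted linear mixed effects model, it takes the area by the trapezoidal rule, and it permutes by swapping treatment and control labels independently for each observation, which may differ from the permutation strategy validated in the thesis. The function names `trapezoid_area`, `area_statistic` and `permutation_pvalue` are hypothetical.

```python
import numpy as np


def trapezoid_area(times, values):
    """Area under a piecewise-linear curve, by the trapezoidal rule."""
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    return float(np.sum(np.diff(times) * (values[1:] + values[:-1]) / 2.0))


def area_statistic(times, treated, control):
    """Area under the replicate-averaged (treated - control) curve.

    `treated` and `control` have shape (n_replicates, n_times) and hold
    log-scale expression values for one gene.
    """
    diff = np.asarray(treated, dtype=float) - np.asarray(control, dtype=float)
    return trapezoid_area(times, diff.mean(axis=0))


def permutation_pvalue(times, treated, control, n_perm=999, seed=0):
    """Two-sided permutation p-value for the area statistic.

    Assumption (not from the thesis): under the null hypothesis, the
    treatment and control labels are exchangeable for each observation,
    so each (replicate, time) pair is swapped with probability 1/2.
    """
    rng = np.random.default_rng(seed)
    treated = np.asarray(treated, dtype=float)
    control = np.asarray(control, dtype=float)
    observed = area_statistic(times, treated, control)

    exceed = 0
    for _ in range(n_perm):
        swap = rng.random(treated.shape) < 0.5
        treated_perm = np.where(swap, control, treated)
        control_perm = np.where(swap, treated, control)
        permuted = area_statistic(times, treated_perm, control_perm)
        if abs(permuted) >= abs(observed):
            exceed += 1

    # Add one to numerator and denominator so the p-value is never zero.
    return observed, (1 + exceed) / (1 + n_perm)


if __name__ == "__main__":
    # Synthetic example: two replicates, ten time points, constant shift of 1.
    times = np.arange(10.0)
    control = np.zeros((2, 10))
    treated = control + 1.0
    area, p_value = permutation_pvalue(times, treated, control)
    print(area, p_value)
```

Repeating this for each gene would give one area and one p-value per gene; the areas could then be used for ranking and the p-values passed on to a multiple testing procedure.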