dc.contributor.author: Sandve, Geir Kjetil Ferkingstad [nb_NO]
dc.date.accessioned: 2014-12-19T13:30:16Z
dc.date.available: 2014-12-19T13:30:16Z
dc.date.created: 2008-09-12 [nb_NO]
dc.date.issued: 2008 [nb_NO]
dc.identifier: 124606 [nb_NO]
dc.identifier.isbn: 978-82-471-1169-7 [nb_NO]
dc.identifier.uri: http://hdl.handle.net/11250/249914
dc.description.abstract: As the full genomic DNA sequence is now available for several organisms, a major next challenge is determining the function of DNA elements. This task is often referred to as functional genomics. An important part of functional genomics is gene regulation, and particularly the binding of specific proteins called Transcription Factors (TFs) to DNA. This TF binding regulates the production of mRNA, and thereby eventually proteins, from genes. As experimental determination of TF binding sites in DNA is a very laborious process, there is great interest in computational prediction methods. The basic idea behind computational binding site prediction is to use motifs (sequence patterns) to capture sequence similarity between separate binding sites for a given TF. Based on a set of known binding site examples, the sequence similarity can be exploited for prediction of additional binding sites for a given TF. As motifs representing TF binding sites should occur more frequently than expected by chance alone in co-regulated DNA sequences, computational methods can even be used to discover novel TF binding site motifs and associated binding sites using only un-annotated target DNA sequences as input. The focus of this thesis is on the computational prediction of TF binding sites, and specifically on understanding the current limitations and potential for improvement of binding site prediction. Two of the papers in the thesis relate to the assessment of computational predictions. The data sets used in a recent benchmark of prediction methods are analyzed in relation to three commonly used motif models, showing some fundamental performance limitations that should be attributed either to the motif models or to the benchmark data sets themselves. A first broad benchmark of methods predicting higher-order organization of TF binding sites is also part of this thesis.
The benchmark showed some differences in prediction accuracy between methods, and more generally that a moderate level of prediction accuracy can be expected in the considered scenario. Two novel motif discovery methods are also presented in the thesis. Both of the methods consider the problem of predicting higher-order organization of binding sites, given motifs representing binding of individual TFs as input. One method takes a Bayesian probabilistic approach to binding site modeling, while the other method uses a discrete approach. Both methods use highly expressive models and show good quantitative performance in relation to existing methods. Each method also introduces some additional elements that may bring qualitative advantages. A third and final direction of research in this thesis concerns the extended process of motif discovery in DNA. Topics considered include how data is compiled before binding site prediction is performed, how prediction results can be interpreted in a multiple-testing scenario, and how prediction can be accelerated by the use of parallel hardware. [nb_NO]
dc.language: eng [nb_NO]
dc.publisher: Fakultet for informasjonsteknologi, matematikk og elektroteknikk [nb_NO]
dc.relation.ispartofseries: Doktoravhandlinger ved NTNU, 1503-8181; 2008:239 [nb_NO]
dc.relation.haspart: Sandve, Geir Kjetil; Drabløs, F. A survey of motif discovery methods in an integrated framework. Biology Direct. 1(11), 2006. [nb_NO]
dc.relation.haspart: Lin, Tien-ho; Ray, Pradipta; Sandve, Geir Kjetil; Uguroglu, Selen; Xing, Eric P. BayCis: A Bayesian Hierarchical HMM for Cis-Regulatory Module Decoding in Metazoan Genomes. Proceedings of the Twelfth Annual International Conference on Research in Computational Molecular Biology (RECOMB): 66-81, 2008. [nb_NO]
dc.relation.haspart: Sandve, Geir Kjetil; Abdul, O; Walseng, V; Drabløs, F. Improved benchmarks for computational motif discovery. BMC Bioinformatics. 8(193), 2007. [nb_NO]
dc.relation.haspart: Abdul, O; Drabløs, F; Sandve, Geir Kjetil. A Methodology for Motif Discovery Employing Iterated Cluster Re-assignment. Series on Advances in Bioinformatics and Computational Biology. 4: 257-268, 2006. [nb_NO]
dc.relation.haspart: Abdul, O; Sandve, Geir Kjetil; Drabløs, F. False discovery rates in identifying functional DNA motifs. Proceedings of the 7th IEEE International Conference on Bioinformatics and Bioengineering (BIBE): 387-394, 2007. [nb_NO]
dc.relation.haspart: Sandve, Geir Kjetil; Nedland, Magnar; Syrstad, Øyvind Bø; Eidsheim, Lars Andreas; Abdul, O; Drabløs, F. Accelerating Motif Discovery: Motif Matching on Parallel Hardware. Lecture Notes in Computer Science (The original publication is available at www.springerlink.com). 4175: 197-206, 2006. [nb_NO]
dc.title: Potentials and limitations of motif-based binding site prediction in DNA [nb_NO]
dc.type: Doctoral thesis [nb_NO]
dc.contributor.department: Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for datateknikk og informasjonsvitenskap [nb_NO]
dc.description.degree: PhD i informasjons- og kommunikasjonsteknologi [nb_NO]
dc.description.degree: PhD in Information and Communications Technology [en_GB]

