Potentials and limitations of motif-based binding site prediction in DNA

Sandve, Geir Kjetil Ferkingstad

dc.contributor.author	Sandve, Geir Kjetil Ferkingstad	nb_NO
dc.date.accessioned	2014-12-19T13:30:16Z
dc.date.available	2014-12-19T13:30:16Z
dc.date.created	2008-09-12	nb_NO
dc.date.issued	2008	nb_NO
dc.identifier	124606	nb_NO
dc.identifier.isbn	978-82-471-1169-7	nb_NO
dc.identifier.uri	http://hdl.handle.net/11250/249914
dc.description.abstract	As the full genomic DNA sequence is now available for several organisms, a major next challenge is determining the function of DNA elements. This task is often referred to as functional genomics. An important part of functional genomics is gene regulation, and particularly the binding of specific proteins called Transcription Factors (TFs) to DNA. This TF binding regulates the production of mRNA, and thereby eventually proteins, from genes. As experimental determination of TF binding sites in DNA is a very laborious process, there is great interest in computational prediction methods. The basic idea behind computational binding site prediction is to use motifs (sequence patterns) to capture sequence similarity between separate binding sites for a given TF. Based on a set of known binding site examples, the sequence similarity can be exploited for prediction of additional binding sites for a given TF. As motifs representing TF binding sites should occur more frequently than expected by chance alone in co-regulated DNA sequences, computational methods can even be used to discover novel TF binding site motifs and associated binding sites using only un-annotated target DNA sequences as input. The focus of this thesis is on the computational prediction of TF binding sites, and specifically on understanding the current limitations and potential for improvement of binding site prediction. Two of the papers in the thesis relate to the assessment of computational predictions. The data sets used in a recent benchmark of prediction methods are analyzed in relation to three commonly used motif models, showing some fundamental performance limitations that should be attributed either to the motif models or to the benchmark data sets themselves. A first broad benchmark of methods predicting higher-order organization of TF binding sites is also part of this thesis.
The benchmark showed some differences in prediction accuracy between methods, and more generally that a moderate level of prediction accuracy can be expected in the considered scenario. Two novel motif discovery methods are also presented in the thesis. Both of the methods consider the problem of predicting higher-order organization of binding sites, given motifs representing binding of individual TFs as input. One method takes a Bayesian probabilistic approach to binding site modeling, while the other method uses a discrete approach. Both methods use highly expressive models and show good quantitative performance in relation to existing methods. Each method also introduces some additional elements that may bring qualitative advantages. A third and final direction of research in this thesis concerns the extended process of motif discovery in DNA. Topics considered include how data is compiled before binding site prediction is performed, how prediction results can be interpreted in a multiple-testing scenario, and how prediction can be accelerated by the use of parallel hardware.	nb_NO
dc.language	eng	nb_NO
dc.publisher	Fakultet for informasjonsteknologi, matematikk og elektroteknikk	nb_NO
dc.relation.ispartofseries	Doktoravhandlinger ved NTNU, 1503-8181; 2008:239	nb_NO
dc.relation.haspart	Sandve, Geir Kjetil; Drabløs, F. A survey of motif discovery methods in an integrated framework. Biology Direct. 1(11), 2006.	nb_NO
dc.relation.haspart	Lin, Tien-ho; Ray, Pradipta; Sandve, Geir Kjetil; Uguroglu, Selen; Xing, Eric P. BayCis: A Bayesian Hierarchical HMM for Cis-Regulatory Module Decoding in Metazoan Genomes. Proceedings of the Twelfth Annual International Conference on Research in Computational Molecular Biology (RECOMB): 66-81, 2008.	nb_NO
dc.relation.haspart	Sandve, Geir Kjetil; Abdul, O; Walseng, V; Drabløs, F. Improved benchmarks for computational motif discovery. BMC Bioinformatics. 8(193), 2007.	nb_NO
dc.relation.haspart	Abdul, O; Drabløs, F; Sandve, Geir Kjetil. A Methodology for Motif Discovery Employing Iterated Cluster Re-assignment. Series on Advances in Bioinformatics and Computational Biology. 4: 257-268, 2006.	nb_NO
dc.relation.haspart	Abdul, O; Sandve, Geir Kjetil; Drabløs, F. False discovery rates in identifying functional DNA motifs. Proceedings of the 7th IEEE International Conference on Bioinformatics and Bioengineering (BIBE): 387-394, 2007.	nb_NO
dc.relation.haspart	Sandve, Geir Kjetil; Nedland, Magnar; Syrstad, Øyvind Bø; Eidsheim, Lars Andreas; Abdul, O; Drabløs, F. Accelerating Motif Discovery: Motif Matching on Parallel Hardware. Lecture Notes in Computer Science (The original publication is available at www.springerlink.com). 4175: 197-206, 2006.	nb_NO
dc.title	Potentials and limitations of motif-based binding site prediction in DNA	nb_NO
dc.type	Doctoral thesis	nb_NO
dc.contributor.department	Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for datateknikk og informasjonsvitenskap	nb_NO
dc.description.degree	PhD i informasjons- og kommunikasjonsteknologi	nb_NO
dc.description.degree	PhD in Information and Communications Technology	en_GB

Associated file(s)

File name: 124606_FULLTEXT02.pdf
Size: 3.465Mb
Format: PDF

File name: 124606_FULLTEXT01.pdf
Size: 4.214Mb
Format: PDF

Locked

This item appears in the following collection(s)

Institutt for datateknologi og informatikk [6768]
