DC field | Value | Language
dc.contributor.advisor | Hetland, Magnus Lie | nb_NO
dc.contributor.advisor | Sandve, Geir Kjetil | nb_NO
dc.contributor.advisor | Drabløs, Finn | nb_NO
dc.contributor.author | Valebjørg, Vetle Søraas | nb_NO
dc.date.accessioned | 2014-12-19T13:34:31Z |
dc.date.available | 2014-12-19T13:34:31Z |
dc.date.created | 2010-09-05 | nb_NO
dc.date.issued | 2006 | nb_NO
dc.identifier | 349063 | nb_NO
dc.identifier | ntnudaim:1191 | nb_NO
dc.identifier.uri | http://hdl.handle.net/11250/251558 |
dc.description.abstract | Mapping the regulatory system in living organisms is a great challenge, and many methods have been developed during the last 15 years to address it. However, the biological processes are more flexible and complex than first thought, and many of these methods are unable to model them exactly. The new method devised here is not a complete solution to this problem, but it offers an innovative way of finding approximate composite patterns in a set of sequences. Motifs can be read from any third-party tool, represented either as {A,C,G,T} strings, as IUPAC codes or as position weight matrices (PWMs), and are weighted by significance and support as an estimate of how important the patterns are. Finding combinations with both high significance and high support can reveal important properties preserved in the sequences. Based on this, the algorithm uses a branch-and-bound approach to traverse every combination while keeping the best solutions to this multi-objective optimization problem in a Pareto front. The best patterns found are then investigated further by applying different statistical and experimental methods to better support their significance. The three most important tests done on the TransCompel dataset were: (i) measuring the predicted patterns against known sites based on nucleotide correlation; (ii) finding the frequency of the motifs participating in the combinations, so that the best ones could be studied manually; and (iii) comparing different tests when the significance was based on real background sequences instead of the uniform distribution. Some of the results were low, but still similar to the accuracy of other known methods that have been evaluated with the same tests. The test results may be biased by the parameters used, by a test set that is too simple and restrictive, or by faulty predictions on the tested dataset. More testing and parameter tuning might result in better predictions.
However, the different tests still proved this method to be a valuable tool in composite motif discovery. | nb_NO
dc.language | eng | nb_NO
dc.publisher | Institutt for datateknikk og informasjonsvitenskap | nb_NO
dc.subject | ntnudaim | no_NO
dc.subject | SIF2 datateknikk | no_NO
dc.subject | Komplekse datasystemer | no_NO
dc.title | Discovery of approximate composite motifs in biological sequences | nb_NO
dc.type | Master thesis | nb_NO
dc.source.pagenumber | 112 | nb_NO
dc.contributor.department | Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for datateknikk og informasjonsvitenskap | nb_NO
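The abstract describes a branch-and-bound search over motif combinations that keeps a Pareto front of combinations scored on two objectives, significance and support. The Python sketch below illustrates that general technique only and is not the thesis implementation. It rests on hypothetical assumptions. Motif occurrences are assumed to have already been located in the input sequences, for example by scanning with IUPAC codes or PWMs. A combination's support is taken to be the number of sequences that contain all of its motifs. Its significance is taken to be the sum of non-negative per-motif scores. The names `Motif`, `pareto_combinations`, `min_size`, `max_size` and `min_support` are illustrative.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Sequence, Tuple

@dataclass(frozen=True)
class Motif:
    name: str
    significance: float          # assumed per-motif score, higher is better
    occurrences: FrozenSet[int]  # ids of the sequences where the motif was found

# (significance, support, motif names); a hypothetical representation
Point = Tuple[float, int, Tuple[str, ...]]

def dominates(a: Tuple[float, int], b: Tuple[float, int]) -> bool:
    """True if a is at least as good as b in both objectives and better in one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])

def add_to_front(front: List[Point], cand: Point) -> None:
    """Insert cand unless it is dominated; drop front points that cand dominates."""
    key = cand[:2]
    if any(dominates(p[:2], key) for p in front):
        return
    front[:] = [p for p in front if not dominates(key, p[:2])]
    front.append(cand)

def pareto_combinations(motifs: Sequence[Motif], n_sequences: int,
                        min_size: int = 2, max_size: int = 3,
                        min_support: int = 1) -> List[Point]:
    # Most significant motifs first: the front fills early, and the first
    # `room` motifs from any index onward are the largest remaining scores.
    motifs = sorted(motifs, key=lambda m: m.significance, reverse=True)
    front: List[Point] = []

    def expand(start: int, combo: Tuple[str, ...], sig: float,
               covered: FrozenSet[int]) -> None:
        support = len(covered)
        if len(combo) >= min_size:
            add_to_front(front, (sig, support, combo))
        if len(combo) == max_size:
            return
        # Optimistic bound for every extension of this combination: support
        # can only shrink, significance can grow by at most the best
        # remaining (non-negative parts of the) scores.
        room = max_size - len(combo)
        best_rest = sum(max(m.significance, 0.0) for m in motifs[start:start + room])
        if any(dominates(p[:2], (sig + best_rest, support)) for p in front):
            return  # bound: no extension can reach the Pareto front
        for i in range(start, len(motifs)):
            m = motifs[i]
            new_cov = covered & m.occurrences
            if len(new_cov) >= min_support:  # branch only on supported combos
                expand(i + 1, combo + (m.name,), sig + m.significance, new_cov)

    expand(0, (), 0.0, frozenset(range(n_sequences)))
    return sorted(front, key=lambda p: (-p[0], -p[1]))

if __name__ == "__main__":
    example = [
        Motif("A", 3.0, frozenset({0, 1, 2})),
        Motif("B", 2.0, frozenset({0, 1, 2, 3})),
        Motif("C", 1.0, frozenset({1, 2, 3})),
        Motif("D", 0.5, frozenset({0})),
    ]
    print(pareto_combinations(example, n_sequences=4))
    # prints [(6.0, 2, ('A', 'B', 'C')), (5.0, 3, ('A', 'B'))]
```

Adding a motif can only shrink the set of supporting sequences, while significance can grow by at most the largest remaining scores. A branch can therefore be dropped as soon as a point already on the front dominates this optimistic estimate. The bound is only valid under the additive, non-negative scoring assumed here. Other ways of combining significance would need a different bound.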

