Vis enkel innførsel

dc.contributor.advisorHalaas, Arnenb_NO
dc.contributor.advisorSætrom, Pålnb_NO
dc.contributor.authorHåndstad, Tonynb_NO
dc.date.accessioned2014-12-19T13:33:29Z
dc.date.available2014-12-19T13:33:29Z
dc.date.created2010-09-04nb_NO
dc.date.issued2006nb_NO
dc.identifier348403nb_NO
dc.identifierntnudaim:1423nb_NO
dc.identifier.urihttp://hdl.handle.net/11250/251133
dc.description.abstractA central problem in computational biology is the classification of related proteins into functional and structural classes based on their amino acid sequences. Several methods exist to detect related sequences when the level of sequence similarity is high, but for very low levels of sequence similarity the problem remains an unsolved challenge. Most recent methods use a discriminative approach and train support vector machines to distinguish related sequences from unrelated sequences. One successful approach is to base a kernel function for a support vector machine on shared occurrences of discrete sequence motifs. Still, many protein sequences fail to be classified correctly for a lack of a suitable set of motifs for these sequences. We introduce a motif kernel based on discrete sequence motifs where the motifs are synthesised using genetic programming. The motifs are evolved to discriminate between different families of evolutionary origin. The motif matches in the sequence data sets are then used to compute kernels for support vector machine classifiers that are trained to discriminate between related and unrelated sequences. When tested on two updated benchmarks, the method yields significantly better results compared to several other proven methods of remote homology detection. The superiority of the kernel is especially visible on the problem of classifying sequences to the correct fold. A rich set of motifs made specifically for each SCOP superfamily makes it possible to classify more sequences correctly than with previous motif-based methods.nb_NO
dc.languageengnb_NO
dc.publisherInstitutt for datateknikk og informasjonsvitenskapnb_NO
dc.subjectntnudaimno_NO
dc.subjectSIF2 datateknikkno_NO
dc.subjectKomplekse datasystemerno_NO
dc.titleProtein Remote Homology Detection using Motifs made with Genetic Programmingnb_NO
dc.typeMaster thesisnb_NO
dc.source.pagenumber114nb_NO
dc.contributor.departmentNorges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for datateknikk og informasjonsvitenskapnb_NO


Tilhørende fil(er)

Thumbnail
Thumbnail

Denne innførselen finnes i følgende samling(er)

Vis enkel innførsel