dc.contributor.advisor: Halaas, Arne [nb_NO]
dc.contributor.advisor: Sætrom, Pål [nb_NO]
dc.contributor.author: Hestnes, Arne Johan [nb_NO]
dc.date.accessioned: 2014-12-19T13:30:48Z
dc.date.available: 2014-12-19T13:30:48Z
dc.date.created: 2010-09-02 [nb_NO]
dc.date.issued: 2005 [nb_NO]
dc.identifier: 346734 [nb_NO]
dc.identifier: ntnudaim:1008 [nb_NO]
dc.identifier.uri: http://hdl.handle.net/11250/250105
dc.description.abstract: The thesis evaluates two techniques for using hardware acceleration in sequence analysis. The problem at hand is to detect remote homologies in protein sequences, which is useful for medical purposes because protein function and structure can be predicted from homology. We adapted an existing genetic programming with boosting algorithm to work with protein data and tested it on a biological database. We also implemented a hardware-accelerated kernel for use with third-party support vector machines. We tested it on the same data as the boosted genetic programming solution and on a generated DNA dataset, and compared the results with those of the boosted genetic programming solution and other algorithms. We found that genetic programming with boosting performs comparably to support vector machines with mismatch kernels. Further, we found that it is possible to hardware accelerate the mismatch kernel implementation, but that this is more effective for DNA sequence analysis than for protein analysis. We found that implementing a motif kernel is the best approach for using hardware acceleration for protein homology detection. Genetic programming with boosting is a state-of-the-art technique for detecting protein homology and gives very good classification performance. For our project, however, it is more time consuming than the analytical approach of classifying with support vector machines, because the SVM kernel can be computed once for the full dataset. Different string kernels for support vector machines can be implemented using hardware acceleration, but because of the number of permutations in large alphabets, hardware acceleration is less effective for mismatch kernels on protein data than other techniques. The kernel is fast on large datasets with small alphabets, such as the DNA alphabet. The hardware approach is better suited to motif searching than to permutation searching. [nb_NO]
dc.language: eng [nb_NO]
dc.publisher: Institutt for datateknikk og informasjonsvitenskap [nb_NO]
dc.subject: ntnudaim [no_NO]
dc.subject: SIF2 datateknikk [no_NO]
dc.subject: Program- og informasjonssystemer [no_NO]
dc.title: Hardware accelerated sequence analysis [nb_NO]
dc.type: Master thesis [nb_NO]
dc.source.pagenumber: 101 [nb_NO]
dc.contributor.department: Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for datateknikk og informasjonsvitenskap [nb_NO]
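
The abstract's remark that mismatch kernels suffer from "the number of permutations in large alphabets" can be illustrated with a small software sketch. This is not the thesis implementation and involves no hardware acceleration. It is a naive version of the standard (k, m)-mismatch feature map, and the function names, the 20-letter amino acid string, and the example values k = 5, m = 1 are assumptions chosen for illustration only.

```python
from collections import Counter
from itertools import combinations, product
from math import comb

DNA = "ACGT"
PROTEIN = "ACDEFGHIKLMNPQRSTVWY"  # assumed 20-letter amino acid alphabet


def neighborhood_size(k, m, alphabet_size):
    """Number of k-mers within Hamming distance m of a fixed k-mer."""
    return sum(comb(k, i) * (alphabet_size - 1) ** i for i in range(m + 1))


def neighborhood(kmer, m, alphabet):
    """Enumerate every string within Hamming distance m of kmer."""
    result = {kmer}
    for d in range(1, m + 1):
        for positions in combinations(range(len(kmer)), d):
            for letters in product(alphabet, repeat=d):
                chars = list(kmer)
                for pos, letter in zip(positions, letters):
                    chars[pos] = letter
                result.add("".join(chars))  # the set removes duplicates
    return result


def mismatch_features(seq, k, m, alphabet):
    """Each k-mer in seq adds 1 to every feature in its neighborhood."""
    feats = Counter()
    for i in range(len(seq) - k + 1):
        for neighbor in neighborhood(seq[i:i + k], m, alphabet):
            feats[neighbor] += 1
    return feats


def mismatch_kernel(s, t, k, m, alphabet):
    """Inner product of the two mismatch feature vectors."""
    fs = mismatch_features(s, k, m, alphabet)
    ft = mismatch_features(t, k, m, alphabet)
    return sum(count * ft[key] for key, count in fs.items())


if __name__ == "__main__":
    for name, alphabet in (("DNA", DNA), ("protein", PROTEIN)):
        print(name, neighborhood_size(5, 1, len(alphabet)))
```

With these example parameters each DNA 5-mer has 16 neighbors while each protein 5-mer has 96, and the gap widens quickly as m grows. This is the alphabet-size effect the abstract refers to.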

