Hardware accelerated sequence analysis

Hestnes, Arne Johan

dc.contributor.advisor	Halaas, Arne	nb_NO
dc.contributor.advisor	Sætrom, Pål	nb_NO
dc.contributor.author	Hestnes, Arne Johan	nb_NO
dc.date.accessioned	2014-12-19T13:30:48Z
dc.date.available	2014-12-19T13:30:48Z
dc.date.created	2010-09-02	nb_NO
dc.date.issued	2005	nb_NO
dc.identifier	346734	nb_NO
dc.identifier	ntnudaim:1008	nb_NO
dc.identifier.uri	http://hdl.handle.net/11250/250105
dc.description.abstract	The thesis evaluates two techniques for using hardware acceleration in sequence analysis. The problem at hand is detecting remote homologies in protein sequences, which is useful for medical purposes because protein function and structure can be predicted from homology. We adapted an existing genetic programming with boosting algorithm to work with protein data and tested it on a biological database. We also implemented a hardware-accelerated kernel for use with third-party support vector machines (SVMs). We tested this kernel on the same data as the boosted genetic programming solution and on a generated DNA dataset, and compared the results to those of the boosted genetic programming solution and other algorithms. We found that genetic programming with boosting performs comparably to support vector machines with mismatch kernels. Further, we found that it is possible to hardware accelerate the mismatch kernel implementation, but that it is more effective for DNA sequence analysis than for protein analysis. We found that implementing a motif kernel is the best approach for using hardware acceleration in protein homology detection. Genetic programming with boosting is a state-of-the-art technique for detecting protein homology and gives very good classification performance. For our project, however, it is more time consuming than an analytical approach, because we classify with support vector machines and the SVM kernel can be computed once for the full dataset. Different string kernels for support vector machines can be implemented using hardware acceleration, but because of the number of permutations in large alphabets, hardware acceleration is less effective for mismatch kernels on protein data than other techniques. The kernel is fast on large datasets with small alphabets, such as the DNA alphabet. The hardware approach is better suited to motif searching than to permutation searching.	nb_NO
dc.language	eng	nb_NO
dc.publisher	Institutt for datateknikk og informasjonsvitenskap	nb_NO
dc.subject	ntnudaim	no_NO
dc.subject	SIF2 datateknikk	no_NO
dc.subject	Program- og informasjonssystemer	no_NO
dc.title	Hardware accelerated sequence analysis	nb_NO
dc.type	Master thesis	nb_NO
dc.source.pagenumber	101	nb_NO
dc.contributor.department	Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for datateknikk og informasjonsvitenskap	nb_NO

Associated file(s)

Filename: 346734_FULLTEXT01.pdf
Size: 1.756 MB
Format: PDF

Locked

This item appears in the following collection(s)

Institutt for datateknologi og informatikk [6544]
