Hardware accelerated sequence analysis
This thesis evaluates two techniques for applying hardware acceleration to sequence analysis. The problem at hand is detecting remote homologies in protein sequences, which is useful for medical purposes because protein function and structure can be predicted from homology. We adapted an existing genetic programming with boosting algorithm to work with protein data and tested it on a biological database. We also implemented a hardware-accelerated kernel for use with third-party support vector machines (SVMs). We tested this kernel on the same data as the boosted genetic programming solution and on a generated DNA dataset, and compared the results with those of the boosted genetic programming solution and other algorithms. We found that genetic programming with boosting performs comparably to support vector machines with mismatch kernels. Further, we found that the mismatch kernel implementation can be hardware accelerated, but that the acceleration is more effective for DNA sequence analysis than for protein analysis. We found that implementing a motif kernel is the best approach to using hardware acceleration for protein homology detection. Genetic programming with boosting is a state-of-the-art technique for detecting protein homology and gives very good classification performance. In our project, however, where we classify using support vector machines and the SVM kernel can be computed once for the full dataset, it is more time-consuming than an analytical approach. Different string kernels for support vector machines can be implemented using hardware acceleration, but because of the number of permutations in large alphabets, hardware acceleration is less effective for mismatch kernels on protein data than other techniques. The kernel is fast on large datasets with small alphabets, such as the DNA alphabet. The hardware approach is better suited to motif searching than to permutation searching.
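To illustrate why alphabet size matters for the mismatch kernel, the following is a minimal software sketch in Python, not the hardware implementation described in the thesis. It assumes the common (k, m)-mismatch formulation, in which each k-mer of a sequence contributes to every k-mer within Hamming distance m of it. The function names, the parameters `k` and `m`, and the two alphabet constants are illustrative assumptions and do not come from the thesis.

```python
from collections import Counter
from itertools import combinations, product
from math import comb

# Illustrative alphabets (assumed here, not taken from the thesis).
DNA = "ACGT"
PROTEIN = "ACDEFGHIKLMNPQRSTVWY"


def neighbourhood_size(k, m, alphabet_size):
    """Count the k-mers within Hamming distance m of one fixed k-mer."""
    return sum(comb(k, i) * (alphabet_size - 1) ** i for i in range(m + 1))


def neighbours(kmer, m, alphabet):
    """Enumerate all strings within Hamming distance m of `kmer`."""
    result = {kmer}
    for d in range(1, m + 1):
        # Choose which d positions to change.
        for positions in combinations(range(len(kmer)), d):
            # At each chosen position, allow every other symbol.
            choices = [[c for c in alphabet if c != kmer[p]] for p in positions]
            for replacement in product(*choices):
                chars = list(kmer)
                for p, c in zip(positions, replacement):
                    chars[p] = c
                result.add("".join(chars))
    return result


def mismatch_features(seq, k, m, alphabet):
    """Map a sequence to counts over the k-mers its substrings can match."""
    features = Counter()
    for i in range(len(seq) - k + 1):
        for n in neighbours(seq[i:i + k], m, alphabet):
            features[n] += 1
    return features


def mismatch_kernel(x, y, k, m, alphabet):
    """Inner product of the two mismatch feature vectors."""
    fx = mismatch_features(x, k, m, alphabet)
    fy = mismatch_features(y, k, m, alphabet)
    return sum(count * fy[kmer] for kmer, count in fx.items())


if __name__ == "__main__":
    # Neighbourhood of one 5-mer with one mismatch, per alphabet.
    print(neighbourhood_size(5, 1, len(DNA)))
    print(neighbourhood_size(5, 1, len(PROTEIN)))
    print(mismatch_kernel("ACGTAC", "CGTTAC", 3, 1, DNA))
```

Under these assumptions, the neighbourhood of a single k-mer grows with (alphabet size - 1) raised to the number of mismatches, so the 20-letter protein alphabet produces far more candidate k-mers per position than the 4-letter DNA alphabet. This is consistent with the observation above that the mismatch kernel is less effective on protein data.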