Hardware accelerated genetic programming for pattern mining in strings
MetadataShow full item record
This thesis considers the problem of mining patterns in strings. Informally, this is the problem of extracting information (patterns) that characterizes parts of, or even the complete, string. The thesis describes a high performance hardware for string searching, which together with genetic programming, forms the basis for the thesis’ pattern mining algorithms. This work considers two different pattern mining problems and develops several different algorithms to solve different variants of these problems. Common to all algorithms is that they use genetic programming to evolve patterns that can be evaluated by the special purpose search hardware. The first pattern mining problem considered is unsupervised mining of prediction rules in discretized time series. Such prediction rules describe relations between consecutive patterns in the discretized time series; that is, the prediction rules state that if the first pattern occurs, the second pattern will, with high probability, follow within a fixed number of symbols. The goal is to automatically extract prediction rules that are accurate, comprehensible, and interesting. The second pattern mining problem considered is supervised learning of classifiers that predict whether or not a given string belongs to a specific class of strings. This binary classification problem is very general, but this thesis focuses on two recent problems from molecular biology: i) predicting the efficacy of short interfering RNAs and antisense oligonucleotides; and ii) predicting whether or not a given DNA sequence is a non-coding RNA gene. The thesis describes a genetic programming based mining algorithm that produce state-of-the-art classifiers on both problems.
Has partsHalaas, A.; Svingen, B.; Nedland, M.; Sætrom, Pål; Snøve Jr., Ola; Birkeland, Olaf R.. A recursive MISD architecture for pattern matching. IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS. 12(7): 727-734, 2004.
Sætrom, Pål; Hetland, Magnus Lie. Unsupervised temporal rule mining with genetic programming and specialized hardware. Proceedings of the International Conferance on Machine Learning and Applications (ICMLA' 03): 145-151, 2003.
Sætrom, Pål; Hetland, Magnus Lie. Multiobjective evolution of temporal rules. Eight Scandinavian Conferance on artifical Intelligence, 2003.
Hetland, Magnus Lie; Sætrom, Pål. Evolutionary Rule Mining in Time Series Databases. Machine Learning. 58(2-3): 107-125, 2005.
Sætrom, Pål. Predicting the efficacy of short oligonucleotides in antisense and RNAi experiments with boosted genetic programming. Bioinformatics. 20(17): 3055-3063, 2004.
Sætrom, Pål; Snøve Jr., Ola. A comparison of SiRNA efficacy predictors. Biochemical and Biophysical Research Communications. 321(1): 247-253, 2004.
Sætrom, Pål; Sneve, R.; Kristiansen, Knut I.; Snøve Jr., Ola; Grünfeld, T.; Rognes, T.; Seeberg, E.. Predicting non-coding RNA genes in Escherichia coli with boosted genetic programming. Nucleic Acids Research. 33(10): 3263-3270, 2005.