RNA motif discovery: a computational overview
Abstract
Genomic studies have greatly expanded our knowledge of structural non-coding RNAs (ncRNAs). These RNAs fold into characteristic secondary structures and perform specific structure-dependent biological functions. Hence, RNA secondary structure prediction is one of the most thoroughly studied problems in computational RNA biology. Comparative sequence analysis is one of the more reliable approaches to RNA structure prediction, as it exploits information from multiple related sequences to infer their consensus secondary structure. This class of methods essentially learns a global secondary structure from the input sequences. In this paper, we consider the more general problem of unearthing common local patterns, defined in terms of secondary structure, from a set of related sequences. For example, the input sequences could correspond to the 3′ or 5′ untranslated regions of a set of orthologous genes, and the unearthed local patterns could correspond to regulatory motifs found in these regions. The sequences could also correspond to in vitro selected RNAs, genomic segments housing ncRNA genes from the same family, and so on. Here, we give a detailed review of the various computational techniques proposed in the literature that attempt to solve this general motif discovery problem. We also give empirical comparisons of some current state-of-the-art methods and point out future directions of research.