Separating pseudo-microRNAs from true microRNAs
MicroRNAs are small RNA molecules that regulate gene expression in cells. They are derived from hairpin-shaped RNA transcripts, and about 50% of microRNA genes are located in genomic regions that are associated with cancer. Numerous other naturally occurring RNA molecules also fold into hairpins. Being able to distinguish between these molecules and real microRNAs is vital for understanding the nature of microRNAs. The goal of this thesis has been to construct a classifier that, based on existing features, is able to predict whether a hairpin-shaped RNA molecule is a microRNA or a pseudo-microRNA. In addition, the features in use have been analyzed to determine which of them are the most important for Microprocessor processing and for microRNA classification.

I present a classifier that is able to distinguish between real and pseudo-microRNAs with high certainty for Mus musculus microRNAs. This classifier is based on feature information constructed from the output of another classifier that predicts the Microprocessor cut site of microRNAs. The features used by this classifier have been analyzed using feature elimination. The results indicate that specific positions within the flanking regions of a microRNA substrate are important for Drosha recognition of the substrate. Feature analysis has also been performed for the microRNA classifier, and the findings indicate that microRNAs can be distinguished from other hairpin RNAs by the fact that a microRNA has one clear cut-site candidate, whereas other hairpin-shaped RNAs may have many possible candidates. This information will hopefully assist the search for novel microRNAs and help reanalyze existing microRNAs to verify that they are in fact microRNAs.
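The abstract reports that a true microRNA tends to have one clear cut-site candidate, while other hairpins may have many. The following is a minimal sketch of how that property could be summarized as numeric features, assuming a cut-site predictor that outputs one score in [0, 1] per candidate position. The function name `cut_site_profile_features`, the 0.5 threshold, and the entropy-based summary are illustrative assumptions, not the feature set actually used in the thesis.

```python
import math
from typing import Dict, Sequence


def cut_site_profile_features(scores: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    """Summarize how 'peaked' a hairpin's cut-site score profile is (hypothetical helper).

    `scores` is assumed to hold one non-negative score per candidate cut position,
    e.g. the output of some Microprocessor cut-site predictor.
    """
    if not scores:
        raise ValueError("need at least one candidate score")

    ordered = sorted(scores, reverse=True)
    top = ordered[0]
    second = ordered[1] if len(ordered) > 1 else 0.0

    # Number of positions the predictor considers plausible cut sites.
    n_strong = sum(1 for s in scores if s >= threshold)

    # Normalized Shannon entropy of the score distribution:
    # near 0 = one dominant candidate, near 1 = scores spread evenly over all positions.
    total = sum(scores)
    if total > 0 and len(scores) > 1:
        probs = [s / total for s in scores if s > 0]
        entropy = -sum(p * math.log(p) for p in probs) / math.log(len(scores))
    else:
        # Single candidate, or no positive scores at all: report 0.0 by convention.
        entropy = 0.0

    return {
        "top_score": top,
        "margin": top - second,  # gap between best and second-best candidate
        "n_strong_candidates": n_strong,
        "normalized_entropy": entropy,
    }


if __name__ == "__main__":
    # Made-up score profiles, for illustration only.
    mirna_like = [0.02, 0.05, 0.91, 0.04, 0.01]   # one clear candidate
    pseudo_like = [0.40, 0.55, 0.38, 0.52, 0.45]  # several similar candidates
    print(cut_site_profile_features(mirna_like))
    print(cut_site_profile_features(pseudo_like))
```

A profile with a large margin, a single strong candidate and low entropy matches the pattern the abstract associates with real microRNAs. Such summary values could then be fed to any downstream classifier alongside other features.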