dc.description.abstract | Most prediction methods for finding potential DNA binding sites for a specific transcription factor (TF) use a model of the transcription factor binding site (TFBS) and compare each position of the DNA sequence (e.g. a genome) against this model. Any position with a significant score against the model may then be classified as a potential binding site. Common models include the consensus sequence, the hidden Markov model (HMM) and the position weight matrix (PWM).
The main problem with this approach is that it generates a large number of false positive TFBS predictions. It has been estimated that in most cases the predictions will be completely dominated by false positives.
This project will try to develop a context-sensitive approach for identifying real binding sites for a given TF, independent of cell type. The basic assumption is that real TFBSs are found in a suitable genomic context, whereas random binding sites will lack any common context. The idea is then to use properties that can be associated with regulatory regions to develop a classifier for PWM-based TFBS predictions. Using a machine learning approach, the classifier will (hopefully) remove most false positive TFBS predictions. | |
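
The position-by-position scan described in the abstract can be illustrated with a minimal sketch. It is not the project's implementation: the count matrix, pseudocount, uniform background and threshold are all hypothetical values chosen for illustration, and the helper names `log_odds_pwm` and `scan` are assumptions.

```python
import math

# Hypothetical count matrix for a 4 bp motif (one dict per motif position).
# These counts are invented for illustration, not taken from any real TF.
COUNTS = [
    {"A": 8, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 8, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 8, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 8},
]


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert counts to log2-odds scores against a uniform background (assumed)."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2((column[base] + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Score every window of the sequence and keep those at or above the threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows with ambiguous bases such as N
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


if __name__ == "__main__":
    pwm = log_odds_pwm(COUNTS)
    for start, window, score in scan("TTACGTAAACGTCC", pwm, threshold=4.0):
        print(start, window, round(score, 2))
```

On a genome-sized sequence, any fixed threshold of this kind lets through many windows that match by chance, which is the false positive problem the proposed context-based classifier is meant to address.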