Assessment of composite motif discovery methods
Journal article, Peer reviewed
Permanent lenke
http://hdl.handle.net/11250/2366791Utgivelsesdato
2008Metadata
Vis full innførselSamlinger
Sammendrag
Background: Computational discovery of regulatory elements is an important area of
bioinformatics research and more than a hundred motif discovery methods have been published.
Traditionally, most of these methods have addressed the problem of single motif discovery –
discovering binding motifs for individual transcription factors. In higher organisms, however,
transcription factors usually act in combination with nearby bound factors to induce specific
regulatory behaviours. Hence, recent focus has shifted from single motifs to the discovery of sets
of motifs bound by multiple cooperating transcription factors, so called composite motifs or cisregulatory
modules. Given the large number and diversity of methods available, independent
assessment of methods becomes important. Although there have been several benchmark studies
of single motif discovery, no similar studies have previously been conducted concerning composite
motif discovery.
Results: We have developed a benchmarking framework for composite motif discovery and used
it to evaluate the performance of eight published module discovery tools. Benchmark datasets were
constructed based on real genomic sequences containing experimentally verified regulatory
modules, and the module discovery programs were asked to predict both the locations of these
modules and to specify the single motifs involved. To aid the programs in their search, we provided
position weight matrices corresponding to the binding motifs of the transcription factors involved.
In addition, selections of decoy matrices were mixed with the genuine matrices on one dataset to
test the response of programs to varying levels of noise.
Conclusion: Although some of the methods tested tended to score somewhat better than others
overall, there were still large variations between individual datasets and no single method
performed consistently better than the rest in all situations. The variation in performance on
individual datasets also shows that the new benchmark datasets represents a suitable variety of
challenges to most methods for module discovery.