Neighborhood Mining in Biological Networks
Abstract
Biologists are constantly looking for new knowledge about biological propertiesand processes. Bio-molecular interaction networks model dependenciesamong proteins and the processes they participate. By studyingpatterns of interaction in these networks, it may be possible to discoverimplicit information embedded in the network topology. In this thesis weimprove existing and develop new methods for investigating similaritiesbetween proteins, and for discovering protein interaction sub-patterns.
Cytoscape (Shannon et al., 2003) is a tool for visualization and analysisof interaction networks used by biologists. We have developed an extensionto Cytoscape that lets biologists perform the following tasks:- Compare proteins based on neighborhood information- Find interaction sub pattern in an interaction network.- Discover sub patterns in one or several networks.
Our main contributions are improvements to graph mining algorithmsgSpan by Yan and Han (2002) and Apriori by Inokuchi et al. (2003) whoseoriginal task was the discovering of frequent sub-patterns in a very largeset of networks. We have enabled mining a single network and enabledless exact matches.
The graph mining algorithm runs on labeled graphs, and we have usedvarious clustering techniques for this task. The clustering is done throughsimilarity measures between proteins, which we have based on Gene Ontologyannotations and experimental data obtained from a ChIP-chip experiment.Our plug-in may easily be extended by adding other clustertechniques or similarity measures.
We verify the results of our implementations and test them for speed.We find that of the two mining algorithms gSpan shows the most promisefor mining biological graphs.