Neighborhood Mining in Biological Networks

2006

Biologists are constantly looking for new knowledge about biological properties

and processes. Bio-molecular interaction networks model dependencies

among proteins and the processes they participate. By studying

patterns of interaction in these networks, it may be possible to discover

implicit information embedded in the network topology. In this thesis we

improve existing and develop new methods for investigating similarities

between proteins, and for discovering protein interaction sub-patterns.

Cytoscape (Shannon et al., 2003) is a tool for visualization and analysis

of interaction networks used by biologists. We have developed an extension

to Cytoscape that lets biologists perform the following tasks:

- Compare proteins based on neighborhood information

- Find interaction sub pattern in an interaction network.

- Discover sub patterns in one or several networks.

Our main contributions are improvements to graph mining algorithms

gSpan by Yan and Han (2002) and Apriori by Inokuchi et al. (2003) whose

original task was the discovering of frequent sub-patterns in a very large

set of networks. We have enabled mining a single network and enabled

less exact matches.

The graph mining algorithm runs on labeled graphs, and we have used

various clustering techniques for this task. The clustering is done through

similarity measures between proteins, which we have based on Gene Ontology

annotations and experimental data obtained from a ChIP-chip experiment.

Our plug-in may easily be extended by adding other cluster

techniques or similarity measures.

We verify the results of our implementations and test them for speed.

We find that of the two mining algorithms gSpan shows the most promise

for mining biological graphs.

NTNU