Iterative sub-network component analysis enables reconstruction of large scale genetic networks
Journal article, Peer reviewed
Permanent link
http://hdl.handle.net/11250/2364441
Publication date
2015
Abstract
Background: Network component analysis (NCA) has become a popular tool for understanding complex regulatory
networks. The method uses high-throughput gene expression data and a priori topology to reconstruct transcription
factor activity profiles. Current NCA algorithms are constrained by several conditions imposed on the network
topology to guarantee unique reconstruction (termed compliancy). However, these restrictions do not necessarily
hold from a biological perspective, and they force a reduction in network size, pruning potentially important
components.
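
To make the compliancy requirement concrete, the sketch below checks a topology against two rank conditions as they are commonly stated in the NCA literature for the model E ≈ A P (E: gene expression, A: connectivity strengths, P: transcription factor activities). The abstract does not list the conditions, so both the conditions and the names `is_compliant` and `topology` are assumptions made for illustration, not the authors' code.

```python
import numpy as np

def is_compliant(topology, seed=0):
    """Check two assumed NCA topology conditions on a (genes x TFs) boolean pattern.

    1. The connectivity matrix A has full column rank.
    2. Removing any TF together with all genes it regulates leaves a
       matrix whose rank is (number of TFs - 1).
    """
    mask = np.asarray(topology, dtype=bool)
    n_genes, n_tfs = mask.shape
    rng = np.random.default_rng(seed)
    # Random nonzero weights on the allowed edges probe the generic rank
    # of the sparsity pattern (hypothetical choice of weights).
    A = np.where(mask, rng.uniform(0.5, 1.5, size=mask.shape), 0.0)
    if np.linalg.matrix_rank(A) < n_tfs:
        return False
    for tf in range(n_tfs):
        keep_genes = ~mask[:, tf]              # genes not regulated by this TF
        keep_tfs = np.arange(n_tfs) != tf      # all other TFs
        reduced = A[np.ix_(keep_genes, keep_tfs)]
        if n_tfs > 1 and (reduced.size == 0
                          or np.linalg.matrix_rank(reduced) < n_tfs - 1):
            return False
    return True
```

Under these assumed conditions, a network in which every gene is regulated by every transcription factor fails the check, which illustrates why densely connected regulators may have to be pruned before standard NCA can be applied.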
Results: To address this, we developed a novel method, Iterative Sub-Network Component Analysis (ISNCA), for
reconstructing networks of any size. The algorithm divides the initial network into smaller, compliant subnetworks
and first predicts the reconstruction of each subnetwork using standard NCA algorithms. It then subtracts from the
reconstruction the contribution of the shared components from the other subnetwork. We tested ISNCA on real, large
datasets using various NCA algorithms. The size of the networks we tested and the accuracy of the reconstruction
increased significantly. Importantly, FOXA1, ATF2, ATF3 and many other known key regulators in breast cancer could
not be incorporated by any NCA algorithm because of the necessary conditions. However, their temporal activities
could be reconstructed by our algorithm, and therefore their involvement in breast cancer could be analyzed.
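
The iterate-and-subtract idea described in the Results can be sketched as follows. This is one possible reading of the abstract, not the published ISNCA implementation: the split into exactly two transcription factor groups, the toy alternating least squares routine `nca_als` standing in for a "standard NCA algorithm", and all function and parameter names are assumptions introduced here.

```python
import numpy as np

def nca_als(E, mask, n_iter=50, seed=0):
    """Toy NCA stand-in: fit E ~ A @ P with A forced to zero outside `mask`."""
    rng = np.random.default_rng(seed)
    n_genes, n_tfs = mask.shape
    A = np.where(mask, rng.uniform(0.5, 1.5, size=mask.shape), 0.0)
    P = np.zeros((n_tfs, E.shape[1]))
    for _ in range(n_iter):
        # Update TF activities given the current connectivity strengths.
        P = np.linalg.lstsq(A, E, rcond=None)[0]
        # Update each gene's allowed connectivity strengths given P.
        for g in range(n_genes):
            idx = np.flatnonzero(mask[g])
            if idx.size:
                A[g, idx] = np.linalg.lstsq(P[idx].T, E[g], rcond=None)[0]
    return A, P

def isnca(E, mask, tfs1, tfs2, n_outer=10):
    """Alternate between two subnetworks, subtracting the other's contribution."""
    mask = np.asarray(mask, dtype=bool)
    genes1 = mask[:, tfs1].any(axis=1)   # genes regulated by the first TF group
    genes2 = mask[:, tfs2].any(axis=1)   # genes regulated by the second TF group
    A = np.zeros(mask.shape)
    P = np.zeros((mask.shape[1], E.shape[1]))
    for _ in range(n_outer):
        for tfs, other, genes in ((tfs1, tfs2, genes1), (tfs2, tfs1, genes2)):
            # The other subnetwork's current estimate is nonzero only on
            # shared genes; remove it before fitting this subnetwork.
            E_sub = E[genes] - A[np.ix_(genes, other)] @ P[other]
            A_sub, P_sub = nca_als(E_sub, mask[np.ix_(genes, tfs)])
            A[np.ix_(genes, tfs)] = A_sub
            P[tfs] = P_sub
    return A, P
```

In this sketch, `E` is a genes-by-samples expression matrix and `mask` is the a priori genes-by-TFs topology; each subnetwork would still need to be compliant on its own for the per-subnetwork fit to be unique, and the well-known scaling ambiguity between A and P is left unresolved.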
Conclusions: Our framework enables reconstruction of large gene expression data networks without reducing their
size or pruning potentially important components, while rendering the results more biologically plausible. Our
ISNCA method is not only suitable for predicting key regulators in cancer studies but can also be applied to any
high-throughput gene expression data.