Iterative sub-network component analysis enables reconstruction of large scale genetic networks
Journal article, Peer reviewed
MetadataShow full item record
Original versionBMC Bioinformatics 2015, 16 10.1186/s12859-015-0768-9
Background: Network component analysis (NCA) became a popular tool to understand complex regulatory networks. The method uses high-throughput gene expression data and a priori topology to reconstruct transcription factor activity profiles. Current NCA algorithms are constrained by several conditions posed on the network topology, to guarantee unique reconstruction (termed compliancy). However, the restrictions these conditions pose are not necessarily true from biological perspective and they force network size reduction, pruning potentially important components. Results: To address this, we developed a novel, Iterative Sub-Network Component Analysis (ISNCA) for reconstructing networks at any size. By dividing the initial network into smaller, compliant subnetworks, the algorithm first predicts the reconstruction of each subntework using standard NCA algorithms. It then subtracts from the reconstruction the contribution of the shared components from the other subnetwork. We tested the ISNCA on real, large datasets using various NCA algorithms. The size of the networks we tested and the accuracy of the reconstruction increased significantly. Importantly, FOXA1, ATF2, ATF3 and many other known key regulators in breast cancer could not be incorporated by any NCA algorithm because of the necessary conditions. However, their temporal activities could be reconstructed by our algorithm, and therefore their involvement in breast cancer could be analyzed. Conclusions: Our framework enables reconstruction of large gene expression data networks, without reducing their size or pruning potentially important components, and at the same time rendering the results more biological plausible. Our ISNCA method is not only suitable for prediction of key regulators in cancer studies, but it can be applied to any high-throughput gene expression data.