ClusTrack: Feature extraction and similarity measures for clustering of genome-wide data sets

Rydbeck, Halfdan; Sandve, Geir Kjetil F.; Ferkingstad, Egil; Simovski, Boris; Rye, Morten Beck; Hovig, Johannes Eivind

dc.contributor.author	Rydbeck, Halfdan
dc.contributor.author	Sandve, Geir Kjetil F.
dc.contributor.author	Ferkingstad, Egil
dc.contributor.author	Simovski, Boris
dc.contributor.author	Rye, Morten Beck
dc.contributor.author	Hovig, Johannes Eivind
dc.date.accessioned	2015-11-25T08:10:54Z
dc.date.accessioned	2016-01-13T09:43:35Z
dc.date.available	2015-11-25T08:10:54Z
dc.date.available	2016-01-13T09:43:35Z
dc.date.issued	2015
dc.identifier.citation	PLoS ONE 2015, 10(4)	nb_NO
dc.identifier.issn	1932-6203
dc.identifier.uri	http://hdl.handle.net/11250/2373562
dc.description.abstract	Clustering is a popular technique for explorative analysis of data, as it can reveal subgroupings and similarities between data in an unsupervised manner. While clustering is routinely applied to gene expression data, there is a lack of appropriate general methodology for clustering of sequence-level genomic and epigenomic data, e.g. ChIP-based data. We here introduce a general methodology for clustering data sets of coordinates relative to a genome assembly, i.e. genomic tracks. By defining appropriate feature extraction approaches and similarity measures, we allow biologically meaningful clustering to be performed for genomic tracks using standard clustering algorithms. An implementation of the methodology is provided through a tool, ClusTrack, which allows fine-tuned clustering analyses to be specified through a web-based interface. We apply our methods to the clustering of occupancy of the H3K4me1 histone modification in samples from a range of different cell types. The majority of samples form meaningful subclusters, confirming that the definitions of features and similarity capture biological, rather than technical, variation between the genomic tracks. Input data and results are available, and can be reproduced, through a Galaxy Pages document at http://hyperbrowser.uio.no/hb/u/hb-superuser/p/clustrack. The clustering functionality is available as a Galaxy tool, under the menu option "Specialized analyzis of tracks", and the submenu option "Cluster tracks based on genome level similarity", at the Genomic HyperBrowser server: http://hyperbrowser.uio.no/hb/.	nb_NO
dc.language.iso	eng	nb_NO
dc.publisher	Public Library of Science	nb_NO
dc.title	ClusTrack: Feature extraction and similarity measures for clustering of genome-wide data sets	nb_NO
dc.type	Journal article	nb_NO
dc.date.updated	2015-11-25T08:10:54Z
dc.source.volume	10	nb_NO
dc.source.journal	PLoS ONE	nb_NO
dc.source.issue	4	nb_NO
dc.identifier.doi	10.1371/journal.pone.0123261
dc.identifier.cristin	1248244
dc.relation.project	Norges forskningsråd: 213921	nb_NO
dc.description.localcode	© 2015 Rydbeck et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.	nb_NO
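
The abstract describes a general pattern: extract features from each genomic track, define a similarity measure on those features, and pass the resulting distances to a standard clustering algorithm. The following is a minimal sketch of that pattern only, not ClusTrack's actual implementation. The feature (set of fixed-size bins touched by a track), the similarity measure (Jaccard index), the function names, and the toy data are all assumptions chosen for illustration.

```python
from itertools import combinations


def covered_bins(track, bin_size):
    """Feature extraction (illustrative): bins touched by any segment.

    `track` is a list of half-open (start, end) intervals on one chromosome.
    """
    bins = set()
    for start, end in track:
        # Empty intervals (start == end) yield an empty range and add nothing.
        bins.update(range(start // bin_size, (end - 1) // bin_size + 1))
    return bins


def jaccard(a, b):
    """Similarity measure (illustrative): shared bins over bins in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def distance_matrix(tracks, bin_size=100):
    """Pairwise 1 - Jaccard distances, keyed by sorted (name, name) pairs."""
    features = {name: covered_bins(t, bin_size) for name, t in tracks.items()}
    return {
        (x, y): 1.0 - jaccard(features[x], features[y])
        for x, y in combinations(sorted(features), 2)
    }


# Toy tracks, invented for this example; they are not data from the paper.
tracks = {
    "sample_a": [(0, 150), (400, 450)],
    "sample_b": [(50, 120), (410, 480), (600, 650)],
    "sample_c": [(900, 1000)],
}
distances = distance_matrix(tracks)
```

A distance table of this kind could then be handed to any standard method, for example hierarchical clustering. Which features and similarity measures are biologically appropriate is the question the paper itself addresses.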

Files in this item

Filename: fetchObject.pdf
Size: 340.9Kb
Format: PDF

This item appears in the following collection(s)

Institutt for klinisk og molekylær medisin [3426]
Publikasjoner fra CRIStin - NTNU [37221]