ChromosomeNet: A massive dataset enabling benchmarking and building basedlines of clinical chromosome classification

Lin, Chengchuang; Chen, Hanbiao; Huang, Jiesheng; Peng, Jing; Guo, Li; Yang, Zhirong; Du, Jiahua; Li, Shuangyin; Yin, Aihua; Zhao, Gansen

dc.contributor.author	Lin, Chengchuang
dc.contributor.author	Chen, Hanbiao
dc.contributor.author	Huang, Jiesheng
dc.contributor.author	Peng, Jing
dc.contributor.author	Guo, Li
dc.contributor.author	Yang, Zhirong
dc.contributor.author	Du, Jiahua
dc.contributor.author	Li, Shuangyin
dc.contributor.author	Yin, Aihua
dc.contributor.author	Zhao, Gansen
dc.date.accessioned	2022-12-30T08:31:42Z
dc.date.available	2022-12-30T08:31:42Z
dc.date.created	2022-07-18T14:58:41Z
dc.date.issued	2022
dc.identifier.issn	1476-9271
dc.identifier.uri	https://hdl.handle.net/11250/3040011
dc.description.abstract	Chromosome karyotyping analysis is a vital cytogenetics technique for diagnosing genetic and congenital malformations, analyzing gestational and implantation failures, etc. Chromosome classification, an essential stage of chromosome karyotype analysis, is a highly time-consuming, tedious, and error-prone task that requires a large amount of manual work by experienced cytogenetics experts. Many deep learning-based methods have been proposed to address chromosome classification. However, two challenges remain in current chromosome classification methods. First, most existing methods were developed on different private datasets, making them difficult to compare with each other on a common basis. Second, because most existing methods lack the details needed to reproduce them, they are difficult to apply widely in clinical chromosome classification. To address these challenges, this work builds and publishes a massive clinical dataset. This dataset enables benchmarking and the building of chromosome classification baselines suitable for different scenarios. The dataset consists of 126,453 privacy-preserving G-band chromosome instances from 2763 karyotypes of 408 individuals. To the best of our knowledge, this is the first work to collect, annotate, and release a publicly available clinical chromosome classification dataset with more than 120,000 instances. The experimental results show that the proposed dataset can boost the performance of existing chromosome classification models to varying degrees, with the highest accuracy improvement being 5.39 percentage points. Moreover, the best baseline achieves state-of-the-art classification performance with 99.33 % accuracy. The clinical dataset and state-of-the-art baselines can be found at https://github.com/CloudDataLab/BenchmarkForChromosomeClassification.	en_US
dc.language.iso	eng	en_US
dc.publisher	Elsevier	en_US
dc.title	ChromosomeNet: A massive dataset enabling benchmarking and building basedlines of clinical chromosome classification	en_US
dc.title.alternative	ChromosomeNet: A massive dataset enabling benchmarking and building basedlines of clinical chromosome classification	en_US
dc.type	Journal article	en_US
dc.type	Peer reviewed	en_US
dc.description.version	acceptedVersion	en_US
dc.source.journal	Computational biology and chemistry	en_US
dc.identifier.doi	https://doi.org/10.1016/j.compbiolchem.2022.107731
dc.identifier.cristin	2038673
cristin.ispublished	false
cristin.fulltext	postprint
cristin.qualitycode	1

Associated file(s)

File name: 1-s2.0-S1476927122001116-main.pdf
Size: 1.420Mb
Format: PDF

This item appears in the following collection(s)

Institutt for datateknologi og informatikk [6568]
Publikasjoner fra CRIStin - NTNU [37384]
