The impact of deep learning on document classification using semantically rich representations

Kastrati, Zenun; Imran, Ali Shariq; Yildirim Yayilgan, Sule

dc.contributor.author	Kastrati, Zenun
dc.contributor.author	Imran, Ali Shariq
dc.contributor.author	Yildirim Yayilgan, Sule
dc.date.accessioned	2020-01-15T13:48:02Z
dc.date.available	2020-01-15T13:48:02Z
dc.date.created	2019-05-07T12:47:50Z
dc.date.issued	2019
dc.identifier.citation	Information Processing & Management. 2019, 56 (5), 1618-1632.	nb_NO
dc.identifier.issn	0306-4573
dc.identifier.uri	http://hdl.handle.net/11250/2636466
dc.description.abstract	This paper presents a semantically rich document representation model for automatically classifying financial documents into predefined categories using deep learning. The model architecture consists of two main modules: document representation and document classification. In the first module, a document is enriched with semantics using background knowledge provided by an ontology and through the acquisition of its relevant terminology. Integrating the acquired terminology into the ontology extends the capabilities of semantically rich document representations with in-depth coverage of concepts, thereby capturing the whole conceptualization involved in documents. The semantically rich representations obtained from the first module serve as input to the document classification module, which aims to find the most appropriate category for each document through deep learning. Three different deep learning networks, each belonging to a different category of machine learning techniques, are used for ontological document classification with a real-life ontology. Multiple simulations are carried out with various deep neural network configurations, and our findings reveal that a three-hidden-layer feedforward network with 1024 neurons obtains the highest document classification performance on the INFUSE dataset. The performance in terms of F1 score is further increased by almost five percentage points to 78.10% for the same network configuration when the relevant terminology integrated into the ontology is applied to enrich the document representation. Furthermore, we conducted a comparative performance evaluation using various state-of-the-art document representation approaches and classification techniques, including shallow and conventional machine learning classifiers.	nb_NO
dc.language.iso	eng	nb_NO
dc.publisher	Elsevier	nb_NO
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/deed.no	*
dc.title	The impact of deep learning on document classification using semantically rich representations	nb_NO
dc.type	Journal article	nb_NO
dc.type	Peer reviewed	nb_NO
dc.description.version	acceptedVersion	nb_NO
dc.source.pagenumber	1618-1632	nb_NO
dc.source.volume	56	nb_NO
dc.source.journal	Information Processing & Management	nb_NO
dc.source.issue	5	nb_NO
dc.identifier.doi	10.1016/j.ipm.2019.05.003
dc.identifier.cristin	1696042
dc.description.localcode	© 2019. This is the authors’ accepted and refereed manuscript to the article. Locked until 15.5.2021 due to copyright restrictions. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/	nb_NO
cristin.unitcode	194,0,0,0
cristin.unitcode	194,63,10,0
cristin.unitcode	194,63,30,0
cristin.unitname	Norges teknisk-naturvitenskapelige universitet
cristin.unitname	Institutt for datateknologi og informatikk
cristin.unitname	Institutt for informasjonssikkerhet og kommunikasjonsteknologi
cristin.ispublished	true
cristin.fulltext	postprint
cristin.qualitycode	2

Associated file(s)

Filename: The+impact+of+deep+learning+on+.pdf
Size: 1.195Mb
Format: PDF
Description: Kastrati

This item appears in the following collection(s)

Institutt for datateknologi og informatikk [6772]
Institutt for informasjonssikkerhet og kommunikasjonsteknologi [2578]
Publikasjoner fra CRIStin - NTNU [38070]

Unless otherwise noted, this item is licensed under Attribution-NonCommercial-NoDerivatives 4.0 International