Improving Document Classification Using Ontologies

Kastrati, Zenun

dc.contributor.advisor	Yildirim, Sule
dc.contributor.advisor	Hjelsvold, Rune
dc.contributor.author	Kastrati, Zenun
dc.date.accessioned	2018-02-23T11:59:58Z
dc.date.available	2018-02-23T11:59:58Z
dc.date.issued	2018
dc.identifier.isbn	978-82-326-2881-0
dc.identifier.issn	1503-8181
dc.identifier.uri	http://hdl.handle.net/11250/2486733
dc.description.abstract	We are living in the Internet age, in which a massive amount of information is produced daily from various digital sources. This information is typically stored in unstructured textual formats such as reports, news articles, e-mails, and blogs, so this huge amount of information needs to be properly classified and organized. In this regard, automatic classification, particularly ontology-based classification, plays an important role in helping people classify and organize information. An ontology-based classification system is an automatic system that uses an ontology to organize and classify knowledge in a more structured and formal way, thus providing better classification accuracy than a traditional keyword-based classification system. The performance of an ontology-based document classification system can be affected by several aspects of the classification process, which generally consists of document collection and preprocessing, document representation, dimensionality reduction, and the classifier. It is almost impossible to address all of these aspects in a single dissertation, so we chose to focus on those that we consider either rarely studied or crucial to an ontology-based classification system. Document representation is one of the main aspects affecting the performance of ontology-based document classification; thus, the first research aspect we investigated is enriching document representation with semantics, using the background knowledge provided by ontologies. The background knowledge derived from an ontology is embedded in a document using a matching technique. 
The idea behind this technique is to map the terms that occur in a document to the relevant ontology concepts by searching only for the presence of concept labels in that document. Searching only for the presence of concept labels limits the ability of the classification system to capture and exploit the entire conceptualization in that document, due to the semantic gap, the lack of in-depth coverage of concepts, and ambiguity. In this thesis, the focus is on conceptual document representation, in which a document is associated with a set of concepts not only by looking for the appearance of concept labels, but also by acquiring lexical information integrated (linked) into the ontology to enrich its coverage with new concepts. To this end, an automatic ontology concept enrichment model is developed that enriches ontologies with new concepts in order to provide broader coverage for document representation. The proposed model explores textual data and relies on the semantic and contextual information of terms occurring in a discourse. The performance of ontology-based document classification depends heavily on the relevance of concepts, which is indicated by their weights. The weights reflect the discriminative power of concepts with respect to the documents and are typically computed from the frequency of occurrence of concepts in these documents. Thus, the second research aspect studied in this work is enhancing the existing concept weighting scheme by introducing the notion of concept importance. Concept importance assesses the contribution of a concept to discriminating between documents, depending on its position in the ontology hierarchy. In addition, we explored ways to evaluate concept importance automatically and developed a Markov-based approach. 
Further, we aggregated concept importance and concept relevance in order to enhance the concept weighting scheme and thus improve the concept vector space representation model. Lastly, the third research aspect studied in this dissertation is improving classification accuracy by taking advantage of the ontology enrichment model and the enhanced concept weighting scheme developed while studying the first and second research aspects, respectively. We proposed a document classification approach that relies on an ontology whose coverage is widened using the ontology enrichment model SEMCON, and in which the weights of concepts are assessed through the new concept weighting technique composed of concept relevance and concept importance. Extensive experiments demonstrated a considerable improvement in classification effectiveness.	nb_NO
dc.language.iso	eng	nb_NO
dc.publisher	NTNU	nb_NO
dc.relation.ispartofseries	Doctoral theses at NTNU;2018:44
dc.relation.haspart	Paper 1: Kastrati, Zenun; Yayilgan, Sule; Imran, Ali Shariq. SEMCON: Semantic and contextual objective metric. In: Proceedings of the 2015 IEEE 9th International Conference on Semantic Computing, pp. 65-68. http://doi.org/10.1109/ICOSC.2015.7050779 © 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	nb_NO
dc.relation.haspart	Paper 2: Kastrati, Zenun; Imran, Ali Shariq; Yildirim, Sule. SEMCON: A semantic and contextual objective metric for enriching domain ontology concepts. International Journal on Semantic Web and Information Systems, 2016, Vol. 12, No. 2, pp. 1-24. http://doi.org/10.4018/IJSWIS.2016040101	nb_NO
dc.relation.haspart	Paper 3: Kastrati, Zenun; Imran, Ali Shariq; Yayilgan, Sule; Dalipi, Fisnik. Analysis of Online Social Networks Posts to Investigate Suspects Using SEMCON. In: Social Computing and Social Media: 7th International Conference, SCSM 2015, Held as Part of HCI International 2015, Los Angeles, CA, USA, August 2-7, 2015, Proceedings. Springer 2015, pp. 148-157. http://doi.org/10.1007/978-3-319-20367-6_16	nb_NO
dc.relation.haspart	Paper 4: Kastrati, Zenun; Imran, Ali Shariq. Adaptive Concept Vector Space Representation Using Markov Chain Model. In: Knowledge Engineering and Knowledge Management. Springer 2014, pp. 203-208. http://doi.org/10.1007/978-3-319-13704-9_16	nb_NO
dc.relation.haspart	Paper 5: Kastrati, Zenun; Imran, Ali Shariq; Yildirim, Sule. An Improved Concept Vector Space Model for Ontology Based Classification. In: SITIS 2015 - The 11th International Conference on Signal Image Technology & Internet Systems. IEEE 2015, pp. 240-245. http://doi.org/10.1109/SITIS.2015.102 © 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	nb_NO
dc.relation.haspart	Paper 6: Kastrati, Zenun; Yildirim Yayilgan, Sule; Hjelsvold, Rune. Automatically Enriching Domain Ontologies for Document Classification. In: Proceedings of the 6th International Conference on Web Intelligence, Mining and Semantics, WIMS 2016. Association for Computing Machinery (ACM) 2016, pp. 1-4. © ACM, 2016. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive published version is available at http://doi.org/10.1145/2912845.2912875	nb_NO
dc.relation.haspart	Paper 7: Kastrati, Zenun; Yildirim, Sule. Supervised Ontology-Based Document Classification Model. In: ICCDA '17: Proceedings of the International Conference on Compute and Data Analysis. ACM Publications 2017, pp. 245-251. © ACM, 2017. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published at http://doi.org/10.1145/3093241.3107883	nb_NO
dc.relation.haspart	Paper 8: Kastrati, Zenun; Imran, Ali Shariq; Yildirim, Sule. A Hybrid Concept Learning Approach to Ontology Enrichment. In: Innovations, Developments, and Applications of Semantic Web and Information Systems. IGI Global 2018, pp. 85-119. http://doi.org/10.4018/978-1-5225-5042-6.ch004	nb_NO
dc.title	Improving Document Classification Using Ontologies	nb_NO
dc.type	Doctoral thesis	nb_NO
dc.subject.nsi	VDP::Technology: 500::Information and communication technology: 550	nb_NO
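The abstract describes a concept vector in which concepts are found by matching their labels in a document and each concept is weighted by combining its relevance (frequency of occurrence) with an importance score that depends on its position in the ontology hierarchy and is evaluated by a Markov-based approach. The sketch below illustrates that pipeline only under loudly stated assumptions: the toy ontology, its labels, the PageRank-style random walk standing in for the Markov model, and the product used to aggregate relevance and importance are all hypothetical and are not taken from the thesis. SEMCON enrichment is not shown; the exact single-word label matching here is the baseline whose limited coverage the thesis sets out to improve.

```python
import re
from collections import Counter

# HYPOTHETICAL toy ontology: each concept maps to its parent (None = root).
ONTOLOGY_PARENT = {
    "computer": None,
    "hardware": "computer",
    "software": "computer",
    "processor": "hardware",
    "memory": "hardware",
    "compiler": "software",
}

# HYPOTHETICAL single-word labels attached to each concept.
CONCEPT_LABELS = {
    "computer": ["computer"],
    "hardware": ["hardware"],
    "software": ["software", "program"],
    "processor": ["processor", "cpu"],
    "memory": ["memory", "ram"],
    "compiler": ["compiler"],
}


def concept_relevance(text, labels):
    """Label matching: count how often each concept's labels occur in the text."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return {concept: sum(counts[label] for label in concept_labels)
            for concept, concept_labels in labels.items()}


def markov_importance(parent, damping=0.85, iterations=100):
    """Stationary distribution of a random walk with uniform teleportation over
    the (undirected) concept hierarchy, computed by power iteration.
    ASSUMPTION: this PageRank-style chain stands in for the thesis's Markov model."""
    concepts = list(parent)
    neighbors = {c: set() for c in concepts}
    for child, par in parent.items():
        if par is not None:
            neighbors[child].add(par)
            neighbors[par].add(child)
    n = len(concepts)
    rank = {c: 1.0 / n for c in concepts}
    for _ in range(iterations):
        new_rank = {c: (1.0 - damping) / n for c in concepts}
        for c in concepts:
            # An isolated concept spreads its probability mass uniformly.
            targets = neighbors[c] or concepts
            share = damping * rank[c] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank


def concept_vector(text, parent=ONTOLOGY_PARENT, labels=CONCEPT_LABELS):
    """Concept weight = relevance x importance (ASSUMED aggregation function)."""
    relevance = concept_relevance(text, labels)
    importance = markov_importance(parent)
    return {c: relevance[c] * importance[c] for c in relevance}


if __name__ == "__main__":
    doc = "The CPU and RAM of a computer: the processor reads memory."
    for concept, weight in sorted(concept_vector(doc).items(), key=lambda kv: -kv[1]):
        print(f"{concept:10s} {weight:.4f}")
```

In this toy setting, concepts with more hierarchy links (such as "hardware") receive a larger importance score, so two concepts with equal frequency can end up with different weights; a different Markov formulation or aggregation function would change the numbers.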

Associated file(s)

Filename: Zenun Kastrati_PhD.pdf
Size: 32.87 MB
Format: PDF
Description: Full text PDF available


This item appears in the following collection(s)

Institutt for datateknologi og informatikk (Department of Computer Science) [6552]
