Graph-Based Representations for Textual Case-Based Reasoning

Valle, Kjetil

dc.contributor.advisor	Öztürk, Pinar	nb_NO
dc.contributor.author	Valle, Kjetil	nb_NO
dc.date.accessioned	2014-12-19T13:37:17Z
dc.date.available	2014-12-19T13:37:17Z
dc.date.created	2011-09-13	nb_NO
dc.date.issued	2011	nb_NO
dc.identifier	440510	nb_NO
dc.identifier	ntnudaim:5757	nb_NO
dc.identifier.uri	http://hdl.handle.net/11250/252468
dc.description.abstract	This thesis presents a graph-based approach to the problem of text representation. The work is motivated by the need for better representations for use in textual Case-Based Reasoning (CBR). In CBR, new problems are solved by reasoning based on similar past problem cases. When the cases are represented in free-text format, measuring the similarity between a new problem and previously solved problems becomes a challenging task. The case documents need to be re-represented before they can be compared or matched. Textual CBR (TCBR) addresses this issue. We investigate automatic re-representation of textual cases, in particular measuring the salience of features (entities in the text) towards this end. We use the classical vector space model from Information Retrieval (IR) but investigate whether graph representation and salience inference using graphs can improve on the Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF-IDF) measures, the bag-of-words approaches predominant in IR. Our special focus is whether, and possibly how, the co-occurrence and the syntactic dependency relations between terms have an impact on feature weighting. We measure salience through the notion of graph centrality. We experiment with two types of application tasks, classification and case retrieval. Although classification is not a typical TCBR task, it is easier to find datasets for this application, and the centrality measures we have studied are not specific to TCBR. The experiments on this task are therefore relevant to the second application task, which is our ultimate target. We test various centrality metrics described in the literature, make a distinction between local and global weighting measures, and compare them for both application tasks. In general, our graph-based salience inference methods perform better than TF and TF-IDF.	nb_NO
dc.language	eng	nb_NO
dc.publisher	Institutt for datateknikk og informasjonsvitenskap	nb_NO
dc.subject	ntnudaim:5757	no_NO
dc.subject	MTDT datateknikk	no_NO
dc.subject	Intelligente systemer	no_NO
dc.title	Graph-Based Representations for Textual Case-Based Reasoning	nb_NO
dc.type	Master thesis	nb_NO
dc.source.pagenumber	143	nb_NO
dc.contributor.department	Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for datateknikk og informasjonsvitenskap	nb_NO

Files in this item

Name: 440510_FULLTEXT01.pdf
Size: 1.706Mb
Format: PDF

Name: 440510_COVER01.pdf
Size: 46.89Kb
Format: PDF

Name: 440510_ATTACHMENT01.zip
Size: 34.79Mb
Format: Unknown

This item appears in the following Collection(s)

Institutt for datateknologi og informatikk [6544]
