dc.contributor.advisor   Nørvåg, Kjetil
dc.contributor.author    Roligheten, Christian Barth
dc.date.accessioned      2018-10-16T14:00:26Z
dc.date.available        2018-10-16T14:00:26Z
dc.date.created          2018-06-11
dc.date.issued           2018
dc.identifier            ntnudaim:19146
dc.identifier.uri        http://hdl.handle.net/11250/2568322
dc.description.abstract  Keeping knowledge bases such as Wikipedia up to date with the latest information is a difficult task in the information age. Every day, thousands of news articles, blog posts, and opinion pieces are published on the Internet. If even a small fraction of these documents contain new information that requires a knowledge base to be updated, an army of constantly vigilant volunteers is needed to track this stream of information and update the knowledge base as necessary. As more information is generated on the Internet, ever more volunteers are needed to keep track of it all. It would therefore be highly beneficial to build automated systems that help volunteers integrate new information into knowledge bases. Cumulative Citation Recommendation (CCR) is the task of assisting knowledge base editors by automatically recommending edits to entity profiles in a knowledge base, given a stream of documents. In this thesis we implement a CCR system that allows us to evaluate different learning-to-rank (LTR) based ranking approaches to CCR. Specifically, we compare entity-dependent and entity-independent approaches, as well as approaches that use Gradient Boosted Trees and Random Forests as the ranking algorithm. We also evaluate how different features affect the system. Our best approach, an entity-dependent approach using Gradient Boosted Trees, achieves an F1 measure of 0.5 on the 2014 TREC KBA track, which would place it second among the participants of this track. Our evaluation of the different LTR-based approaches reveals which approaches are most effective for CCR.
dc.language              eng
dc.publisher             NTNU
dc.subject               Computer Science, Databases and Search
dc.title                 Cumulative Citation Recommendation
dc.type                  Master thesis
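The abstract reports its main result as an F1 measure on the 2014 TREC KBA track. As a rough illustration of how an F1 score can be read off a ranker's output in a CCR setting, the sketch below assumes that each candidate document for an entity has a confidence score from the ranker and a binary gold label (whether it should be cited in the entity's profile). It treats every document scoring at or above a cutoff as a recommendation and sweeps the cutoff to find the best F1. The function names `f1_at_cutoff` and `best_f1` are hypothetical, and this is not the official TREC KBA scorer, whose exact cutoff and averaging procedure is not described in this record.

```python
from typing import Sequence, Tuple


def f1_at_cutoff(scores: Sequence[float], labels: Sequence[bool], cutoff: float) -> float:
    """F1 when every document with score >= cutoff is recommended.

    Hypothetical helper for illustration; not the official TREC KBA scorer.
    """
    tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y)
    if tp == 0:
        # No correct recommendations: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_f1(scores: Sequence[float], labels: Sequence[bool]) -> Tuple[float, float]:
    """Try every distinct score as a cutoff and return (best F1, cutoff used).

    On ties, the lowest cutoff reaching the maximum F1 is kept.
    """
    best = (0.0, float("inf"))
    for cutoff in sorted(set(scores)):
        f1 = f1_at_cutoff(scores, labels, cutoff)
        if f1 > best[0]:
            best = (f1, cutoff)
    return best
```

For example, scores `[0.9, 0.8, 0.4, 0.2]` with labels `[True, False, True, False]` give a best F1 of 0.8 at cutoff 0.4 (precision 2/3, recall 1). Sweeping the cutoff rather than fixing it up front lets different rankers (for instance Gradient Boosted Trees versus Random Forests) be compared at their own best operating points.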