
dc.contributor.advisor: Nørvåg, Kjetil (nb_NO)
dc.contributor.author: Galåen, Magnus (nb_NO)
dc.date.accessioned: 2014-12-19T13:32:09Z
dc.date.available: 2014-12-19T13:32:09Z
dc.date.created: 2010-09-03 (nb_NO)
dc.date.issued: 2008 (nb_NO)
dc.identifier: 347608 (nb_NO)
dc.identifier: ntnudaim:1505 (nb_NO)
dc.identifier.uri: http://hdl.handle.net/11250/250628
dc.description.abstract: As document bases grow rapidly, document searching becomes more and more important, and document clustering becomes more important with it. Some of the most commonly used document clustering algorithms today are purely statistical in nature. Other algorithms have emerged that address some of the issues with numerical algorithms and claim to be better. This thesis compares two well-known algorithms: Elliptic K-Means and Suffix Tree Clustering (STC). They are compared on speed and quality, and it is shown that Elliptic K-Means is faster, while STC produces clusters of better quality. It is further shown that, on real web data, STC performs better on small portions of relevant text (snippets) than on full documents. It is also shown that a threshold value for base cluster merging is unnecessary. As STC is shown to be adequately fast when running on snippets only, it is concluded that STC is the better algorithm for search results clustering. (nb_NO)
dc.language: eng (nb_NO)
dc.publisher: Institutt for datateknikk og informasjonsvitenskap (nb_NO)
dc.subject: ntnudaim (no_NO)
dc.subject: MIT informatikk (no_NO)
dc.subject: Artificial intelligence and learning (no_NO)
dc.title: Dokument-klynging (document clustering) (nb_NO)
dc.type: Master thesis (nb_NO)
dc.source.pagenumber: 84 (nb_NO)
dc.contributor.department: Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for datateknikk og informasjonsvitenskap (nb_NO)