Dokument-klynging (document clustering)

Galåen, Magnus

dc.contributor.advisor	Nørvåg, Kjetil	nb_NO
dc.contributor.author	Galåen, Magnus	nb_NO
dc.date.accessioned	2014-12-19T13:32:09Z
dc.date.available	2014-12-19T13:32:09Z
dc.date.created	2010-09-03	nb_NO
dc.date.issued	2008	nb_NO
dc.identifier	347608	nb_NO
dc.identifier	ntnudaim:1505	nb_NO
dc.identifier.uri	http://hdl.handle.net/11250/250628
dc.description.abstract	As document searching becomes more important with the rapid growth of today's document bases, document clustering becomes more important as well. Some of the most commonly used document clustering algorithms are purely statistical in nature. Other algorithms have emerged that address some of the issues with numerical algorithms and claim to be better. This thesis compares two well-known algorithms: Elliptic K-Means and Suffix Tree Clustering (STC). They are compared on speed and quality, and it is shown that Elliptic K-Means is faster, while STC produces higher-quality clusters. It is further shown that, on real web data, STC performs better on small portions of relevant text (snippets) than on full documents. It is also shown that a threshold value for base cluster merging is unnecessary. As STC is shown to be adequately fast when running on snippets only, it is concluded that STC is the better algorithm for search results clustering.	nb_NO
dc.language	eng	nb_NO
dc.publisher	Institutt for datateknikk og informasjonsvitenskap	nb_NO
dc.subject	ntnudaim	no_NO
dc.subject	MIT informatikk	no_NO
dc.subject	Kunstig intelligens og læring	no_NO
dc.title	Dokument-klynging (document clustering)	nb_NO
dc.type	Master thesis	nb_NO
dc.source.pagenumber	84	nb_NO
dc.contributor.department	Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for datateknikk og informasjonsvitenskap	nb_NO

Associated file(s)

Filename: 347608_FULLTEXT01.pdf
Size: 1.056Mb
Format: PDF

Filename: 347608_ATTACHMENT01.zip
Size: 735.1Kb
Format: Unknown

Filename: 347608_COVER01.pdf
Size: 108.0Kb
Format: PDF

This item appears in the following collection(s)

Institutt for datateknologi og informatikk [6544]
