Authoritative K-Means for Clustering of Web Search Results

He, Gaojie

dc.contributor.advisor	Nørvåg, Kjetil	nb_NO
dc.contributor.advisor	Neumayer, Robert	nb_NO
dc.contributor.author	He, Gaojie	nb_NO
dc.date.accessioned	2014-12-19T13:36:10Z
dc.date.available	2014-12-19T13:36:10Z
dc.date.created	2010-10-13	nb_NO
dc.date.issued	2010	nb_NO
dc.identifier	356722	nb_NO
dc.identifier	ntnudaim:5534	nb_NO
dc.identifier.uri	http://hdl.handle.net/11250/252220
dc.description.abstract	Clustering is increasingly applied to hyperlinked documents, especially web search results. Although most commercial web search engines use ranking algorithms to sort the matched results and raise the most relevant pages to the top, the result set is still so large that most results, including some pages that surfers are really interested in, will be discarded. Clustering of web search results separates unrelated pages and groups similar pages on the same topic together, thus helping surfers locate pages much faster. Many features of web pages have been studied for use in clustering, such as content information including title, snippet, and anchor text. Hyperlinks are another primary feature of web pages, and some content-link coupled clustering methods have been studied. We propose an authoritative K-Means clustering method that combines content, in-link, out-link and page rank. In this project, we adjust the construction of the in-link and out-link vectors and introduce a new page rank vector with two patterns: one is a single-value representation of page rank and the other is an 11-dimensional vector. We study the difference between these two types of page rank in clustering, and compare clusterings based on different web page representations, such as content-based and content-link coupled. The effect of different elements of a web page is also studied in our project. We apply the authoritative clustering to web search results retrieved from the Google search engine. Three experiments are conducted and different evaluation metrics are adopted to analyze the results.	nb_NO
dc.language	eng	nb_NO
dc.publisher	Institutt for datateknikk og informasjonsvitenskap	nb_NO
dc.subject	ntnudaim:5534	no_NO
dc.subject	MSINFOSYST Master in Information Systems	no_NO
dc.subject	Information Systems	no_NO
dc.title	Authoritative K-Means for Clustering of Web Search Results	nb_NO
dc.type	Master thesis	nb_NO
dc.source.pagenumber	81	nb_NO
dc.contributor.department	Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for datateknikk og informasjonsvitenskap	nb_NO
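
The abstract describes clustering pages on a representation that combines content, in-link, out-link and page rank. The following is only a minimal sketch of that general idea, not the thesis's implementation. The names `combine`, `pagerank_one_hot`, `farthest_first` and `k_means`, the equal weights, the unit normalization, the one-hot reading of the 11-dimensional page rank vector (integer ranks 0 to 10), the farthest-first seeding and the toy vectors are all assumptions made for illustration.

```python
import math


def pagerank_one_hot(rank, levels=11):
    """Encode an integer rank in [0, levels - 1] as a one-hot vector (assumed encoding)."""
    vec = [0.0] * levels
    vec[rank] = 1.0
    return vec


def normalize(vec):
    """Scale a vector to unit length; all-zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def combine(content, in_link, out_link, rank, weights=(1.0, 1.0, 1.0, 1.0)):
    """Concatenate the weighted, unit-normalized parts into one page vector."""
    parts = [content, in_link, out_link, pagerank_one_hot(rank)]
    combined = []
    for weight, part in zip(weights, parts):
        combined.extend(weight * x for x in normalize(part))
    return combined


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic seeding (an assumption): first point, then the farthest remaining ones."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def k_means(points, k, iterations=10):
    """Plain Lloyd-style K-Means; returns one cluster index per point."""
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy data (hypothetical): two pages on one topic, two pages on another.
pages = [
    combine([1.0, 0.0, 0.0], [1.0, 0.0], [1.0, 0.0], 8),
    combine([0.9, 0.1, 0.0], [1.0, 0.0], [1.0, 0.0], 8),
    combine([0.0, 0.0, 1.0], [0.0, 1.0], [0.0, 1.0], 2),
    combine([0.0, 0.1, 0.9], [0.0, 1.0], [0.0, 1.0], 2),
]
labels = k_means(pages, k=2)
print(labels)
```

In this toy run the first two pages fall into one cluster and the last two into the other. Replacing `pagerank_one_hot(rank)` with a one-element list would give a single-value page rank variant for comparison.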

Associated file(s)

File name: 356722_FULLTEXT01.pdf
Size: 1.136 MB
Format: PDF

File name: 356722_COVER01.pdf
Size: 48.10 KB
Format: PDF

File name: 356722_ATTACHMENT01.zip
Size: 105.0 KB
Format: Unknown

This item appears in the following collection(s)

Institutt for datateknologi og informatikk [6708]
