dc.contributor.advisor: Nørvåg, Kjetil [nb_NO]
dc.contributor.advisor: Neumayer, Robert [nb_NO]
dc.contributor.author: He, Gaojie [nb_NO]
dc.date.accessioned: 2014-12-19T13:36:10Z
dc.date.available: 2014-12-19T13:36:10Z
dc.date.created: 2010-10-13 [nb_NO]
dc.date.issued: 2010 [nb_NO]
dc.identifier: 356722 [nb_NO]
dc.identifier: ntnudaim:5534 [nb_NO]
dc.identifier.uri: http://hdl.handle.net/11250/252220
dc.description.abstract: Clustering is increasingly applied to hyperlinked documents, especially web search results. Although most commercial web search engines apply ranking algorithms that sort the matched results and raise the most relevant pages to the top, the result set is still so large that most results, including some pages that surfers are really interested in, will be discarded. Clustering web search results separates unrelated pages and groups similar pages on the same topic together, and thus helps surfers locate pages much faster. Many features of web pages have been studied for use in clustering, such as content information including title, snippet and anchor text. Hyperlinks are another primary feature of web pages, and some content-link coupled clustering methods have been studied. We propose an authoritative K-Means clustering method that combines content, in-link, out-link and page rank. In this project, we adjust the construction of the in-link and out-link vectors and introduce a new page rank vector with two patterns: one is a single-value representation of page rank and the other is an 11-dimensional vector. We study the difference between these two types of page rank in clustering, and compare the clusterings based on different web page representations, such as content-based and content-link coupled. The effect of different elements of a web page is also studied in our project. We apply the authoritative clustering to web search results retrieved from the Google search engine. Three experiments are conducted and different evaluation metrics are adopted to analyze the results. [nb_NO]
dc.language: eng [nb_NO]
dc.publisher: Institutt for datateknikk og informasjonsvitenskap [nb_NO]
dc.subject: ntnudaim:5534 [no_NO]
dc.subject: MSINFOSYST Master in Information Systems [no_NO]
dc.subject: Information Systems [no_NO]
dc.title: Authoritative K-Means for Clustering of Web Search Results [nb_NO]
dc.type: Master thesis [nb_NO]
dc.source.pagenumber: 81 [nb_NO]
dc.contributor.department: Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for datateknikk og informasjonsvitenskap [nb_NO]