Show simple item record

dc.contributor.advisor: Nørvåg, Kjetil [nb_NO]
dc.contributor.advisor: Auran, Per Gunnar [nb_NO]
dc.contributor.author: Eriksen, Trond Øivind [nb_NO]
dc.contributor.author: Korsen, Anne Siri [nb_NO]
dc.date.accessioned: 2014-12-19T13:34:23Z
dc.date.available: 2014-12-19T13:34:23Z
dc.date.created: 2010-09-05 [nb_NO]
dc.date.issued: 2006 [nb_NO]
dc.identifier: 349039 [nb_NO]
dc.identifier: ntnudaim:1280 [nb_NO]
dc.identifier.uri: http://hdl.handle.net/11250/251510
dc.description.abstract: Information retrieval is concerned with extracting documents from a collection according to the user's information need. The ranking returned by a search engine is determined by the relevance function in use. The amount of information stored digitally and searched for on the Web grows every day, and as document bases grow, relevance has never been more important. For Web search, there is a trend towards domain-specific search solutions, known as vertical search services. A vertical search service utilises semi-structured documents, i.e. documents that contain metadata describing the content. Semi-structured information retrieval is a hybrid between traditional information retrieval based on unstructured documents and database retrieval based on structured content. Semi-structured documents imply the use of multiple criteria for ranking the returned documents. This in turn raises questions such as which criterion is more important and how to combine the results produced by the different criteria. This thesis addresses these challenges. We have studied relevance techniques in order to identify an approach to improving the perceived relevance at the Yahoo! vertical search platform, Vespa. In particular, Yahoo! Shopping has been the focus during problem elaboration, implementation, and evaluation. A plug-in is implemented in Vespa, providing a generic and flexible framework for hybrid search. Our solution allows for context queries, i.e. queries that include terms describing the desired context, with no specific knowledge of the query language or document structure needed. Also, keyword and context terms in a query are treated differently: the context terms are used only for focusing the search. Five experiments have been performed to test our proposed solution. The results indicate that:
- A considerable improvement in retrieval performance is achieved for context queries. Much of the improvement is obtained by removing noisy hits from the result.
- The solution performs almost the same as the standard approach for non-context queries. However, these queries will suffer from a higher latency. The latency depends on the complexity of the domain.
Most search engines today either return thousands of answers to a user query or, in about 20% of the cases, none. Our solution may address these challenges and thus improve the perceived relevance. It should be noted that the solution requires a reasonable labelling of the documents, as well as training of users so that they use context words in their queries. The preliminary experimental results are positive, but they are influenced by a reference collection somewhat adapted to our solution, and should therefore be complemented with experiments based on a full system implementation and a well-defined reference collection. The first step is to choose an appropriate labelling scheme for capturing the semantics of the documents and queries. Next, it would be interesting to experiment with the ranking of the results. Finally, the user interface should be extended to guide the user when submitting context queries. [nb_NO]
dc.language: eng [nb_NO]
dc.publisher: Institutt for datateknikk og informasjonsvitenskap [nb_NO]
dc.subject: ntnudaim [no_NO]
dc.subject: SIF2 datateknikk [no_NO]
dc.subject: Program- og informasjonssystemer [no_NO]
dc.subject: Komplekse datasystemer [no_NO]
dc.title: A generic and flexible Framework for focusing Search at Yahoo! Shopping [nb_NO]
dc.type: Master thesis [nb_NO]
dc.source.pagenumber: 120 [nb_NO]
dc.contributor.department: Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for datateknikk og informasjonsvitenskap [nb_NO]


