Utilizing linguistic analysis in multiple source search engines
Modern search engines make several data sources available to users, e.g. news search, image search and video search. When a user enters a query in a search engine, it is up to the user to choose a source other than the normal web search. On average, a user considers only the first few results of a search, and does so within a few seconds. It would therefore benefit the user experience if the user did not have to restrict the sources manually to refine a search.

This project will evaluate different machine learning methods for classifying which sources are relevant to a query. The goal is an automated learning system that takes labeled input and uses it to inform the user of, or direct the user to, the relevant source.

The project will take advantage of a Yahoo! product: Yahoo! Query Linguist Analysis Service (abbreviated QLAS throughout the document). The goal is to incorporate semantic data from QLAS into the learning system. This should increase the amount of information available to the learning system and improve its performance. It is not clear how this semantic data could be combined with the training data and incorporated into the learning system. A substantial part of the project will be to explore this.

This project was done in cooperation with Yahoo! Technologies Norway AS (YTN). YTN develops Vespa, a search engine platform that can search multiple sources. YTN is interested in researching the learning of source relevance to improve the search experience in Yahoo services. YTN is also interested in researching how data from QLAS could be used by Vespa to enable source relevance classification when Vespa is used in a multiple-index setup.