com.yahoo.shopping
Class XSearcher

java.lang.Object
  extended by com.yahoo.prelude.Searcher
      extended by com.yahoo.prelude.ChainedSearcher
          extended by com.yahoo.shopping.XSearcher

public class XSearcher
extends com.yahoo.prelude.ChainedSearcher

The XSearcher plug-in provides functionality to focus the search:
- Allows for context queries, i.e. queries that include terms describing the desired context, with no specific knowledge of the query language or document structure needed.
- Keyword and context terms in a query are treated differently: the context terms are used only to focus the search.
- Provides a generic and flexible framework for hybrid search.

Author:
Anne Siri Korsen & Trond Øivind Eriksen.

Nested Class Summary
private static class XSearcher.NraHitOrderer
          Sorts NraHit objects, first by lower bound, then by upper bound.
 
Field Summary
private  java.util.HashMap<java.lang.String,java.lang.String> depMap
           
private  java.util.HashMap<java.lang.String,java.lang.String> firstNamesMap
           
private  java.util.TreeMap<java.lang.String,java.lang.String> invertedXmlDocumentTree
           
private  java.util.HashMap<java.lang.String,java.lang.String> ontology
           
private  java.util.HashMap<java.lang.String,java.lang.String> surNamesMap
           
private  java.util.HashMap<java.lang.String,java.lang.String> xmlDocumentTree
           
 
Constructor Summary
XSearcher()
          The constructor initialises the data structures.
 
Method Summary
private  java.lang.String[] analyseQuery(com.yahoo.prelude.Query query, boolean properName)
          The method recognises context terms and proper names.
private  void buildNamesMap(java.lang.String fileName)
          The method reads a file, and stores the value of each line in a HashMap.
 com.yahoo.prelude.Result doSearch(com.yahoo.prelude.Query query, int offset, int hits)
          This method is called each time a search is performed.
private  com.yahoo.prelude.Result doSuperSearch(com.yahoo.prelude.Query query, int offset, int hits, java.lang.String currentLabel, java.lang.String contextLabel, boolean removeContextWord)
          Modifies the query by calling modifyQuery(), and sends it to the next Searcher in the chain.
private  com.yahoo.prelude.Result doXmlSearch(com.yahoo.prelude.Query query, java.lang.String contextLabel, int properNameFound, int offset, int hits, int k, AggregationFunction aggrFunc)
          Traverses the XML structure from the root context node found in analyseQuery(), performs a search by calling doXSearch() for each leaf node, and merges the subresults by calling mergeResults().
private  com.yahoo.prelude.Result doXSearch(java.lang.String contextLabel, int properNameFound, int offset, int hits, java.util.ArrayList<com.yahoo.prelude.Result> subResults, com.yahoo.prelude.Query query, java.lang.String currentLabel)
          This method is a workaround, caused by inadequate labelling of parts of the document corpus.
private  com.yahoo.prelude.Result mergeResults(com.yahoo.prelude.Query query, java.util.ArrayList<com.yahoo.prelude.Result> subResults, int k, AggregationFunction aggrFunc)
          Merges the subresults by performing the No Random Access algorithm.
private  com.yahoo.prelude.query.Item modifyQuery(com.yahoo.prelude.query.Item item, java.lang.String currentLabel, java.lang.String contextLabel, boolean removeContextWord)
          The method traverses the query tree, and performs the following modifications: - Removal of a keyword recognised as a context term
 
Methods inherited from class com.yahoo.prelude.ChainedSearcher
addChained, doFill, doPing, getChained, getEditionTimeStamp, setChained
 
Methods inherited from class com.yahoo.prelude.Searcher
addChained, fill, getId, getLogger, getNeedOffsetChecking, initialize, isLoggingFine, search, setNeedOffsetChecking
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

xmlDocumentTree

private java.util.HashMap<java.lang.String,java.lang.String> xmlDocumentTree

ontology

private java.util.HashMap<java.lang.String,java.lang.String> ontology

invertedXmlDocumentTree

private java.util.TreeMap<java.lang.String,java.lang.String> invertedXmlDocumentTree

firstNamesMap

private java.util.HashMap<java.lang.String,java.lang.String> firstNamesMap

surNamesMap

private java.util.HashMap<java.lang.String,java.lang.String> surNamesMap

depMap

private java.util.HashMap<java.lang.String,java.lang.String> depMap
Constructor Detail

XSearcher

public XSearcher()
The constructor initialises the data structures. Three main data structures are used:
- A hash map storing mappings from context terms to their Dewey numbers.
- A hash map that acts as an ontology, storing mappings from synonyms or plural forms to the terms' base form.
- A sorted map (a TreeMap) storing mappings from each Dewey number to its XML label.
The two hash maps are used during query analysis, in order to recognise context terms and to express the desired search context by means of the Dewey number of its root node. The sorted map is used to traverse the part of the XML tree to be searched, retrieving subresults and merging these when appropriate. In addition, two hash maps are built storing lists of English first names and surnames.
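
The following is a minimal sketch of how the three structures could fit together, not the actual constructor. The class name, the map names, and all terms and Dewey numbers are assumptions made up for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class ContextMapsSketch {
    // Context term (base form) -> Dewey number of its node. Values are made up.
    static final Map<String, String> TERM_TO_DEWEY = new HashMap<>();
    // Ontology: synonym or plural form -> base form. Values are made up.
    static final Map<String, String> ONTOLOGY = new HashMap<>();
    // Dewey number -> XML label, kept sorted by key.
    static final TreeMap<String, String> DEWEY_TO_LABEL = new TreeMap<>();

    static {
        TERM_TO_DEWEY.put("music", "1.2");
        TERM_TO_DEWEY.put("artist", "1.2.1");
        TERM_TO_DEWEY.put("song", "1.2.2");
        ONTOLOGY.put("songs", "song");
        ONTOLOGY.put("singer", "artist");
        // In this sketch the sorted map is simply the inverse of the first map.
        for (Map.Entry<String, String> e : TERM_TO_DEWEY.entrySet()) {
            DEWEY_TO_LABEL.put(e.getValue(), e.getKey());
        }
    }

    /** Returns the Dewey number for a query term, or null if it is not a context term. */
    static String deweyFor(String term) {
        String lower = term.toLowerCase(Locale.ROOT);
        String base = ONTOLOGY.getOrDefault(lower, lower); // normalise synonyms and plurals
        return TERM_TO_DEWEY.get(base);
    }
}
```

With these sample values, deweyFor("Songs") resolves through the ontology to "song" and then to "1.2.2", while a plain keyword such as "madonna" yields null.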

Method Detail

doSearch

public com.yahoo.prelude.Result doSearch(com.yahoo.prelude.Query query,
                                         int offset,
                                         int hits)
This method is called each time a search is performed. It determines the values of the input parameters, and initiates query analysis and searching by subcalls to analyseQuery() and doXmlSearch().

Overrides:
doSearch in class com.yahoo.prelude.ChainedSearcher
Parameters:
query - Query as it is forwarded from the previous Searcher in the chain.
offset - The first hit number to retrieve.
hits - The number of hits to retrieve in each subresult.
Returns:
The final result returned to the previous Searcher in the chain.

buildNamesMap

private void buildNamesMap(java.lang.String fileName)
The method reads a file, and stores the value of each line in a HashMap. The values are either first names or surnames.

Parameters:
fileName - The file name that is to be opened and read from.
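
As an illustration only, a names map could be filled from a line-oriented source roughly as below. The class and method names are hypothetical, and using the lower-cased name as the key is an assumption; the source only states that the value of each line is stored.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class NamesMapSketch {
    /**
     * Reads one name per line and stores it keyed by its lower-cased form.
     * Blank lines are skipped. The caller is responsible for closing the reader.
     */
    static Map<String, String> readNames(Reader source) {
        Map<String, String> names = new HashMap<>();
        new BufferedReader(source).lines()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .forEach(name -> names.put(name.toLowerCase(Locale.ROOT), name));
        return names;
    }
}
```

A java.io.FileReader opened on fileName could be passed as the source; a StringReader is convenient for testing.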

analyseQuery

private java.lang.String[] analyseQuery(com.yahoo.prelude.Query query,
                                        boolean properName)
The method recognises context terms and proper names.

Parameters:
query - The raw query to be traversed.
properName - Whether proper name recognition should be utilised.
Returns:
The desired search context represented by the Dewey number of its root node.
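
The analysis step can be pictured with the sketch below. It works on a plain list of terms rather than a com.yahoo.prelude.Query, and the class, the Analysis holder, and the rule that the first recognised context term wins are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class QueryAnalysisSketch {
    /** Hypothetical holder for the outcome of the analysis. */
    static final class Analysis {
        final String contextDewey;      // null if no context term was recognised
        final List<String> keywords;    // remaining terms, lower-cased
        final boolean allProperNames;   // true if every keyword is a known name

        Analysis(String contextDewey, List<String> keywords, boolean allProperNames) {
            this.contextDewey = contextDewey;
            this.keywords = keywords;
            this.allProperNames = allProperNames;
        }
    }

    static Analysis analyse(List<String> terms, Map<String, String> termToDewey,
                            Set<String> knownNames, boolean properName) {
        String context = null;
        List<String> keywords = new ArrayList<>();
        for (String term : terms) {
            String lower = term.toLowerCase(Locale.ROOT);
            String dewey = termToDewey.get(lower);
            if (context == null && dewey != null) {
                context = dewey;        // assumption: the first context term wins
            } else {
                keywords.add(lower);
            }
        }
        // Proper name recognition is only applied when requested.
        boolean allNames = properName && !keywords.isEmpty() && knownNames.containsAll(keywords);
        return new Analysis(context, keywords, allNames);
    }
}
```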

doXmlSearch

private com.yahoo.prelude.Result doXmlSearch(com.yahoo.prelude.Query query,
                                             java.lang.String contextLabel,
                                             int properNameFound,
                                             int offset,
                                             int hits,
                                             int k,
                                             AggregationFunction aggrFunc)
Traverses the XML structure from the root context node found in analyseQuery(), performs a search by calling doXSearch() for each leaf node, and merges the subresults by calling mergeResults().

Parameters:
query - The raw query.
contextLabel - The Dewey number of the desired context node.
properNameFound - Whether all query keywords (except the context term) are proper names.
offset - The first hit number to retrieve.
hits - The number of hits to retrieve in each subresult.
k - The number of hits to return in the merged result.
aggrFunc - The aggregation function to use.
Returns:
The final result.
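
One way to picture the traversal is shown below: with Dewey numbers as keys of a sorted map, all descendants of a node follow it directly in key order, so the leaves of a subtree can be found in a single pass. This is a sketch under that assumption; the class, the method names, and the sample tree are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SubtreeSketch {
    /** Returns the labels of all leaf nodes at or below the context node, in key order. */
    static List<String> leafLabels(TreeMap<String, String> deweyToLabel, String contextDewey) {
        List<String> leaves = new ArrayList<>();
        for (Map.Entry<String, String> e : deweyToLabel.tailMap(contextDewey, true).entrySet()) {
            String key = e.getKey();
            boolean inSubtree = key.equals(contextDewey) || key.startsWith(contextDewey + ".");
            if (!inSubtree) {
                break; // keys are sorted, so the subtree has been passed
            }
            // A node has children exactly when the next key extends it with ".".
            String next = deweyToLabel.higherKey(key);
            boolean hasChild = next != null && next.startsWith(key + ".");
            if (!hasChild) {
                leaves.add(e.getValue());
            }
        }
        return leaves;
    }

    /** Made-up sample tree used for illustration. */
    static TreeMap<String, String> sampleTree() {
        TreeMap<String, String> tree = new TreeMap<>();
        tree.put("1", "shopping");
        tree.put("1.1", "book");
        tree.put("1.1.1", "author");
        tree.put("1.1.2", "title");
        tree.put("1.2", "music");
        tree.put("1.2.1", "artist");
        tree.put("1.2.2", "song");
        return tree;
    }
}
```

In the real class each leaf would trigger a search via doXSearch(), and the collected subresults would then be passed to mergeResults().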

doXSearch

private com.yahoo.prelude.Result doXSearch(java.lang.String contextLabel,
                                           int properNameFound,
                                           int offset,
                                           int hits,
                                           java.util.ArrayList<com.yahoo.prelude.Result> subResults,
                                           com.yahoo.prelude.Query query,
                                           java.lang.String currentLabel)
This method is a workaround, caused by inadequate labelling of parts of the document corpus. The Shopping vertical uses six types of document schemes. Three of these (book, music, and video) are labelled in an appropriate way. For example, the music document type contains separate fields for artist and song. The remaining document types (paidmerchant, freemerchant, and fastupdate) have an inadequate labelling scheme that provides only generic fields such as txt and int. Book, music, and video constitute only about 8% of the total number of documents.

In order to be able to use the entire document base as a test collection, we implemented a workaround that changed a search from actor:keywords to desc:keywords AND desc:actor for these document types. The same was done for the fields author, actor, and director. Additionally, a department filter was added to focus the search towards the desired category. When the keywords were proper names, they were rewritten as a phrase.

Parameters:
contextLabel - The Dewey number of the desired context node.
properNameFound - Whether all query keywords (except the context term) are proper names.
offset - The first hit number to retrieve.
hits - The number of hits to retrieve in each subresult.
query - The raw query.
currentLabel - The Dewey number of the current node to be searched in.
Returns:
The subresult.
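
The rewrite described above can be illustrated at string level as follows. The real method operates on a com.yahoo.prelude.Query; the class, the method, and the textual query syntax here are assumptions chosen to mirror the examples in this page (desc:keywords AND desc:actor, department=books).

```java
import java.util.ArrayList;
import java.util.List;

class WorkaroundRewriteSketch {
    /**
     * Rewrites field:keywords into a search on the generic desc field,
     * adds the field name itself as a desc term, and appends a department filter.
     */
    static String rewrite(String field, List<String> keywords, String department,
                          boolean properNames) {
        List<String> clauses = new ArrayList<>();
        if (properNames && !keywords.isEmpty()) {
            // Proper names are searched as one phrase.
            clauses.add("desc:\"" + String.join(" ", keywords) + "\"");
        } else {
            for (String keyword : keywords) {
                clauses.add("desc:" + keyword);
            }
        }
        clauses.add("desc:" + field);
        clauses.add("department=" + department);
        return String.join(" AND ", clauses);
    }
}
```

For example, rewrite("actor", ["tom", "hanks"], "video", true) gives desc:"tom hanks" AND desc:actor AND department=video.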

doSuperSearch

private com.yahoo.prelude.Result doSuperSearch(com.yahoo.prelude.Query query,
                                               int offset,
                                               int hits,
                                               java.lang.String currentLabel,
                                               java.lang.String contextLabel,
                                               boolean removeContextWord)
Modifies the query by calling modifyQuery(), and sends it to the next Searcher in the chain.

Parameters:
query - The hacked query.
offset - The first hit number to retrieve.
hits - The number of hits to retrieve in each subresult.
currentLabel - The Dewey number of the current node to be searched in.
contextLabel - The Dewey number of the desired context node.
removeContextWord - Whether the context term should be removed from the query.
Returns:
The retrieved subresult.

modifyQuery

private com.yahoo.prelude.query.Item modifyQuery(com.yahoo.prelude.query.Item item,
                                                 java.lang.String currentLabel,
                                                 java.lang.String contextLabel,
                                                 boolean removeContextWord)
The method traverses the query tree, and performs the following modifications: - Removal of a keyword recognised as a context term. - Addition of the context term as index labels, e.g. music:madonna. - Addition of a department filter, e.g. department=books. This is a part of the workaround performed in doXSearch().

Parameters:
item - The hacked query root.
currentLabel - The Dewey number of the current node to be searched in.
contextLabel - The Dewey number of the desired context node.
removeContextWord - Whether the context term should be removed from the query.
Returns:
The modified query root.
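
A recursive traversal of this kind might look like the sketch below. Node is a hypothetical stand-in for com.yahoo.prelude.query.Item, the method takes the context term and index label directly instead of Dewey numbers, and the department filter is left out for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class QueryTreeSketch {
    /** Hypothetical query item: either a word leaf or an AND node with children. */
    static final class Node {
        final String index;                       // index label, or null if unlabelled
        final String word;                        // null for AND nodes
        final List<Node> children = new ArrayList<>();

        Node(String index, String word) {
            this.index = index;
            this.word = word;
        }

        static Node and(Node... kids) {
            Node node = new Node(null, null);
            node.children.addAll(Arrays.asList(kids));
            return node;
        }

        @Override
        public String toString() {
            if (word != null) {
                return index == null ? word : index + ":" + word;
            }
            List<String> parts = new ArrayList<>();
            for (Node child : children) {
                parts.add(child.toString());
            }
            return String.join(" AND ", parts);
        }
    }

    /** Returns a modified copy of the tree, or null if the item itself was removed. */
    static Node modify(Node item, String contextTerm, String indexLabel,
                       boolean removeContextWord) {
        if (item.word != null) {
            if (removeContextWord && item.word.equals(contextTerm)) {
                return null; // drop the keyword recognised as a context term
            }
            // Label unlabelled words with the context index, e.g. music:madonna.
            return new Node(item.index != null ? item.index : indexLabel, item.word);
        }
        Node copy = new Node(null, null);
        for (Node child : item.children) {
            Node modified = modify(child, contextTerm, indexLabel, removeContextWord);
            if (modified != null) {
                copy.children.add(modified);
            }
        }
        return copy;
    }
}
```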

mergeResults

private com.yahoo.prelude.Result mergeResults(com.yahoo.prelude.Query query,
                                              java.util.ArrayList<com.yahoo.prelude.Result> subResults,
                                              int k,
                                              AggregationFunction aggrFunc)
Merges the subresults by performing the No Random Access algorithm. The top-k hits are sorted according to their lower bound values. NraHit objects are kept in a hash map, sorted according to NraHitOrderer. The method uses the AggregationFunction as a common interface to the specific aggregation functions.

Parameters:
query - The original query.
subResults - The subresults to be merged.
k - The number of hits to return in the merged result.
aggrFunc - The aggregation function to use.
Returns:
The merged result.
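
For readers unfamiliar with the No Random Access (NRA) algorithm, the sketch below shows the general idea on plain score lists. It is not the implementation in this class: it assumes non-negative scores, lists sorted by descending score, and summation as the aggregation function in place of AggregationFunction. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NraSketch {
    static final class Hit {
        final String id;
        final double score;
        Hit(String id, double score) { this.id = id; this.score = score; }
    }

    static Hit hit(String id, double score) { return new Hit(id, score); }

    static final class NraHit {
        final String id;
        final double[] seen;   // score per list, NaN while unseen
        double lower;          // aggregate with unseen scores counted as 0
        double upper;          // aggregate with unseen scores counted as the last seen score
        NraHit(String id, int lists) {
            this.id = id;
            this.seen = new double[lists];
            Arrays.fill(seen, Double.NaN);
        }
    }

    // Highest lower bound first, ties broken by highest upper bound.
    static final Comparator<NraHit> ORDER =
            Comparator.comparingDouble((NraHit h) -> h.lower)
                      .thenComparingDouble(h -> h.upper)
                      .reversed();

    static List<String> topK(List<List<Hit>> lists, int k) {
        List<String> result = new ArrayList<>();
        if (k <= 0) return result;
        int m = lists.size();
        int maxDepth = 0;
        for (List<Hit> list : lists) maxDepth = Math.max(maxDepth, list.size());
        Map<String, NraHit> hits = new HashMap<>();
        double[] last = new double[m];
        List<NraHit> ranked = new ArrayList<>();
        for (int depth = 0; depth < maxDepth; depth++) {
            // One round of sorted access: read the next entry of every list.
            for (int i = 0; i < m; i++) {
                if (depth < lists.get(i).size()) {
                    Hit h = lists.get(i).get(depth);
                    last[i] = h.score;
                    hits.computeIfAbsent(h.id, id -> new NraHit(id, m)).seen[i] = h.score;
                } else {
                    last[i] = 0.0; // exhausted list: anything unseen here scores 0
                }
            }
            double threshold = 0.0; // upper bound for objects not seen in any list
            for (double s : last) threshold += s;
            for (NraHit h : hits.values()) {
                h.lower = 0.0;
                h.upper = 0.0;
                for (int i = 0; i < m; i++) {
                    boolean known = !Double.isNaN(h.seen[i]);
                    h.lower += known ? h.seen[i] : 0.0;
                    h.upper += known ? h.seen[i] : last[i];
                }
            }
            ranked = new ArrayList<>(hits.values());
            ranked.sort(ORDER);
            if (ranked.size() >= k) {
                double kthLower = ranked.get(k - 1).lower;
                double bestOther = threshold;
                for (NraHit h : ranked.subList(k, ranked.size())) {
                    bestOther = Math.max(bestOther, h.upper);
                }
                if (kthLower >= bestOther) break; // no other object can enter the top k
            }
        }
        for (NraHit h : ranked.subList(0, Math.min(k, ranked.size()))) result.add(h.id);
        return result;
    }
}
```

The loop stops as soon as the k-th best lower bound is at least as high as the upper bound of every other object, so the lists are usually not read to the end. As in NRA generally, the returned top-k set is correct, but the order within it is based on lower bounds rather than exact scores.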