Vis enkel innførsel

dc.contributor.authorJonassen, Simonnb_NO
dc.date.accessioned2014-12-19T13:39:27Z
dc.date.available2014-12-19T13:39:27Z
dc.date.created2013-02-19nb_NO
dc.date.issued2013nb_NO
dc.identifier606512nb_NO
dc.identifier.isbn978-82-471-4134-2 (printed ver.)nb_NO
dc.identifier.isbn978-82-471-4135-9 (electronic ver.)nb_NO
dc.identifier.urihttp://hdl.handle.net/11250/253101
dc.description.abstractWeb search engines have to deal with a rapidly increasing amount of information, high query loads and tight performance constraints. The success of a search engine depends on the speed with which it answers queries (efficiency) and the quality of its answers (effectiveness). These two metrics have a large impact on the operational costs of the search engine and the overall user satisfaction, which determine the revenue of the search engine. In this context, any improvement in query processing efficiency can reduce the operational costs and improve user satisfaction, hence improve the overall benefit. In this thesis, we elaborate on query processing efficiency, address several problems within partitioned query processing, pruning and caching and propose several novel techniques: First, we look at term-wise partitioned indexes and address the main limitations of the state-of-the-art query processing methods. Our first approach combines the advantage of pipelined and traditional (non-pipelined) query processing. This approach assumes one disk access per posting list and traditional term-at-a-time processing. For the second approach, we follow an alternative direction and look at document-at-a-time processing of sub-queries and skipping. Subsequently, we present several skipping extensions to pipelined query processing, which as we show can improve the query processing performance and/or the quality of results. Then, we extend one of these methods with intra-query parallelism, which as we show can improve the performance at low query loads. Second, we look at skipping and pruning optimizations designed for a monolithic index. We present an efficient self-skipping inverted index designed for modern index compression methods and several query processing optimizations. We show that these optimizations can provide a significant speed-up compared to a full (non-pruned) evaluation and reduce the performance gap between disjunctive (OR) and conjunctive (AND) queries. We also propose a linear programming optimization that can further improve the I/O, decompression and computation efficiency of Max-Score. Third, we elaborate on caching in Web search engines in two independent contributions. First, we present an analytical model that finds the optimal split in a static memory-based two-level cache. Second, we present several strategies for selecting, ordering and scheduling prefetch queries and demonstrate that these can improve the efficiency and effectiveness of Web search engines. We carefully evaluate our ideas either using a real implementation or by simulation using real-world text collections and query logs. Most of the proposed techniques are found to improve the state-of-the-art in the conducted empirical studies. However, the implications and applicability of these techniques in practice need further evaluation in real-life settings.nb_NO
dc.languageengnb_NO
dc.publisherNorges teknisk-naturvitenskapelige universitetnb_NO
dc.relation.ispartofseriesDoktoravhandlinger ved NTNU, 1503-8181; 2013:22nb_NO
dc.relation.haspartJonassen, Simon; Bratsberg, Svein Erik. A Combined Semi-pipelined Query Processing Architecture for Distributed Full-Text Retrieval. WEB INFORMATION SYSTEM ENGINEERING-WISE 2010: 587-601, 2010. <a href='http://dx.doi.org/10.1007/978-3-642-17616-6_51'>10.1007/978-3-642-17616-6_51</a>.nb_NO
dc.relation.haspartJonassen, Simon; Bratsberg, Svein Erik. Efficient Compressed Inverted Index Skipping for Disjunctive Text-Queries. ADVANCES IN INFORMATION RETRIEVAL: 530-542, 2011. <a href='http://dx.doi.org/10.1007/978-3-642-20161-5_53'>10.1007/978-3-642-20161-5_53</a>.nb_NO
dc.relation.haspartJonassen, Simon; Bratsberg, Svein Erik. Improving the Performance of Pipelined Query Processing with Skipping. 13th International Conference, Paphos, Cyprus, November 28-30, 2012. Proceedings: 1-15, 2012. <a href='http://dx.doi.org/10.1007/978-3-642-35063-4_1'>10.1007/978-3-642-35063-4_1</a>.nb_NO
dc.relation.haspartJonassen, Simon; Bratsberg, Svein Erik. Intra-Query Concurrent Pipelined Processing for Distributed Full-Text Retrieval. Advances in Information Retrieval: 34th European Conference on IR Research, ECIR 2012 Barcelona, Spain, April 1-5, 2012 Proceedings: 413-425, 2012. <a href='http://dx.doi.org/10.1007/978-3-642-28997-2_35'>10.1007/978-3-642-28997-2_35</a>.nb_NO
dc.relation.haspartJonassen, Simon; Cambazoglu, B. Barla. Improving Dynamic Index Pruning via Linear Programming. .nb_NO
dc.relation.haspartBaeza-Yates, Ricardo; Jonassen, Simon. Modeling Static Caching in Web Search Engines. Lecture Notes in Computer Science. (ISSN 0302-9743). 7224: 436-446, 2012. <a href='http://dx.doi.org/10.1007/978-3-642-28997-2_37'>10.1007/978-3-642-28997-2_37</a>.nb_NO
dc.relation.haspartJonassen, Simon; Cambazoglu, B. Barla; Silvestri, Fabrizio. Prefetching query results and its impact on search engines. The 35th International ACM SIGIR conference on research and development in Information Retrieval, SIGIR '12, Portland, OR, USA, August 12-16, 2012, Proceedings: 631-640, 2012. <a href='http://dx.doi.org/10.1145/2348283.2348368'>10.1145/2348283.2348368</a>.nb_NO
dc.titleEfficient Query Processing in Distributed Search Enginesnb_NO
dc.typeDoctoral thesisnb_NO
dc.contributor.departmentNorges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for datateknikk og informasjonsvitenskapnb_NO
dc.description.degreePhD i informasjonsteknologinb_NO
dc.description.degreePhD in Information Technologyen_GB


Tilhørende fil(er)

Thumbnail
Thumbnail
Thumbnail

Denne innførselen finnes i følgende samling(er)

Vis enkel innførsel