Efficient Query Processing in Distributed Search Engines

Jonassen, Simon

dc.contributor.author	Jonassen, Simon	nb_NO
dc.date.accessioned	2014-12-19T13:39:27Z
dc.date.available	2014-12-19T13:39:27Z
dc.date.created	2013-02-19	nb_NO
dc.date.issued	2013	nb_NO
dc.identifier	606512	nb_NO
dc.identifier.isbn	978-82-471-4134-2 (printed ver.)	nb_NO
dc.identifier.isbn	978-82-471-4135-9 (electronic ver.)	nb_NO
dc.identifier.uri	http://hdl.handle.net/11250/253101
dc.description.abstract	Web search engines have to deal with a rapidly increasing amount of information, high query loads and tight performance constraints. The success of a search engine depends on the speed with which it answers queries (efficiency) and the quality of its answers (effectiveness). These two metrics have a large impact on the operational costs of the search engine and the overall user satisfaction, which determine the revenue of the search engine. In this context, any improvement in query processing efficiency can reduce the operational costs and improve user satisfaction, hence improve the overall benefit. In this thesis, we elaborate on query processing efficiency, address several problems within partitioned query processing, pruning and caching and propose several novel techniques: First, we look at term-wise partitioned indexes and address the main limitations of the state-of-the-art query processing methods. Our first approach combines the advantage of pipelined and traditional (non-pipelined) query processing. This approach assumes one disk access per posting list and traditional term-at-a-time processing. For the second approach, we follow an alternative direction and look at document-at-a-time processing of sub-queries and skipping. Subsequently, we present several skipping extensions to pipelined query processing, which as we show can improve the query processing performance and/or the quality of results. Then, we extend one of these methods with intra-query parallelism, which as we show can improve the performance at low query loads. Second, we look at skipping and pruning optimizations designed for a monolithic index. We present an efficient self-skipping inverted index designed for modern index compression methods and several query processing optimizations. We show that these optimizations can provide a significant speed-up compared to a full (non-pruned) evaluation and reduce the performance gap between disjunctive (OR) and conjunctive (AND) queries. We also propose a linear programming optimization that can further improve the I/O, decompression and computation efficiency of Max-Score. Third, we elaborate on caching in Web search engines in two independent contributions. First, we present an analytical model that finds the optimal split in a static memory-based two-level cache. Second, we present several strategies for selecting, ordering and scheduling prefetch queries and demonstrate that these can improve the efficiency and effectiveness of Web search engines. We carefully evaluate our ideas either using a real implementation or by simulation using real-world text collections and query logs. Most of the proposed techniques are found to improve the state-of-the-art in the conducted empirical studies. However, the implications and applicability of these techniques in practice need further evaluation in real-life settings.	nb_NO
dc.language	eng	nb_NO
dc.publisher	Norges teknisk-naturvitenskapelige universitet	nb_NO
dc.relation.ispartofseries	Doktoravhandlinger ved NTNU, 1503-8181; 2013:22	nb_NO
dc.relation.haspart	Jonassen, Simon; Bratsberg, Svein Erik. A Combined Semi-pipelined Query Processing Architecture for Distributed Full-Text Retrieval. WEB INFORMATION SYSTEM ENGINEERING-WISE 2010: 587-601, 2010. <a href='http://dx.doi.org/10.1007/978-3-642-17616-6_51'>10.1007/978-3-642-17616-6_51</a>.	nb_NO
dc.relation.haspart	Jonassen, Simon; Bratsberg, Svein Erik. Efficient Compressed Inverted Index Skipping for Disjunctive Text-Queries. ADVANCES IN INFORMATION RETRIEVAL: 530-542, 2011. <a href='http://dx.doi.org/10.1007/978-3-642-20161-5_53'>10.1007/978-3-642-20161-5_53</a>.	nb_NO
dc.relation.haspart	Jonassen, Simon; Bratsberg, Svein Erik. Improving the Performance of Pipelined Query Processing with Skipping. 13th International Conference, Paphos, Cyprus, November 28-30, 2012. Proceedings: 1-15, 2012. <a href='http://dx.doi.org/10.1007/978-3-642-35063-4_1'>10.1007/978-3-642-35063-4_1</a>.	nb_NO
dc.relation.haspart	Jonassen, Simon; Bratsberg, Svein Erik. Intra-Query Concurrent Pipelined Processing for Distributed Full-Text Retrieval. Advances in Information Retrieval: 34th European Conference on IR Research, ECIR 2012 Barcelona, Spain, April 1-5, 2012 Proceedings: 413-425, 2012. <a href='http://dx.doi.org/10.1007/978-3-642-28997-2_35'>10.1007/978-3-642-28997-2_35</a>.	nb_NO
dc.relation.haspart	Jonassen, Simon; Cambazoglu, B. Barla. Improving Dynamic Index Pruning via Linear Programming. .	nb_NO
dc.relation.haspart	Baeza-Yates, Ricardo; Jonassen, Simon. Modeling Static Caching in Web Search Engines. Lecture Notes in Computer Science. (ISSN 0302-9743). 7224: 436-446, 2012. <a href='http://dx.doi.org/10.1007/978-3-642-28997-2_37'>10.1007/978-3-642-28997-2_37</a>.	nb_NO
dc.relation.haspart	Jonassen, Simon; Cambazoglu, B. Barla; Silvestri, Fabrizio. Prefetching query results and its impact on search engines. The 35th International ACM SIGIR conference on research and development in Information Retrieval, SIGIR '12, Portland, OR, USA, August 12-16, 2012, Proceedings: 631-640, 2012. <a href='http://dx.doi.org/10.1145/2348283.2348368'>10.1145/2348283.2348368</a>.	nb_NO
dc.title	Efficient Query Processing in Distributed Search Engines	nb_NO
dc.type	Doctoral thesis	nb_NO
dc.contributor.department	Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for datateknikk og informasjonsvitenskap	nb_NO
dc.description.degree	PhD i informasjonsteknologi	nb_NO
dc.description.degree	PhD in Information Technology	en_GB

Tilhørende fil(er)

Filnavn:: 602215_FULLTEXT01.pdf
Størrelse:: 288.0Kb
Format:: PDF

Åpne

Filnavn:: 536558_FULLTEXT01.pdf
Størrelse:: 452.6Kb
Format:: PDF

Åpne

Filnavn:: 606512_FULLTEXT01.pdf
Størrelse:: 2.764Mb
Format:: PDF

Åpne

Denne innførselen finnes i følgende samling(er)

Institutt for datateknologi og informatikk [6829]

Vis enkel innførsel