Scaling Internet Search Engines - Methods and Analysis

Risvik, Knut Magne

dc.contributor.author	Risvik, Knut Magne	nb_NO
dc.date.accessioned	2014-12-19T13:30:06Z
dc.date.available	2014-12-19T13:30:06Z
dc.date.created	2007-09-28	nb_NO
dc.date.issued	2004	nb_NO
dc.identifier	122755	nb_NO
dc.identifier.isbn	82-471-6317-9	nb_NO
dc.identifier.uri	http://hdl.handle.net/11250/249870
dc.description.abstract	This thesis focuses on methods and analysis for building scalable Internet search engines. In this work, we have developed a search kernel, an architecture framework and applications that are being used in industrial and commercial products. Furthermore, we present both analysis and design of key elements. Building a large-scale search engine requires understanding the dynamics of the content being searched. For the challenging case of searching the web, there are multiple dimensions of dynamics that should ideally be handled. In this thesis we start by examining some of these dimensions and the implications they have on search engine design. When designing a search engine kernel, the focus has been on the selection of algorithms and data structures in the general case. Even more importantly, we design worst-case characteristics into the search kernel, which are decisive from a scaling standpoint. A performance model to analyze the behaviour of the kernel is also developed. The designed search engine kernel was realized as a predecessor of the current FAST Search kernel (the FMS kernel), and practical experiments and benchmarking demonstrate the correctness of the assumptions from the design of the kernel. Then a framework for scaling shared-nothing systems, based upon nodes working on separate portions of the data, is introduced. The design of the framework is based on the general principles of replication and distribution. A performance model and an algorithm for cluster design are provided. This is in turn applied to construct a larger-scale web search engine, and benchmarking of clusters indicates that the assumptions and models for the distributed architecture hold. The scaling aspect of search engines is further studied in the context of the application itself.
Query locality is explored and used to create an architecture that is a generalized type of caching (through partial replication), using the application behaviour and a configurable correctness trade-off to design super-linearly scalable search engines. Finally, a discussion of how linguistics is used in web search engines is provided, focusing on the constraints that apply to ensure the desired scalability.	nb_NO
dc.language	eng	nb_NO
dc.publisher	Fakultet for informasjonsteknologi, matematikk og elektroteknikk	nb_NO
dc.relation.ispartofseries	Doktoravhandlinger ved NTNU, 1503-8181; 2004:54	nb_NO
dc.relation.haspart	Risvik, Knut Magne; Michelsen, Rolf. Search Engines and Web Dynamics.	nb_NO
dc.relation.haspart	Risvik, Knut Magne; Egge, Tor. The FMS Search Kernel and its Performance Characteristics.	nb_NO
dc.relation.haspart	Risvik, Knut Magne; Svingen, Børge; Egge, Tor; Halaas, Arne. The FAST Distributed Processing Architecture (DPA) and its Application for a Larger-Scale Search Engine.	nb_NO
dc.relation.haspart	Risvik, Knut Magne; Aasheim, Yngve; Lidal, Mathias. Multi-tier Architecture for web Search Engines.	nb_NO
dc.relation.haspart	Gulla, Jon Atle; Auran, Per Gunnar; Risvik, Knut Magne. Linguistics in Large-Scale Web Search.	nb_NO
dc.title	Scaling Internet Search Engines - Methods and Analysis	nb_NO
dc.type	Doctoral thesis	nb_NO
dc.contributor.department	Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi, matematikk og elektroteknikk, Institutt for datateknikk og informasjonsvitenskap	nb_NO
dc.description.degree	Dr.philos.	nb_NO
dc.description.degree	Dr.philos.	en_GB

Associated file(s)

File name: 122755_FULLTEXT01.pdf
Size: 3.166Mb
Format: PDF

This item appears in the following collection(s)

Institutt for datateknologi og informatikk [6551]
