Column Stores versus Search Engines and Applications to Search in Social Networks
MetadataShow full item record
Search engines and database systems both play important roles as we store and organize ever increasing amounts of information and still require the information to be easily accessible. Research on these two types of systems has traditionally been partitioned into two fields, information retrieval and databases, and the integration of these two fields has been a popular research topic. Rather than attempting to integrate the two fields, this thesis begins with a comparison of the technical similarities between search engines and a specific type of database system often used in decision support systems: column stores. Based on an initial assessment of the technical similarities, which includes an evaluation of the feasibility of creating a hybrid system that supports both workloads, the papers in this thesis investigate how the identi_ed similarities can be used as a basis for improving the effciency of the different systems. To improve the efficiency of processing decision support workloads, the use of inverted indexes as an alternative to bitmap indexes is evaluated. We develop a query processing framework for compressed inverted indexes in decision support workloads and find that it outperforms state-of-the-art compressed bitmap indexes by being significantly more compact, and also improves the query processing e_ciency for most queries. Keyword search in social networks with access control is also addressed in this thesis, and a space of solutions is developed along two axes. One of the axes defines the set of inverted indexes that are used in the solution, and the other defines the meta-data used to filter out inaccessible results. With a exible and efficient search system based on a column-oriented storage system, we conduct a thorough set of experiments that illuminate the trade-offs between different extremes in the solution space. We also develop a hybrid scheme in between two of the best extremes. The hybrid approach uses cost models to find the most efficient solution for a particular workload. Together with an effcient query processing framework based on our novel HeapUnion operator, this results in a system that is e_cient for a wide range of workloads that consist of updates and searches with access control in a social network.
Has partsBjørklund, Truls A.; Gehrke, Johannes; Torbjørnsen, Øystein. A Confluence of Column Stores and Search Engines: Opportunities and Challenges. Using Search Engine Technology for Information Management (USETIM) 2009, 2009.
Bjørklund, Truls A.; Grimsmo, Nils; Gehrke, Johannes; Øystein, Torbjørnsen. Inverted Indexes vs. Bitmap Indexes in Decision Support Systems. Proceedings of 18th ACM Conference on Information and Knowledge Management (CIKM'09): 1509-1512, 2009. 10.1145/1645953.1646158.
Bjørklund, Truls Amundsen; Götz, Michaela; Gehrke, Johannes. Search in social networks with access control. KEYS '10 Proceedings of the 2nd International Workshop on Keyword Search on Structured Data, 2010. 10.1145/1868366.1868370.
Bjørklund, Truls A.; Götz,, Michaela; Gehrke, Johannes; Grimsmo, Nils. Workload-Aware Indexing for Keyword Search in Social Networks. , 2011.
Grimsmo, Nils; Bjørklund, Truls Amundsen; Hetland, Magnus Lie. Fast Optimal Twig Joins. Proceedings of the VLDB Endowment, 2010.
Grimsmo, Nils; Bjørklund, Truls Amundsen; Hetland, Magnus Lie. Linear Computation of the Maximum Simultaneous Forward and Backward Bisimulation for Node-Labeled Trees. proceedings of the 7th International XML Database Symposium, 2010.