Bottom Up and Top Down – Twig Pattern Matching on Indexed Trees
MetadataVis full innførsel
This PhD thesis is a collection of papers presented with a general introduction to the topic, which is twig pattern matching (TPM) on indexed tree data. TPM is a pattern matching problem where occurrences of a query tree are found in a usually much larger data tree. This has applications in XML search, where the data is tree shaped and the queries specify tree patterns. The papers included present contributions on how to construct and use structure indexes, which can speed up pattern matching, and on how to efficiently join together results for the different parts of the query with so-called twig joins.Paper 1  shows how to perform more efficient matching of root-to-leaf query paths in so-called path indexes, by using new opportunistic algorithms on existing data structures.Paper 2  proves a tight bound on the worst-case space usage for a data structure used to implement path indexes.Paper 3  presents an XML indexing system which combines existing techniques in a novel way, and has orders of magnitude improved performance over existing commercial and open-source systems.Paper 4  reviews and creates a taxonomy for the many advances in the field of TPM on indexed data, and proposes opportunities for further research.Paper 5  bridges the gap between worst-case optimality and practical performance in current twig join algorithms.Paper 6  improves the construction cost of so-called forward and backward path indexes for tree data from loglinear to linear.