Experimentation with inverted indexes for dynamic document collections
MetadataVis full innførsel
This report aims to asses the efficiency of various inverted indexes when the indexed document collection is dynamic. To achieve this goal, we experiment with three different overall structures: Remerge, hierarchical indexes and a naive B-tree index. An efficiency model is also developed. The resulting estimates for each structure from the efficiency model are compared to the actual results. We introduce two modifications to existing methods. The first is a new scheme for accumulating an index in memory during sort-based inversion. Even though the memory characteristics of this modified scheme are attractive, our experiments suggest that other proposed methods are more efficient. We also introduce a modification to the hierarchical indexes, which makes them more flexible. Tf-idf is used as the ranking scheme in all tested methods. Approximations to this scheme are suggested to make it more efficient in an updatable index. We conclude that in our implementation, the hierarchical index with the modification we have suggested performs best overall. We also conclude that the tf-idf ranking scheme is not fit for updatable indexes. The major problem with using the scheme is that it becomes difficult to make documents searchable immediately without sacrificing update speed.