Postings List Compression and Decompression on Mobile Devices
MetadataVis full innførsel
Recent years has seen a tremendous increase in both the performance of handheld devices and the use cases they are required to fullfil. Indeed, operations previously reserved for handling on personal computers have begun being executed on smart phones and tablets instead. This revolutionary development allows one to exploit handheld device hardware in novel applications.Trondheim-based start-up Atbrox is engaged in an EU project where Atbrox focus is search on mobile devices. An important component of search is the inverted index, and within, the per term postings list an encoded list of Unified Resource Identifiers (URI). Decoding of a postings list must be fast in order to not comprimise the user experience, but is also required to hold a small storage footprint. As the first to our knowledge, this thesis attempts to identify the properties of postings list encoding and decoding on handheld devices.Variable-byte coding, Group Varint coding, and Elias gamma coding are implemented in Objective-C. Performance is surveyed by benchmarking three devices out of Apple: A 5th generation iPod, a 4th generation iPad, and an iPad Air. Executions are run from disk to-disk, i.e. by reading a block of data, applying either encoding or decoding, and writing the result to permanent storage. Block sizes are varied. In addition, multithreading is applied during both encoding and decoding and compared to serial executions in an attempt to identify the properties under which each coding scheme performs best.This thesis provides valueable insight to the properties of coding schemes on handheld devices. Among its findings is the varying degree of performance and compression ratio between coding schemes: Group Varint proves to outperform the two others in terms of speed, however, is lacking in terms of compression. Elias gamma code provides the best compression ratio, but is the slowest in both encoding and decoding. Results also prove a strong correspondance between block size and performance, although a point of saturation is reached at 512 KiB. Additionally, block sizes below 512 KiB display an inability to take advantage of multithreading.