Adaptive Cache Sizing in Search Engines
Abstract
The main objective of this thesis is to design and implement a method to automatically and adaptively partition a fixed amount of memory between a fixed number of main memory caches.Each cache is responsible for a single disk based data structure.The partitioning should reduce the disk usage due to fetching elements from these data structures.The secondary objective is to write a new cache subsystem to easethe work needed to experiment with cache design issues.A continuous model was created based on the problem description.The model expresses the expected total disk cost savingsfor a given assignment of cache sizes. It was then proved that there exists a single assignment of cache sizesthat maximizes the expected total disk cost savings.A gradient based method was then designed to find this optimal solution.A new policy based cache subsystem was designed and implemented,and the partitioning method was implemented on top of this subsystem.Then, a posting list cache, a document information cache, and a dictionary cache was implemented on top of this subsystem.The method along with the subsystem was then testedusing a query trace and documents from the Yahoo! Local search engine.The method was demonstrated to perform slightly better than for the hand tuned cache sizes used in the search engine.