Automated tuning of MapReduce performance in Vespa Document Store
MapReduce is a programming model for distributed processing, originally designed by Google Inc. to simplify the implementation and deployment of distributed programs. Vespa Document Store (VDS) is a distributed document storage solution developed by Yahoo! Technologies Norway. VDS does not currently have any feature for distributed aggregation of data, so a prototype implementation of the MapReduce programming model was previously developed. However, that implementation requires manual tuning of several parameters before each deployment. The goal of this thesis is to allow as many of these parameters as possible to be either configured automatically or set to universally suitable defaults. We have created a working MapReduce implementation based on the previous work, and a framework for monitoring VDS nodes. Various VDS features have been documented in detail, and this documentation has been used to analyse how the performance of these features may be improved. We have also performed experiments to validate the analysis and gain additional insight. Numerous configuration options, for either VDS in general or the MapReduce implementation, have been considered, and recommended settings have been proposed, either as default values or as algorithms for computing the most suitable setting. Finally, we provide a list of suggested further work, covering both general VDS improvements and MapReduce-specific research.