Throughput Computing on Future GPUs
MetadataShow full item record
The general-purpose computing capabilities of the Graphics Processing Unit (GPU) have recently been given a great deal of attention by the High-Performance Computing (HPC) community. By allowing massively parallel applications to run efficiently on commodity graphics cards, personal supercomputers are now available in desktop versions at a low price. For some applications, speedups of 70 times that of a single CPU implementation have been achieved. Among the most popular GPUs are those based on the NVIDIA Tesla Architecture which allows relatively easy development of GPU applications using the NVIDIA CUDA programming environment. While the GPU is gaining interest in the HPC community, others are more reluctant to embrace the GPU as a computational device. The focus on throughput and large data volumes separates Information Retrieval (IR) from HPC, since for IR it is critical to process large amounts of data efficiently, a task which the GPU currently does not excel at. Only recently has the IR community begun to explore the possibilities, and an implementation of a search engine for the GPU was published recently in April 2009. This thesis analyzes how GPUs can be improved to better suit large data volume applications. Current graphics cards have a bottleneck regarding the transfer of data between the host and the GPU. One approach to resolve this bottleneck is to include the host memory as part of the GPUs memory hierarchy. We develop a theoretical model, and based on this model, the expected performance improvement for high data volume applications are shown for both computationally-bound and data transfer-bound applications. The performance improvement for an existing search engine is also given based on the theoretical model. For this case, the improvements would result in a speedup between 1.389 and 1.874 for the various query-types supported by the search engine.