Weighted Pattern Matching with PWMs on FPGAs
MetadataVis full innførsel
This paper has presented a solution to an FPGA-based PWM matcher in the form of the so-called FPWM Prototype, using the hardware facilities on the Cray XD1 Supercomputer. The prototype implementation currently runs as a single core on a single node of the Cray, and provides a theoretical PWM matching capability roughly 15 times greater than a contemporary Pentium M general-purpose CPU. Theoretical and empirical data regarding performance and resource consumption for this implementation have been provided. A method for increasing the speedup to a theoretical maximum of 480x has also been described, using a multi-core implementation on a single chip. This theoretical limit could potentially be attained with today's hardware, but would require certain compromises with regard to bit resolution and PWM length in order to fit on the FPGA. A full-scale implementation providing the capabilities required by many of today's algorithms would most likely not reach this speed, but as the FPGA currently installed on the Cray is also available in a larger variant (the Virtex-4 family), it is reasonable to assume that such an implementation could indeed be feasible on contemporary hardware. A method for using several nodes on the Cray XD1 transparently for the user application, in order to further increase the performance, has also been described. However, as theoretical performance estimation on such hardware is a highly inexact science, and empirical measurements could not be performed at this time due to the state of the prototype, no estimates have been provided for this method. While some of the original goals were attained, other parts of the project could be considered a failure. Due to a number of implementation problems, a working FPWM was not available in time for use with the two other projects mentioned in the introduction, involving hardware acceleration of the Gibbs Sampling and MEME algorithms. The main problem with the cooperation between these projects was that it relied on the FPWM being in a finished and working condition before the work involving it could begin, which turned out to be much harder and take much longer time than what was first envisioned. The planned empirical measurements of the performance boost for these algorithms are therefore not yet available.