Distributed NetFlow Processing Using the Map-Reduce Model
Abstract
We study the viability of using the map-reduce model and map-reduce frameworks for NetFlow data processing. The map-reduce model is an approach to distributed processing that simplifies implementation work and can also help add fault tolerance to large processing jobs. We design and implement two prototypes of a NetFlow processing tool. The first prototype follows a design in which we freely choose the approach we consider optimal with regard to performance; it serves as a reference design. The second prototype is built on a map-reduce framework and makes use of its supporting features. We benchmark both prototypes and evaluate the performance of the framework-based prototype against the reference design. Based on the benchmarks, we analyse and comment on the differences in performance, and draw a conclusion about the suitability of the map-reduce model and frameworks for the problem at hand.