Triad Enumeration at Trillion-Scale Using a Single Commodity Machine
Original version
10.5441/002/edbt.2019.95Abstract
Triad enumeration yields more detailed information than triangle enumeration. However, triad enumeration is more complex as it has to list the edges as well as the nodes of the triads. Furthermore, it is challenging to do on large graphs because of two reasons: how to deal with large amounts of data using limited memory, and how to do the computation in a reasonable amount of time. While distributed computing can take care of both problems, it requires large investment and high operating cost, as well as a distributed algorithm design which is not always possible. In this paper we show that triad enumeration of very large graphs at the web-scale can actually be done on a single commodity machine. Memory space limitation can be overcome by using data compression and partial loading. Performance can be greatly improved through optimized preprocessing and parallelization.