dataloader
Class KMP

java.lang.Object
  extended by dataloader.KMP

public class KMP
extends java.lang.Object

This is a straightforward implementation of the famous Knuth-Morris-Pratt algorithm for string-matching. We use this to algorithm to find the indices where each news in the HTML files starts.

Author:
Trond Řivind Eriksen and Kjell-Inge Skogstad

Constructor Summary
KMP()
           
 
Method Summary
 int[] kmp(java.lang.String text, java.lang.String pattern)
          KMP is not a particularily fast algorithm in practice, but it achieves an linear upper bound complexity.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

KMP

public KMP()
Method Detail

kmp

public int[] kmp(java.lang.String text,
                 java.lang.String pattern)
KMP is not a particularily fast algorithm in practice, but it achieves an linear upper bound complexity. The algorithm basically constructs a prefixarray for the pattern, and uses this to test the different possibilities of a match in linear time. It is not obvious that the complexity is linear because we see two nested loops in the algorithm. But once we notice that q can only be incremented by 1 each pass of the mainloop. Since the inner loop can only be run until q is zero and the prefix is a strictly increasing function this implies that the overall complexity is linear according to basic amortized complexity. See Cormen et al: Introduction to algorithms for a formal proof.

Parameters:
text -
pattern -
Returns:
ret