jcreek.cke.util
Class TextUtilities

java.lang.Object
  extended by jcreek.cke.util.TextUtilities

public class TextUtilities
extends java.lang.Object

Contains basic text utilities for regular-expression replacement, HTML tag stripping, and locating substrings within a StringBuffer.

Author:
Stein L. Tomassen, NTNU 2002

Constructor Summary
TextUtilities()
           
 
Method Summary
static int findSubStringEndPos(java.lang.StringBuffer rawText, int startPos, java.lang.String subString, boolean ignoreCase)
          Similar to findSubStringFromToPos, but returns only the ending position.
static int[] findSubStringFromToPos(java.lang.StringBuffer rawText, int startPos, java.lang.String startPattern, java.lang.String endPattern, boolean ignoreCase, boolean includePatterns)
          This method tries to find the start and end of a substring, by specifying the content of the start and end of the substring.
static int findSubStringStartPos(java.lang.StringBuffer rawText, int startPos, java.lang.String subString, boolean ignoreCase)
          Similar to findSubStringFromToPos, but returns only the starting position.
protected static boolean isIgnorableChar(char chr)
          Checks if the specified character is an ignorable character.
static java.lang.String replaceMatch(java.lang.String pattern, java.lang.String txt, java.lang.String substitute, boolean caseDependent)
          Replaces text matching a regular expression pattern with substitute text.
static java.lang.String stripHTML(java.lang.String html)
          Strips the HTML tags from an HTML document using regular expressions.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TextUtilities

public TextUtilities()
Method Detail

replaceMatch

public static java.lang.String replaceMatch(java.lang.String pattern,
                                            java.lang.String txt,
                                            java.lang.String substitute,
                                            boolean caseDependent)
                                     throws java.lang.Exception
Replaces text matching a regular expression pattern with substitute text.

Parameters:
pattern - the regular expression pattern to match
txt - the text that should be searched and replaced within
substitute - the text that should replace the matching text found
caseDependent - set to true if the matching against the specified pattern should be case dependent, or set to false if it should be case independent
Returns:
the new text, with replacements made wherever a match was found
Throws:
java.lang.Exception - if a pattern failure occurs
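A minimal sketch of how replaceMatch could be written with java.util.regex. It is not the original jcreek source: the class name ReplaceMatchSketch is hypothetical, and it assumes the substitute is passed straight to Matcher.replaceAll, so `$` and `\` in it act as group references and escapes. In this sketch an invalid pattern surfaces as java.util.regex.PatternSyntaxException, an unchecked exception and therefore compatible with the documented `throws java.lang.Exception`.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of replaceMatch; not the original jcreek.cke implementation.
final class ReplaceMatchSketch {
    private ReplaceMatchSketch() {}

    static String replaceMatch(String pattern, String txt, String substitute,
                               boolean caseDependent) {
        // Case-dependent matching uses no flags; otherwise case is ignored.
        int flags = caseDependent ? 0 : Pattern.CASE_INSENSITIVE;
        // Pattern.compile throws PatternSyntaxException for an invalid pattern.
        // replaceAll treats '$' and '\' in the substitute as special characters.
        return Pattern.compile(pattern, flags).matcher(txt).replaceAll(substitute);
    }

    public static void main(String[] args) {
        System.out.println(replaceMatch("cat", "Cat and cat", "dog", false)); // dog and dog
        System.out.println(replaceMatch("cat", "Cat and cat", "dog", true));  // Cat and dog
    }
}
```

If the substitute should always be inserted literally, wrapping it with Matcher.quoteReplacement(substitute) disables the group-reference handling.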

stripHTML

public static java.lang.String stripHTML(java.lang.String html)
                                  throws java.lang.Exception
Strips the HTML tags from an HTML document using regular expressions. Also converts some of the most common special characters into readable characters.

Parameters:
html - the HTML document to strip of HTML tags
Returns:
the document with its tags stripped
Throws:
java.lang.Exception - if a pattern failure occurs
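A rough sketch of regex-based tag stripping in the spirit of stripHTML. The tag pattern `<[^>]*>` and the list of converted entities are assumptions for illustration, not taken from the original class, and StripHtmlSketch is a hypothetical name.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of stripHTML; the regex and entity list are assumptions.
final class StripHtmlSketch {
    private StripHtmlSketch() {}

    // Matches anything from '<' up to the next '>' (a simplistic tag matcher).
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    static String stripHTML(String html) {
        String text = TAG.matcher(html).replaceAll("");
        // Convert a few common character entities. "&amp;" is handled last so
        // that "&amp;lt;" becomes the literal text "&lt;" rather than "<".
        return text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        System.out.println(stripHTML("<p>Fish &amp; chips</p>")); // Fish & chips
    }
}
```

A pattern this simple mishandles a `>` inside an attribute value or an HTML comment; a real HTML parser is more robust when that matters.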

findSubStringFromToPos

public static int[] findSubStringFromToPos(java.lang.StringBuffer rawText,
                                           int startPos,
                                           java.lang.String startPattern,
                                           java.lang.String endPattern,
                                           boolean ignoreCase,
                                           boolean includePatterns)
This method tries to find the start and end of a substring, which the caller specifies by passing some of the text found at its start and at its end. For example, to search for the text between two HTML tags such as '<UL>' and '</UL>', pass those tags as the start and end patterns; if both are found, the starting and ending positions are returned.

Note that this algorithm is very simple; it could perhaps use regular expressions for the matching to make it more robust.

Parameters:
rawText - the text to perform the search within
startPos - the starting position for where in the rawText the search should begin
startPattern - a string containing the text to match which should be the start of the substring
endPattern - a string containing the text to match which should be the end of the substring
ignoreCase - set to true if the matching against the specified patterns should be case independent, or set to false if it should be case dependent
includePatterns - set to true if the start and end patterns themselves should be included in the returned range.
Returns:
an array of two values, the starting and ending positions respectively, if they were found. If no match was found, null is returned.
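A minimal sketch of the start/end search described above, under stated assumptions: the returned positions are indices into rawText with an exclusive end (so rawText.substring(result[0], result[1]) yields the range), and the end pattern is searched for only after the start pattern. FromToPosSketch is a hypothetical name; this is not the original implementation.

```java
import java.util.Arrays;

// Hypothetical sketch of findSubStringFromToPos; not the original implementation.
// Assumption: positions are indices into rawText, end exclusive.
final class FromToPosSketch {
    private FromToPosSketch() {}

    // Index of the first occurrence of sub at or after from, or -1.
    private static int indexOf(String text, String sub, int from, boolean ignoreCase) {
        for (int i = Math.max(from, 0); i + sub.length() <= text.length(); i++) {
            if (text.regionMatches(ignoreCase, i, sub, 0, sub.length())) {
                return i;
            }
        }
        return -1;
    }

    static int[] findSubStringFromToPos(StringBuffer rawText, int startPos,
                                        String startPattern, String endPattern,
                                        boolean ignoreCase, boolean includePatterns) {
        String text = rawText.toString();
        int start = indexOf(text, startPattern, startPos, ignoreCase);
        if (start < 0) {
            return null; // start pattern not found
        }
        // Search for the end pattern only after the start pattern.
        int afterStart = start + startPattern.length();
        int end = indexOf(text, endPattern, afterStart, ignoreCase);
        if (end < 0) {
            return null; // end pattern not found
        }
        return includePatterns
                ? new int[] {start, end + endPattern.length()}
                : new int[] {afterStart, end};
    }

    public static void main(String[] args) {
        StringBuffer html = new StringBuffer("<ul><li>x</li></UL>");
        System.out.println(Arrays.toString(
                findSubStringFromToPos(html, 0, "<UL>", "</UL>", true, false))); // [4, 14]
        System.out.println(Arrays.toString(
                findSubStringFromToPos(html, 0, "<UL>", "</UL>", true, true)));  // [0, 19]
    }
}
```

Comparing with String.regionMatches instead of lower-casing both strings keeps indices aligned with rawText, since String.toLowerCase can change the length of some strings.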

findSubStringEndPos

public static int findSubStringEndPos(java.lang.StringBuffer rawText,
                                      int startPos,
                                      java.lang.String subString,
                                      boolean ignoreCase)
Similar to findSubStringFromToPos, but returns only the ending position.

Parameters:
rawText - the text to perform the search within
startPos - the starting position for where in the rawText the search should begin
subString - a string containing the text to match
ignoreCase - set to true if the matching against the specified pattern should be case independent, or set to false if it should be case dependent
Returns:
the ending position of the substring found, or -1 if no substring was found.
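A small sketch of findSubStringEndPos, assuming the "ending position" is the index just past the last matched character (the exclusive convention of String.substring). The documentation does not say whether the end is inclusive or exclusive, so this is an assumption; EndPosSketch is a hypothetical name.

```java
// Hypothetical sketch of findSubStringEndPos; not the original implementation.
// Assumption: the ending position is exclusive (index just past the match).
final class EndPosSketch {
    private EndPosSketch() {}

    static int findSubStringEndPos(StringBuffer rawText, int startPos,
                                   String subString, boolean ignoreCase) {
        String text = rawText.toString();
        int len = subString.length();
        for (int i = Math.max(startPos, 0); i + len <= text.length(); i++) {
            if (text.regionMatches(ignoreCase, i, subString, 0, len)) {
                return i + len; // exclusive end of the first match
            }
        }
        return -1; // no match at or after startPos
    }

    public static void main(String[] args) {
        StringBuffer text = new StringBuffer("Hello World");
        System.out.println(findSubStringEndPos(text, 0, "world", true));  // 11
        System.out.println(findSubStringEndPos(text, 0, "world", false)); // -1
    }
}
```

Under an inclusive convention the result would be one less, the index of the last matched character.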

findSubStringStartPos

public static int findSubStringStartPos(java.lang.StringBuffer rawText,
                                        int startPos,
                                        java.lang.String subString,
                                        boolean ignoreCase)
Similar to findSubStringFromToPos, but returns only the starting position.

Parameters:
rawText - the text to perform the search within
startPos - the starting position for where in the rawText the search should begin
subString - a string containing the text to match
ignoreCase - set to true if the matching against the specified pattern should be case independent, or set to false if it should be case dependent
Returns:
the starting position of the substring found, or -1 if no substring was found.
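A sketch of findSubStringStartPos that delegates the case-dependent search to StringBuffer.indexOf(String, int) and falls back to a String.regionMatches scan when case is ignored. StartPosSketch is a hypothetical name and this is not the original source.

```java
// Hypothetical sketch of findSubStringStartPos; not the original implementation.
final class StartPosSketch {
    private StartPosSketch() {}

    static int findSubStringStartPos(StringBuffer rawText, int startPos,
                                     String subString, boolean ignoreCase) {
        if (!ignoreCase) {
            // StringBuffer.indexOf already returns -1 when nothing is found.
            return rawText.indexOf(subString, startPos);
        }
        String text = rawText.toString();
        int len = subString.length();
        for (int i = Math.max(startPos, 0); i + len <= text.length(); i++) {
            if (text.regionMatches(true, i, subString, 0, len)) {
                return i; // index of the first matched character
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        StringBuffer text = new StringBuffer("Hello World");
        System.out.println(findSubStringStartPos(text, 0, "WORLD", true));  // 6
        System.out.println(findSubStringStartPos(text, 0, "WORLD", false)); // -1
    }
}
```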

isIgnorableChar

protected static boolean isIgnorableChar(char chr)
Checks if the specified character is an ignorable character. An ignorable character is a newline, tab, form feed, carriage return, or space.

Parameters:
chr - the character to check
Returns:
true if the specified character is an ignorable character, otherwise false is returned
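The definition above amounts to a five-way character test. A minimal sketch (hypothetical class name IgnorableCharSketch, and package-private here rather than protected):

```java
// Hypothetical sketch of isIgnorableChar; not the original implementation.
final class IgnorableCharSketch {
    private IgnorableCharSketch() {}

    static boolean isIgnorableChar(char chr) {
        switch (chr) {
            case '\n': // newline
            case '\t': // tab
            case '\f': // form feed
            case '\r': // carriage return
            case ' ':  // space
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isIgnorableChar('\t')); // true
        System.out.println(isIgnorableChar('x'));  // false
    }
}
```

This set is narrower than Character.isWhitespace, which also accepts characters such as the vertical tab (code point 0x0B).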


Created at IDI, NTNU by the Artificial Intelligence and Learning group