java.lang.Object
  control.Operation
    datapreparation.TextOperation
      datapreparation.collocation.Collocation
public class Collocation
Class representing the collocation extraction operation. This operation tries to find collocations of two nouns. In addition, proper nouns and proper noun groups are extracted. The user may also specify the extraction of verbs, adjectives, adverbs and numbers.
Constructor Summary | |
---|---|
Collocation()
Constructor for the collocation operation. |
Method Summary | |
---|---|
boolean |
calculateCollocation(java.lang.String firstWord,
java.lang.String secondWord)
Method for performing the actual calculation to see if two words form a collocation. |
void |
extractTermsFromSentence(java.lang.String sentence,
Text newText)
Method for extracting terms from a sentence. |
Text |
extractTermsFromText(Text text)
Method for extracting terms from a specific text |
java.lang.String |
findNounLemma(java.lang.String word)
Method for looking up the lemma of a noun in WordNet. |
java.util.ArrayList |
findOverlap(java.util.ArrayList one,
java.util.ArrayList two)
Method for finding the documents in which the two terms appear together. |
java.util.ArrayList |
findSentences(java.lang.String text)
Method for finding the sentences in a text. |
boolean |
firstCharIsDivider(java.lang.String word)
Method for checking to see if the first character of a word is a divider. |
java.lang.String |
generateCollocation(java.lang.String firstWord,
java.lang.String secondWord)
Method for generating a collocation of two words. |
java.util.ArrayList |
getProperties()
Method for getting the properties |
void |
performOperation(DataSet dataSet)
Method for performing the operation |
java.lang.String |
removeChars(java.lang.String word)
Method for removing specific characters from a word. |
void |
setProperties(java.util.ArrayList properties)
Method for setting the properties |
boolean |
wordContainsChar(java.lang.String word)
Method for checking if a word contains one of a list of dividers. |
boolean |
wordEndsWithComma(java.lang.String word)
Method for checking if the last character in a word is a comma. |
boolean |
wordIsStopword(java.lang.String word)
Method for checking whether a word is a stopword. |
boolean |
wordIsValid(java.lang.String word)
Method for checking whether a word is valid. |
Methods inherited from class control.Operation |
---|
getLogResult, setLogResult |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public Collocation()
Method Detail |
---|
public void performOperation(DataSet dataSet)
performOperation
in class Operation
dataSet
- The dataset being used
public Text extractTermsFromText(Text text)
text
- The text being processed
public java.lang.String findNounLemma(java.lang.String word)
word
- The word whose lemma is wanted
public void extractTermsFromSentence(java.lang.String sentence, Text newText)
sentence
- The sentence being processed
newText
- The text the sentence belongs to
public java.lang.String generateCollocation(java.lang.String firstWord, java.lang.String secondWord)
firstWord
- The first word in the collocation
secondWord
- The second word in the collocation
public boolean wordIsValid(java.lang.String word)
word
- The word being checked
public boolean wordIsStopword(java.lang.String word)
word
- The word being checked.
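The stopword check is not specified beyond its signature. A minimal sketch, assuming a fixed in-memory stopword list and case-insensitive matching (the class name `StopwordSketch` and the list contents are hypothetical, not taken from this class):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class StopwordSketch {
    // Hypothetical stopword list; the list used by Collocation is not documented.
    private static final Set<String> STOPWORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "in", "of", "the", "to"));

    // Returns true if the lower-cased word is in the stopword set.
    static boolean wordIsStopword(String word) {
        return STOPWORDS.contains(word.toLowerCase(Locale.ROOT));
    }
}
```

A hash set keeps each lookup constant-time, which matters when the check runs once per token.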
public boolean calculateCollocation(java.lang.String firstWord, java.lang.String secondWord)
firstWord
- The first word
secondWord
- The second word
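This page does not say which measure calculateCollocation uses. Purely as an illustration, the sketch below assumes a Dice coefficient over the sets of documents each word appears in, compared against a threshold. The class name `CollocationScoreSketch`, the `index` map and the `threshold` parameter are all hypothetical; the real method takes only the two words.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class CollocationScoreSketch {
    // Dice coefficient: 2 * |A ∩ B| / (|A| + |B|), or 0 if either set is empty.
    static double dice(Set<String> docsOne, Set<String> docsTwo) {
        if (docsOne.isEmpty() || docsTwo.isEmpty()) {
            return 0.0;
        }
        Set<String> both = new HashSet<>(docsOne);
        both.retainAll(docsTwo); // keep only documents containing both words
        return 2.0 * both.size() / (docsOne.size() + docsTwo.size());
    }

    // Assumed decision rule: the pair counts as a collocation when the score
    // reaches the threshold. Words missing from the index have no documents.
    static boolean calculateCollocation(String firstWord, String secondWord,
                                        Map<String, Set<String>> index,
                                        double threshold) {
        Set<String> docsOne = index.getOrDefault(firstWord, Collections.<String>emptySet());
        Set<String> docsTwo = index.getOrDefault(secondWord, Collections.<String>emptySet());
        return dice(docsOne, docsTwo) >= threshold;
    }
}
```

Other association measures, such as pointwise mutual information or a t-test, would fit the same boolean signature.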
public java.util.ArrayList findOverlap(java.util.ArrayList one, java.util.ArrayList two)
one
- The documents of word one
two
- The documents of word two
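Finding the documents that two terms share amounts to a list intersection. A minimal sketch, assuming the lists hold document identifiers as strings (the element type of the raw ArrayList is not documented here, and `OverlapSketch` is a hypothetical name):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class OverlapSketch {
    // Returns the identifiers present in both lists, in the order they occur
    // in the first list and without duplicates.
    static ArrayList<String> findOverlap(List<String> one, List<String> two) {
        Set<String> inTwo = new HashSet<>(two);
        Set<String> alreadyAdded = new HashSet<>();
        ArrayList<String> overlap = new ArrayList<>();
        for (String doc : one) {
            if (inTwo.contains(doc) && alreadyAdded.add(doc)) {
                overlap.add(doc);
            }
        }
        return overlap;
    }
}
```

Copying the second list into a set avoids the quadratic cost of calling `contains` on a list inside the loop.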
public java.util.ArrayList findSentences(java.lang.String text)
text
- The text
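How the class finds sentence boundaries is not described. One way to do it with the standard library is `java.text.BreakIterator`, sketched below under the assumption of English text. The class name `SentenceSketch` is hypothetical, and the actual implementation may use a different technique.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Locale;

final class SentenceSketch {
    // Splits the text at the sentence boundaries reported by BreakIterator
    // and returns the trimmed, non-empty sentences in order.
    static ArrayList<String> findSentences(String text) {
        ArrayList<String> sentences = new ArrayList<>();
        BreakIterator boundaries = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        boundaries.setText(text);
        int start = boundaries.first();
        for (int end = boundaries.next();
             end != BreakIterator.DONE;
             start = end, end = boundaries.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }
}
```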
public boolean wordContainsChar(java.lang.String word)
word
- The word being checked
public boolean firstCharIsDivider(java.lang.String word)
word
- The word being checked
public boolean wordEndsWithComma(java.lang.String word)
word
- The word being checked
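The three divider checks above can be sketched as simple character tests. The set of divider characters is not listed on this page, so the `DIVIDERS` constant below is an assumption, as is the class name `DividerSketch`:

```java
final class DividerSketch {
    // Assumed divider characters; the real list is not documented.
    private static final String DIVIDERS = ".,;:!?()\"";

    // True if any character of the word is a divider.
    static boolean wordContainsChar(String word) {
        for (int i = 0; i < word.length(); i++) {
            if (DIVIDERS.indexOf(word.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }

    // True if the word is non-empty and starts with a divider.
    static boolean firstCharIsDivider(String word) {
        return !word.isEmpty() && DIVIDERS.indexOf(word.charAt(0)) >= 0;
    }

    // True if the last character of the word is a comma.
    static boolean wordEndsWithComma(String word) {
        return word.endsWith(",");
    }
}
```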
public java.lang.String removeChars(java.lang.String word)
word
- The word
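Which characters removeChars strips is not stated. A minimal sketch, assuming it drops common punctuation and keeps everything else (the `REMOVABLE` constant and the class name `RemoveCharsSketch` are hypothetical):

```java
final class RemoveCharsSketch {
    // Assumed set of characters to strip; the real set is not documented.
    private static final String REMOVABLE = ".,;:!?()\"";

    // Returns the word with every removable character deleted.
    static String removeChars(String word) {
        StringBuilder cleaned = new StringBuilder(word.length());
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (REMOVABLE.indexOf(c) < 0) {
                cleaned.append(c);
            }
        }
        return cleaned.toString();
    }
}
```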
public java.util.ArrayList getProperties()
getProperties
in class Operation
public void setProperties(java.util.ArrayList properties)
setProperties
in class Operation
properties
-