datapreparation.collocation
Class Collection

java.lang.Object
  extended by datapreparation.collocation.Collection

public class Collection
extends java.lang.Object

Class implementing a term collection. A term collection consists of a list of terms, each with its total frequency (the sum of its frequencies across all texts) and the number of texts it is present in. Each term also has a list of the texts it is present in.

Author:
Ole Kristian Fivelstad
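
The structure described above can be sketched as follows. This is only an illustration and not the actual implementation of this class: the class name TermCollectionSketch, the Entry record, the use of a list of strings in place of DataSet, whitespace tokenization, and returning 0 or an empty list for unknown terms are all assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the documented class; names and internals are assumptions.
class TermCollectionSketch {
    // Per-term record: total frequency plus the ids of the texts the term occurs in.
    private static final class Entry {
        int frequency;
        final List<Integer> documents = new ArrayList<>();
    }

    private final Map<String, Entry> terms = new HashMap<>();
    private int totalTokens;

    // Assumption: a dataset is a list of texts, each tokenized on whitespace.
    void fill(List<String> texts) {
        for (int docId = 0; docId < texts.size(); docId++) {
            String trimmed = texts.get(docId).trim();
            if (trimmed.isEmpty()) {
                continue; // an empty text contributes no tokens
            }
            for (String token : trimmed.split("\\s+")) {
                Entry e = terms.computeIfAbsent(token, k -> new Entry());
                e.frequency++;
                totalTokens++;
                // Texts are visited in increasing id order, so checking the
                // last recorded id is enough to store each text only once.
                int last = e.documents.size() - 1;
                if (last < 0 || e.documents.get(last) != docId) {
                    e.documents.add(docId);
                }
            }
        }
    }

    int getTotalNumberOfTokens() {
        return totalTokens;
    }

    // Assumption: an unknown term has frequency 0.
    int getFrequency(String term) {
        Entry e = terms.get(term);
        return e == null ? 0 : e.frequency;
    }

    int getNofTexts(String term) {
        Entry e = terms.get(term);
        return e == null ? 0 : e.documents.size();
    }

    // Returns a copy so callers cannot modify the stored list.
    List<Integer> getDocuments(String term) {
        Entry e = terms.get(term);
        return e == null ? new ArrayList<>() : new ArrayList<>(e.documents);
    }
}
```

Keeping the list of texts per term means the number of texts can be read off the list size instead of being tracked as a separate counter.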

Constructor Summary
Collection()
          Constructor creating a new hashtable for storing terms.
 
Method Summary
 void fillCollection(DataSet set)
          Method for filling the collection.
 java.util.ArrayList getDocuments(java.lang.String term)
          Method for getting which documents a term appears in.
 int getFrequency(java.lang.String term)
          Method for getting the total frequency of a given term.
 int getNofTexts(java.lang.String term)
          Method for getting the number of texts the given term is present in.
 int getTotalNumberOfTokens()
          Method for getting the total number of tokens in the dataset.
 int possibleCollocations(java.lang.String text)
          Method for calculating the number of possible collocations in a text.
 java.lang.String removeChars(java.lang.String word)
          Method for removing specific characters from a word.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Collection

public Collection()
Constructor creating a new hashtable for storing terms.

Method Detail

fillCollection

public void fillCollection(DataSet set)
Method for filling the collection. A DataSet is given and parsed so that all individual terms are stored in the collection.

Parameters:
set - The dataset

getTotalNumberOfTokens

public int getTotalNumberOfTokens()
Method for getting the total number of tokens in the dataset.

Returns:
The total number of tokens

removeChars

public java.lang.String removeChars(java.lang.String word)
Method for removing specific characters from a word.

Parameters:
word - The word
Returns:
The cleaned word
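
This page does not say which characters are removed. As a minimal sketch, assuming the intent is to strip common punctuation from a token, the cleaning step might look like the following. The class name RemoveCharsSketch and the UNWANTED character set are hypothetical.

```java
// Hypothetical helper; the character set below is an assumption and is
// not taken from the documented class.
class RemoveCharsSketch {
    private static final String UNWANTED = ".,;:!?()\"";

    // Returns the word with every character listed in UNWANTED dropped.
    static String removeChars(String word) {
        StringBuilder cleaned = new StringBuilder(word.length());
        for (int i = 0; i < word.length(); i++) {
            char ch = word.charAt(i);
            if (UNWANTED.indexOf(ch) < 0) {
                cleaned.append(ch);
            }
        }
        return cleaned.toString();
    }
}
```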

getFrequency

public int getFrequency(java.lang.String term)
Method for getting the total frequency of a given term.

Parameters:
term - The term
Returns:
The frequency

getNofTexts

public int getNofTexts(java.lang.String term)
Method for getting the number of texts the given term is present in.

Parameters:
term - The term
Returns:
The number of texts

getDocuments

public java.util.ArrayList getDocuments(java.lang.String term)
Method for getting which documents a term appears in.

Parameters:
term - The term
Returns:
The documents it appears in.

possibleCollocations

public int possibleCollocations(java.lang.String text)
Method for calculating the number of possible collocations in a text.

Parameters:
text - The text
Returns:
Number of possible collocations
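
How a possible collocation is counted is not specified on this page. One common reading, used here purely as an assumption, treats every pair of adjacent tokens as a candidate, so a text with n tokens has n - 1 possible collocations. The class name PossibleCollocationsSketch and the whitespace tokenization are hypothetical.

```java
// Hypothetical sketch: counts adjacent token pairs (bigrams) as candidate
// collocations. The documented method may use a different definition.
class PossibleCollocationsSketch {
    static int possibleCollocations(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return 0; // no tokens, so no pairs
        }
        int tokens = trimmed.split("\\s+").length;
        return tokens - 1; // a single token yields 0 pairs
    }
}
```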