|
||||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |
java.lang.Objectjcolibri.method.retrieve.NNretrieval.similarity.local.textual.compressionbased.CompressionBased
public class CompressionBased
This function returns the similarity of two strings using text compression. Compression Based Dissimilarity (CDM) is defined as follows CDM(x, y) = C(xy) / (C(x) + C(y)) where C(x) is the size of string x after compression (and C(y) similarly) and C(xy) is the size, after compression, of the string that comprises y concatenated to the end of x. See the following paper:
inproceedings{Keogh04, author = {E. Keogh and S. Lonardi and C. Ratanamahatana}, title ={Towards Parameter-Free Data Mining}, booktitle = {Procs.\ of the 10th ACM SIGKDD, International Conference on Knowledge Discovery and Data Mining}, year = {2004}, pages = {206--215}, address = {New York, NY, USA}, }Note that CDM is not a distance metric. It is NOT in general the case that CDM(x, y) = 0 iff x = y CDM(x, y) = CDM(y, x) CDM(x, y) + CDM(y, z) >= CDM(x, z) Do not use this measure in any application that requires metric properties. Values of CDM will lies in the range (0.5, 1.0]. We convert to a similarity measure by returning 1.0 - CDM(x, y). This similarity measure is effective in case-based spam filtering using GZip and PPM text compression algorithms. For the details, see:
inproceedings{Delany-Bridge-2006, author = "S. J. Delany and D. Bridge", title = "Feature-Based and Feature-Free Textual {CBR}: A Comparison in Spam Filtering", booktitle = "Procs.\ of the 17th Irish Conference on Artificial Intelligence and Cognitive Science", pages = "244--253", address = "Belfast, Northern Ireland", year = "2006" }Its effectiveness outside of spam filtering (e.g. texts other than emails, short strings, images, etc.) has not been demonstrated. To be effective, the 'right' compression algorithm must be used.
Constructor Summary | |
---|---|
CompressionBased(ICompressor compressor)
|
Method Summary | |
---|---|
double |
compare(java.lang.String x,
java.lang.String y)
Applies the similarity function. |
double |
compute(java.lang.Object o1,
java.lang.Object o2)
Applies the similarity function. |
boolean |
isApplicable(java.lang.Object o1,
java.lang.Object o2)
Indicates if the function is applicable to two objects |
static void |
main(java.lang.String[] args)
|
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public CompressionBased(ICompressor compressor)
compressor
- an object that encapsulates the compression algorithm to be used.Method Detail |
---|
public double compare(java.lang.String x, java.lang.String y)
x
- y
-
public double compute(java.lang.Object o1, java.lang.Object o2) throws NoApplicableSimilarityFunctionException
compute
in interface LocalSimilarityFunction
o1
- Stringo2
- String
NoApplicableSimilarityFunctionException
public boolean isApplicable(java.lang.Object o1, java.lang.Object o2)
LocalSimilarityFunction
isApplicable
in interface LocalSimilarityFunction
public static void main(java.lang.String[] args)
|
GAIA - Group for Artificial Intelligence Applications http://gaia.fdi.ucm.es |
|||||||||
PREV CLASS NEXT CLASS | FRAMES NO FRAMES | |||||||||
SUMMARY: NESTED | FIELD | CONSTR | METHOD | DETAIL: FIELD | CONSTR | METHOD |