Structured data extraction: separating content from noise on news websites
Master thesis
Permanent link: http://hdl.handle.net/11250/251379
Publication date: 2009
Abstract
This thesis addresses the problem of separating content from noise on news websites. We approach the problem using TiMBL, a memory-based learning software package. We study the relevance of similarity in the training data and the effect of training data size on extraction performance.