Structured data extraction: separating content from noise on news websites

Arizaleta, Mikel

Master's thesis

Open

348825_COVER01.pdf (46.46Kb)

348825_FULLTEXT01.pdf (1.166Mb)

Permanent link

http://hdl.handle.net/11250/251379

Publication date

2009

Collections

Institutt for datateknologi og informatikk [6620]

Abstract

This thesis addresses the problem of separating content from noise on news websites. We approach the problem using TiMBL, a memory-based learning software package. We study the relevance of similarity in the training data and the effect of data size on extraction performance.

Publisher

Institutt for datateknikk og informasjonsvitenskap