Structured data extraction: separating content from noise on news websites

Arizaleta, Mikel

Master's thesis

View/Open

348825_COVER01.pdf (46.46Kb)

348825_FULLTEXT01.pdf (1.166Mb)

URI

http://hdl.handle.net/11250/251379

Date

2009

Collections

Institutt for datateknologi og informatikk [6547]

Abstract

This thesis addresses the problem of separating content from noise on news websites. We approach the problem using TiMBL, a memory-based learning software package. We study the relevance of similarity in the training data and the effect of data size on extraction performance.

Publisher

Institutt for datateknikk og informasjonsvitenskap