Political Categorization of Norwegian Text

Mina Takle Stensaker

dc.contributor.advisor	Gul
dc.contributor.author	Mina Takle Stensaker
dc.date.accessioned	2022-11-04T18:19:37Z
dc.date.available	2022-11-04T18:19:37Z
dc.date.issued	2022
dc.identifier	no.ntnu:inspera:112046434:34589134
dc.identifier.uri	https://hdl.handle.net/11250/3030252
dc.description	Full text not available
dc.description.abstract	The extraction of essential information from ordinary text has seen exponential growth and adoption over the last few years, and according to Yarchi et al. [2021] it is not expected to stall. The field of text mining is now a large branch of knowledge with many areas of application. This thesis provides an overview of the methods used in text mining, some of which are applied in an attempt to identify viewpoints in Norwegian text. Some of the more common text mining techniques are often not applicable to political text mining, because one needs to detect irony, sarcasm, words that are used differently by different political parties, and so on. In other words, the machine has to take a larger portion of the text into consideration to properly determine whether a viewpoint is present. Since there has not been much research on political text mining, especially in Norway, another challenge is that the available data sets might not be annotated for political viewpoints. This makes early progress slow, as data annotation takes a lot of time and effort. Because this project focuses on Norwegian political viewpoints, we did not have access to any already annotated data; thus, a lot of time was spent on the annotation process. In the experiment, supervised learning is used to investigate how well Bag-of-Words, Term Frequency-Inverse Document Frequency, and three Sentence Embedding models represent political texts, by applying Naive Bayes, Logistic Regression, and Random Forest as classifiers. The classifiers are also evaluated. The accuracy scores have a long way to go before they can compete with the scores of other classification solutions, but they are better than expected. The best accuracy obtained was about 78%; keep in mind that this is a point estimate.
dc.language	eng
dc.publisher	NTNU
dc.title	Political Categorization of Norwegian Text
dc.type	Master thesis
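
As a rough illustration of the kind of pipeline the abstract describes (a Bag-of-Words representation fed to a Naive Bayes classifier), the following is a minimal sketch in plain Python. It is not the thesis's implementation: the record does not say which libraries or preprocessing were used, and the whitespace tokenizer, the toy sentences, and the placeholder labels `label_a` and `label_b` are all assumptions made up for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def train_naive_bayes(texts, labels):
    """Fit a multinomial Naive Bayes model on Bag-of-Words counts."""
    class_docs = Counter(labels)        # number of documents per class
    word_counts = defaultdict(Counter)  # class -> word -> count
    for text, label in zip(texts, labels):
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_docs, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log posterior for the text."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        # Log prior plus log likelihoods with add-one (Laplace) smoothing.
        score = math.log(class_docs[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up toy data with placeholder labels, purely for illustration.
train_texts = [
    "lower taxes for private business",
    "cut taxes and reduce regulation",
    "more funding for public schools",
    "increase funding for public health",
]
train_labels = ["label_a", "label_a", "label_b", "label_b"]

model = train_naive_bayes(train_texts, train_labels)
prediction = predict(model, "reduce taxes for business")
```

Swapping the raw counts for TF-IDF weights or sentence embeddings, and Naive Bayes for Logistic Regression or Random Forest, would give the other combinations named in the abstract; those would normally come from an established machine learning library rather than hand-written code like this.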

This item appears in the following collection(s)

Institutt for datateknologi og informatikk [6772]
