Online News Detection on Twitter

Wold, Henning Moberg; Vikre, Linn Christina

dc.contributor.advisor	Gulla, Jon Atle
dc.contributor.author	Wold, Henning Moberg
dc.contributor.author	Vikre, Linn Christina
dc.date.created	2015-05-28
dc.date.issued	2015
dc.identifier	ntnudaim:10463
dc.identifier.uri	http://hdl.handle.net/11250/2353650
dc.description.abstract	In this thesis, we seek suitable methods for detecting news on Twitter, drawing on the fields of artificial intelligence, information retrieval, computational linguistics, and natural language processing. We combine these fields to find newsworthy tweets, cluster them based on time and similarity, and find the most representative tweet for each news event. We compare different topic modeling methods for finding news topics and the tweets related to them in an online setting. We then examine an online clustering approach that detects news while tweets are being clustered by time of arrival and content similarity. One of the greatest challenges is to find the tweets that can be characterized as news in the ocean of tweets. Many tweets are about personal matters or are two-way communication among friends; we call these uninteresting tweets chatter. Many tweets also contain abbreviations, misspellings, and improper sentence structure, which makes it difficult for otherwise good natural language processing systems to evaluate their content and language. Our study shows that finding news using topic modeling is difficult: training a proper model is time consuming, and even when a working model is obtained, it is unclear how to use it effectively to detect news. While developing the online news detection system for Twitter, we found that the clustering approach elaborated on in this thesis works well for tweets. The system clusters similar tweets based on time and content, and performs well at doing so. By using information entropy and tuning the parameters of the clustering algorithm, we achieved higher precision than the baseline clustering algorithm. Lastly, we found that finding the most representative tweet for a news event is simple once the tweets have been clustered well, that is, when the clusters contain mostly news-relevant tweets.
dc.language	eng
dc.publisher	NTNU
dc.subject	Informatics, Software
dc.subject	Informatics, Databases and Search
dc.title	Online News Detection on Twitter
dc.type	Master thesis
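The abstract describes clustering tweets online by arrival time and content, using information entropy to raise precision, and picking the most representative tweet for each event. This record does not specify the exact algorithm, so the following is only a minimal Python sketch under stated assumptions: single-pass clustering with cosine similarity over raw term counts, a hypothetical similarity threshold (`SIM_THRESHOLD`) and time window (`WINDOW_SECONDS`), an assumed filter that drops small or low-entropy clusters, and a representative tweet chosen as the one closest to the cluster centroid. All function names and parameter values here are hypothetical illustrations, not taken from the thesis.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical parameter values for illustration; the thesis tunes its own.
SIM_THRESHOLD = 0.5     # minimum cosine similarity for a tweet to join a cluster
WINDOW_SECONDS = 3600   # clusters idle longer than this stop accepting tweets


def tokenize(text):
    """Lowercase word tokens, with URLs and @mentions removed."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.findall(r"[a-z0-9#']+", text)


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Cluster:
    tweets: list = field(default_factory=list)          # (timestamp, text, vector)
    centroid: Counter = field(default_factory=Counter)  # summed term counts
    last_seen: float = 0.0                              # time of newest tweet

    def add(self, ts, text, vec):
        self.tweets.append((ts, text, vec))
        self.centroid.update(vec)
        self.last_seen = ts

    def entropy(self):
        """Shannon entropy (in bits) of the cluster's term distribution."""
        total = sum(self.centroid.values())
        return -sum((c / total) * math.log2(c / total) for c in self.centroid.values())

    def representative(self):
        """Text of the tweet most similar to the cluster centroid."""
        return max(self.tweets, key=lambda t: cosine(t[2], self.centroid))[1]


def online_cluster(stream):
    """Assign each (timestamp, text) pair, in arrival order, to a cluster in one pass."""
    clusters = []
    for ts, text in stream:
        vec = Counter(tokenize(text))
        if not vec:
            continue  # nothing left after tokenization
        # Only clusters that received a tweet within the time window can grow.
        active = [c for c in clusters if ts - c.last_seen <= WINDOW_SECONDS]
        best = max(active, key=lambda c: cosine(vec, c.centroid), default=None)
        if best is not None and cosine(vec, best.centroid) >= SIM_THRESHOLD:
            best.add(ts, text, vec)
        else:
            cluster = Cluster()
            cluster.add(ts, text, vec)
            clusters.append(cluster)
    return clusters


def news_candidates(clusters, min_size=2, min_entropy=2.0):
    """Keep clusters that are large enough and not dominated by a few repeated terms."""
    return [c for c in clusters if len(c.tweets) >= min_size and c.entropy() >= min_entropy]
```

For example, feeding three tweets about an earthquake in Tokyo and one unrelated tweet about a cat yields two clusters, and only the earthquake cluster passes `news_candidates`. The entropy filter reflects one plausible reading of the abstract: clusters built from very few distinct terms (such as repeated spam) score low and are discarded.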

Files in this item

Name: 10463_FULLTEXT.pdf
Size: 1.349 MB
Format: PDF

Name: 10463_ATTACHMENT.zip
Size: 21.43 KB
Format: application/zip

Name: 10463_COVER.pdf
Size: 234.5 KB
Format: PDF

This item appears in the following Collection(s)

Institutt for datateknologi og informatikk (Department of Computer Science) [6822]