Abstractive microblogs summarization
Master thesis
Date
2015
Collections
- Institutt for design [1185]
Abstract
Microblogging is a new electronic communication medium based on short status updates
containing personal and instant information. Due to the popularity of microblogs, the
volume of information is enormous, and a large portion of it is duplicative or irrelevant.
An effective way to summarize this information can help scientists, journalists and
marketing analysts gain better insight into people’s reactions and opinions on different
topics: political debates, sports events or product presentations.
Existing summarization algorithms can be enhanced in several ways. The first is to add
sentiment analysis. As information in microblogs is highly opinionated, analyzing tweet
polarity can improve machine summaries by selecting more sentiment-bearing tweets than
purely topical ones. Another enhancement is to use different summary lengths for different
topics. Previous studies often limit summaries to a particular length. Relaxing this
restriction can yield summaries that are better suited to a particular topic.
The goal of this research is to perform a qualitative study of these enhancements and
to provide insights and suggestions for conducting larger-scale qualitative research. In
total, ten topics are selected, for which human summaries are compared with the output of
state-of-the-art non-sentiment and sentiment summarizers.
The resulting observations are as follows. First, summaries generated by humans contain
more topical than sentiment content, although individual biases can run against this
trend. Second, the length of the summary is an important feature that influences both the
generation of human summaries and the interpretation of evaluation results; different
topics require summaries of different lengths. Third, sentiment summarization does not
produce better results on any evaluation metric used, but it may still be applicable in
suitable settings with specific topics.
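
The two enhancements described above (weighting sentiment-bearing tweets and letting the summary length vary by topic) can be illustrated with a minimal sketch. This is not one of the summarizers evaluated in the thesis. The tiny lexicon, the scoring formula, and the parameter names `sentiment_weight`, `cutoff` and `max_len` are all hypothetical, chosen only to make the idea concrete.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (an assumption for illustration only);
# a real study would rely on a proper sentiment resource or classifier.
POSITIVE = {"great", "love", "good", "amazing"}
NEGATIVE = {"bad", "hate", "awful", "boring"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def summarize(tweets, sentiment_weight=0.5, cutoff=0.6, max_len=5):
    """Pick tweets by topical score plus a weighted sentiment score.

    The summary length is not fixed: every tweet whose score reaches
    `cutoff` times the best score is kept, up to `max_len` tweets.
    """
    docs = [tokenize(t) for t in tweets]
    # Document frequency: in how many tweets each term occurs.
    df = Counter(tok for doc in docs for tok in set(doc))

    scored = []
    for tweet, doc in zip(tweets, docs):
        if not doc:
            continue
        # Topical score: average share of tweets containing each term.
        topical = sum(df[tok] for tok in doc) / (len(doc) * len(tweets))
        # Sentiment score: fraction of tokens found in the toy lexicon.
        polar = sum(tok in POSITIVE or tok in NEGATIVE for tok in doc) / len(doc)
        scored.append((topical + sentiment_weight * polar, tweet))

    if not scored:
        return []
    # Stable sort: tweets with equal scores keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    best = scored[0][0]
    return [tweet for score, tweet in scored if score >= cutoff * best][:max_len]
```

Setting `sentiment_weight=0.0` reduces the sketch to a purely topical selector, which gives a simple way to compare sentiment and non-sentiment variants on the same topic. Raising `cutoff` shortens the summary, and lowering it lengthens the summary.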