• norsk
    • English
  • English 
    • norsk
    • English
  • Login
View Item 
  •   Home
  • Fakultet for informasjonsteknologi og elektroteknikk (IE)
  • Institutt for datateknologi og informatikk
  • View Item
  •   Home
  • Fakultet for informasjonsteknologi og elektroteknikk (IE)
  • Institutt for datateknologi og informatikk
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

Extraction-Based Automatic Summarization: Theoretical and Empirical Investigation of Summarization Techniques

Sizov, Gleb
Master thesis
Thumbnail
View/Open
352481_ATTACHMENT01.zip (2.463Mb)
352481_FULLTEXT01.pdf (804.8Kb)
352481_COVER01.pdf (246.9Kb)
URI
http://hdl.handle.net/11250/252155
Date
2010
Metadata
Show full item record
Collections
  • Institutt for datateknologi og informatikk [3796]
Abstract
A summary is a shortened version of a text that contains the main points of the original content. Automatic summarization is the task of generating a summary by a computer. For example, given a collection of news articles for the last week an automatic summarizer is able to create a concise overview of the important events. This summary can be used as the replacement for the original content or help to identify the events that a person is particularly interested in. Potentially, automatic summarization can save a lot of time for people that deal with a large amount of textual information. The straightforward way to generate a summary is to select several sentences from the original text and organize them in way to create a coherent text. This approach is called extraction-based summarization and is the topic of this thesis. Extraction-based summarization is a complex task that consists of several challenging subtasks. The essential part of the extraction-based approach is identification of sentences that contain important information. It can be done using graph-based representations and centrality measures that exploit similarities between sentences to identify the most central sentences. This thesis provide a comprehensive overview of methods used in extraction-based automatic summarization. In addition, several general natural language processing issues such as feature selection and text representation models are discussed with regard to automatic summarization. Part of the thesis is dedicated to graph-based representations and centrality measures used in extraction-based summarization. Theoretical analysis is reinforced with the experiments using the summarization framework implemented for this thesis. The task for the experiments is query-focused multi-document extraction-based summarization, that is, summarization of several documents according to a user query. The experiments investigate several approaches to this task as well as the use of different representation models, similarity and centrality measures. The obtained results indicate that use of graph centrality measures significantly improves the quality of generated summaries. Among the variety of centrality measure the degree-based ones perform better than path-based measures. The best performance is achieved when centralities are combined with redundancy removal techniques that prevent inclusion of similar sentences in a summary. Experiments with representation models reveal that a simple local term count representation performs better than the distributed representation based on latent semantic analysis, which indicates that further investigation of distributed representations in regard to automatic summarization is necessary. The implemented system performs quite good compared with the systems that participated in DUC 2007 summarization competition. Nevertheless, manual inspection of the generated summaries demonstrate some of the flaws of the implemented summarization mechanism that can be addressed by introducing advanced algorithms for sentence simplification and sentence ordering.
Publisher
Institutt for datateknikk og informasjonsvitenskap

Contact Us | Send Feedback

Privacy policy
DSpace software copyright © 2002-2019  DuraSpace

Service from  Unit
 

 

Browse

ArchiveCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsDocument TypesJournalsThis CollectionBy Issue DateAuthorsTitlesSubjectsDocument TypesJournals

My Account

Login

Statistics

View Usage Statistics

Contact Us | Send Feedback

Privacy policy
DSpace software copyright © 2002-2019  DuraSpace

Service from  Unit