Deep Reinforcement Learning and Generative Adversarial Networks for Abstractive Text Summarization
Abstract
News articles, papers, and encyclopedias, among other texts, can be time-consuming to digest. Often, you are not interested in reading all of the material, but only some of it. A summary can give you a quick grasp of what a text is about. Producing a summary is itself time-consuming, however: you need to read the text and understand which parts are important. This makes it very attractive to generate summaries automatically with a computer program.
Abstractive text summarization has gained a lot of attention in recent years, and the standard supervised learning approach has shown promising results when used to train abstractive text summarization models. However, it is limited by the assumption that the ground truth is provided at each time step during training. This is not the case at test time, where the previously generated word is provided instead. The resulting gap between training and testing is known as "exposure bias".
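To make the gap concrete, the following sketch contrasts the two decoding regimes for a toy GRU decoder. The model, dimensions, and greedy decoding choice are illustrative assumptions, not the thesis's architecture.

```python
# Minimal sketch of exposure bias: teacher forcing feeds the ground-truth
# token at every step, while free-running decoding feeds the model's own
# previous prediction. All sizes here are illustrative.
import torch
import torch.nn as nn

vocab_size, emb_dim, hid_dim = 1000, 64, 128
embed = nn.Embedding(vocab_size, emb_dim)
rnn = nn.GRUCell(emb_dim, hid_dim)
proj = nn.Linear(hid_dim, vocab_size)

def teacher_forced(targets, h):
    """Training: the ground-truth token is the input at every step."""
    logits = []
    for t in range(targets.size(1) - 1):
        h = rnn(embed(targets[:, t]), h)
        logits.append(proj(h))
    return torch.stack(logits, dim=1)  # compared against targets[:, 1:]

def free_running(bos, h, max_len=20):
    """Inference: the previously generated word is the input instead."""
    token, outputs = bos, []
    for _ in range(max_len):
        h = rnn(embed(token), h)
        token = proj(h).argmax(dim=-1)  # model's own guess replaces ground truth
        outputs.append(token)
    return torch.stack(outputs, dim=1)

batch = 2
h0 = torch.zeros(batch, hid_dim)
targets = torch.randint(0, vocab_size, (batch, 10))
bos = torch.zeros(batch, dtype=torch.long)
print(teacher_forced(targets, h0).shape)  # (batch, 9, vocab_size)
print(free_running(bos, h0).shape)        # (batch, 20)
```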
In this thesis, we explore how to improve an abstractive text summarization model by employing reinforcement learning and generative adversarial networks, neither of which assumes that the ground truth is provided during training. As a base model to improve upon, we implement a variation of the Pointer-Generator Network [See et al., 2017].
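For reference, the core idea of the Pointer-Generator Network in See et al. [2017] is a soft switch p_gen that mixes generating a word from the vocabulary with copying a word from the source via the attention weights. The sketch below illustrates that mixture; the tensor names and shapes are illustrative choices, not the thesis's implementation.

```python
# Pointer-generator output distribution, following See et al. [2017]:
# P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (attention mass on source
# positions holding w). scatter_add_ implements the copy term.
import torch

def final_distribution(p_gen, vocab_dist, attention, src_ids, vocab_size):
    copy_dist = torch.zeros(vocab_size)
    copy_dist.scatter_add_(0, src_ids, attention)  # accumulate attention per word id
    return p_gen * vocab_dist + (1 - p_gen) * copy_dist

vocab_size = 50
src_ids = torch.tensor([3, 7, 3, 12])             # source token ids
attention = torch.softmax(torch.randn(4), dim=0)  # attention over source positions
vocab_dist = torch.softmax(torch.randn(vocab_size), dim=0)
p_gen = torch.sigmoid(torch.randn(()))            # generation probability
print(final_distribution(p_gen, vocab_dist, attention, src_ids, vocab_size).sum())  # ~1.0
```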
Many implementation details and parameter choices are important for training stability and convergence, yet they are mostly left out of research papers. We therefore conduct an extensive study of how different training strategies, parameters, and objective functions affect training stability and convergence, as well as the generated summaries. Another problem with training abstractive text summarization models is that it is generally very time-consuming. To mitigate this, we propose code optimization techniques that speed up training.
We show improvements over the base model in terms of ROUGE scores, as well as differences in the generated summaries, using ROUGE-1, ROUGE-2, a discriminator (adversarial training), and a combination of ROUGE-2 and the discriminator as objective functions.
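The abstract does not spell out how a non-differentiable metric such as ROUGE becomes a training signal; one standard recipe is policy-gradient reinforcement learning with a greedy baseline (self-critical sequence training). The sketch below is a hedged illustration of that recipe under those assumptions: rouge_1_f is a simplified unigram-overlap stand-in for ROUGE-1, and rl_loss is a hypothetical name, not the thesis's implementation.

```python
# Illustrative ROUGE-as-reward loss: reward a sampled summary by how much
# its ROUGE-1 score beats the greedy baseline, then scale the negative
# log-probability of the sample by that reward (REINFORCE with baseline).
from collections import Counter
import torch

def rouge_1_f(candidate, reference):
    """Unigram-overlap F1 between two token lists (simplified ROUGE-1)."""
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    p, r = overlap / len(candidate), overlap / len(reference)
    return 2 * p * r / (p + r)

def rl_loss(sample_logprob, sampled, greedy, reference):
    """Gradient descent on this loss raises the probability of samples
    whose ROUGE exceeds the greedy baseline, and lowers it otherwise."""
    reward = rouge_1_f(sampled, reference) - rouge_1_f(greedy, reference)
    return -reward * sample_logprob  # sample_logprob: sum of log p(w_t)

logp = torch.tensor(-4.2, requires_grad=True)  # stand-in for a decoder's log-prob
loss = rl_loss(logp, ["the", "cat", "sat"], ["a", "cat"],
               ["the", "cat", "sat", "down"])
loss.backward()  # positive reward -> gradient increases the sample's log-prob
```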