Visual Pretraining for Deep Q-Learning

Sandven, Torstein

dc.contributor.advisor	Downing, Keith
dc.contributor.author	Sandven, Torstein
dc.date.accessioned	2016-09-19T14:01:06Z
dc.date.available	2016-09-19T14:01:06Z
dc.date.created	2016-06-15
dc.date.issued	2016
dc.identifier	ntnudaim:15226
dc.identifier.uri	http://hdl.handle.net/11250/2408431
dc.description.abstract	Recent advances in reinforcement learning enable computers to learn human-level policies for Atari 2600 games. This is done by training a convolutional neural network to play based on screenshots and in-game rewards. The network is referred to as a deep Q-network (DQN). The main disadvantage of this approach is its long training time. A computer will typically learn for approximately one week, during which it processes 38 days of gameplay. This thesis explores the possibility of using visual pretraining to reduce the training time of DQN agents. Visual pretraining is done by training an autoencoder (AE) to reduce the dimensionality of images. When learning dimensionality reduction, the AE learns visual features by recognizing the structure of the images. To test whether the AE can learn general visual features, AEs are trained on different datasets. After pretraining, transfer learning is used to initialize DQNs with weights from the AE. To run the experiments, a training system was built using Theano. The results generally show lower performance for cases with pretraining. This happens for all tested datasets. In fact, there is surprisingly little difference in the performance of AEs trained on different datasets. The lower performance most likely occurs because the trained AE focuses on large objects. Small moving objects are often not reconstructed correctly by the AE, yet these objects are often crucial to the reinforcement learning task. As a result, the image representation learnt by the AE is insufficient for the DQN agent. In addition, the weight magnitude increases when AEs are trained. Since the parameters of the learning algorithm are tuned for smaller weights, it takes longer to correct the weights. In conclusion, pretraining harmed performance. Several possible solutions to this problem are discussed, e.g. increasing the network size, forcing the AE to focus on moving objects by weighting the loss function, and normalizing the AE.
dc.language	eng
dc.publisher	NTNU
dc.subject	Datateknologi, Intelligente systemer
dc.title	Visual Pretraining for Deep Q-Learning
dc.type	Master thesis
dc.source.pagenumber	92

Files in this item

Name: 15226_FULLTEXT.pdf
Size: 1.889Mb
Format: PDF

Name: 15226_ATTACHMENT.zip
Size: 26.52Kb
Format: Unknown

Name: 15226_COVER.pdf
Size: 1.556Mb
Format: PDF

This item appears in the following Collection(s)

Institutt for datateknologi og informatikk [6769]
