Visual Pretraining for Deep Q-Learning

Recent advances in reinforcement learning enable computers to learn human-level policies for Atari 2600 games. This is done by training a convolutional neural network, referred to as a deep Q-network (DQN), to play based on screenshots and in-game rewards. The main disadvantage of this approach is the long training time: a computer typically learns for approximately one week, and in that time it processes 38 days of game play. This thesis explores the possibility of using visual pretraining to reduce the training time of DQN agents.
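A DQN learns from screenshots and rewards with Q-learning: for each transition it regresses the predicted value of the chosen action towards the reward plus the discounted best value of the next state. The following is a minimal sketch of that one-step target, assuming the standard DQN formulation rather than anything specific to this thesis. It uses NumPy instead of Theano, and the function names `dqn_targets` and `td_loss` are hypothetical. In the usual DQN setup `next_q_values` would come from a periodically updated target network, and a plain squared error is used here for simplicity.

```python
import numpy as np


def dqn_targets(rewards, next_q_values, terminal, gamma=0.99):
    """One-step Q-learning targets y = r + gamma * max_a' Q(s', a').

    rewards:       shape (batch,), in-game rewards r
    next_q_values: shape (batch, n_actions), Q-values predicted for next states s'
    terminal:      shape (batch,), bool, True where s' ends the episode
    """
    max_next = next_q_values.max(axis=1)
    # Do not bootstrap from states after the episode has ended.
    return rewards + gamma * max_next * (~terminal)


def td_loss(q_values, actions, targets):
    """Mean squared error between Q(s, a) for the taken actions and the targets."""
    chosen = q_values[np.arange(len(actions)), actions]
    return np.mean((targets - chosen) ** 2)


# Usage with a toy batch of two transitions and two actions.
rewards = np.array([1.0, 0.0])
next_q = np.array([[0.5, 2.0], [1.0, 3.0]])
terminal = np.array([False, True])
targets = dqn_targets(rewards, next_q, terminal, gamma=0.9)
loss = td_loss(np.array([[1.0, 2.0], [0.0, 0.0]]), np.array([1, 0]), targets)
print(targets, loss)
```

The second transition is terminal, so its target is just its reward and no future value is added.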

Visual pretraining is done by training an autoencoder (AE) to reduce the dimensionality of images. While learning this dimensionality reduction, the AE learns visual features by recognizing the structure of the images. To test whether the AE can learn general visual features, AEs are trained on different datasets. After pretraining, transfer learning is used to initialize DQNs with weights from the AE. To run the experiments, a training system was built using Theano.
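The pretrain-then-transfer procedure can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the thesis's Theano system: it uses NumPy, a single dense hidden layer instead of the convolutional layers of a DQN, and random arrays in place of game screenshots. All function names (`train_autoencoder`, `init_q_network`, `q_values`) are hypothetical. The AE reconstructs its input through a smaller hidden layer; its encoder weights are then copied into the first layer of a Q-network, while the output layer, which maps features to one Q-value per action, starts from random values.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_layer(n_in, n_out):
    """Small random weights and zero biases for a dense layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


def train_autoencoder(images, n_hidden, lr=1.0, epochs=500):
    """Train a one-hidden-layer autoencoder x -> h -> x_hat on the MSE loss.

    images: array of shape (n_samples, n_pixels) with values in [0, 1].
    Returns the encoder parameters (W_enc, b_enc) and the loss history.
    """
    W_enc, b_enc = init_layer(images.shape[1], n_hidden)
    W_dec, b_dec = init_layer(n_hidden, images.shape[1])
    losses = []
    for _ in range(epochs):
        # Forward pass: ReLU encoder (the part that is transferred), linear decoder.
        pre = images @ W_enc + b_enc
        h = np.maximum(pre, 0.0)
        err = h @ W_dec + b_dec - images
        losses.append(np.mean(err ** 2))
        # Backward pass: gradients of the mean squared reconstruction error.
        g_out = 2.0 * err / err.size
        g_pre = (g_out @ W_dec.T) * (pre > 0)
        # Full-batch gradient descent, updating the arrays in place.
        W_dec -= lr * (h.T @ g_out)
        b_dec -= lr * g_out.sum(axis=0)
        W_enc -= lr * (images.T @ g_pre)
        b_enc -= lr * g_pre.sum(axis=0)
    return (W_enc, b_enc), losses


def init_q_network(encoder, n_actions):
    """Q-network whose hidden layer is initialized from the pretrained encoder."""
    W_enc, b_enc = encoder
    W_out, b_out = init_layer(W_enc.shape[1], n_actions)  # fresh output layer
    return {"W1": W_enc.copy(), "b1": b_enc.copy(), "W2": W_out, "b2": b_out}


def q_values(net, states):
    """One Q-value per action for each state in the batch."""
    h = np.maximum(states @ net["W1"] + net["b1"], 0.0)
    return h @ net["W2"] + net["b2"]


# Usage: random arrays stand in for flattened, preprocessed screenshots.
frames = rng.random((64, 16))
encoder, losses = train_autoencoder(frames, n_hidden=8)
net = init_q_network(encoder, n_actions=4)
print(q_values(net, frames[:2]).shape)  # (2, 4)
```

Only the encoder is transferred; the decoder exists solely to define the reconstruction objective and is discarded once pretraining ends.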

The results generally show lower performance when pretraining is used, and this holds for all tested datasets. In fact, there is surprisingly little difference in performance between AEs trained on different datasets. The lower performance most likely occurs because the trained AE focuses on large objects: small moving objects are often not reconstructed correctly, yet these objects are often crucial to the reinforcement learning task. As a result, the image representation learnt by the AE is insufficient for the DQN agent. In addition, training the AE increases the magnitude of the weights. Since the parameters of the learning algorithm are tuned for smaller weights, it takes longer to correct them. In conclusion, the pretraining harmed performance. Several possible solutions to this problem are discussed, e.g. increasing the network size, forcing the AE to focus on moving objects by weighting the loss function, and normalizing the AE.
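Two of the proposed remedies can be made concrete. The sketch below assumes one possible reading of each, neither taken from the thesis. The loss weighting marks pixels that change between consecutive frames as moving and gives them a larger weight (`alpha` and `threshold` are hypothetical parameters). "Normalizing" is taken to mean rescaling a pretrained weight matrix to a chosen standard deviation, for example that of the random initialization the DQN's learning parameters were tuned for. The usage part also shows why small objects get lost: an 84x84 frame (a size assumed here) in which the reconstruction omits a 2x2 ball has a tiny plain mean squared error.

```python
import numpy as np


def motion_weights(frame, prev_frame, alpha=10.0, threshold=0.05):
    """Per-pixel loss weights: 1 for static pixels, 1 + alpha for pixels that changed."""
    moving = np.abs(frame - prev_frame) > threshold
    return 1.0 + alpha * moving


def weighted_reconstruction_loss(x_hat, x, weights):
    """Weighted mean squared error in which changed pixels count more."""
    return np.sum(weights * (x_hat - x) ** 2) / np.sum(weights)


def rescale_weights(W, target_std):
    """Scale a pretrained weight matrix so that its standard deviation is target_std."""
    std = W.std()
    return W if std == 0 else W * (target_std / std)


# Usage: a 2x2 "ball" that has moved since the previous frame.
prev_frame = np.zeros((84, 84))
prev_frame[10:12, 10:12] = 1.0
frame = np.zeros((84, 84))
frame[20:22, 30:32] = 1.0
x_hat = np.zeros((84, 84))  # a reconstruction that has dropped the ball entirely

plain = np.mean((x_hat - frame) ** 2)
weighted = weighted_reconstruction_loss(x_hat, frame, motion_weights(frame, prev_frame))
print(plain, weighted)  # the weighted loss penalizes the missing ball far more
```

Weighting by frame differences is cheap and needs no labels, but it also up-weights pixels an object has just left, such as the ball's previous position here.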

Publisher

NTNU