
dc.contributor.advisor  Ruocco, Massimiliano
dc.contributor.advisor  Chandra, Arjun
dc.contributor.advisor  Castejon, Humberto
dc.contributor.author  Nylend, Mikkel Sannes
dc.date.accessioned  2017-09-12T14:00:24Z
dc.date.available  2017-09-12T14:00:24Z
dc.date.created  2017-06-08
dc.date.issued  2017
dc.identifier  ntnudaim:16201
dc.identifier.uri  http://hdl.handle.net/11250/2454351
dc.description.abstract  In the last few years, the field of reinforcement learning (RL) has seen great advances, largely thanks to deep learning. By introducing deep neural networks into RL, agents can learn complex behaviors just by observing a game screen, much like humans learn to play games. One limitation, however, makes the transition to real-world problems difficult: data efficiency. One way to improve the data efficiency of RL is to approximate a model of the environment, an approach known as model-based RL. Although model-based agents can be more data efficient, they are usually computationally heavy and often end up being too inaccurate. In this thesis, we explore the use of deep dynamics models (DDM) trained dynamically in environments with high-dimensional state representations. Furthermore, we study four ways of calculating curiosity-based intrinsic motivation from the DDM to achieve more efficient exploration. The DDM consists of an autoencoder (AE) and a transition prediction model that operates in the latent space generated by the AE. The first intrinsic bonus is the AE reconstruction error. The second is based on the prediction error of the DDM. The third introduces the novel idea of using MC dropout, presented by Gal & Ghahramani (2015), to extract the uncertainty of the DDM. The last extracts the uncertainty via MC dropout from a bootstrapped DDM. Interestingly, the proposed bonus based on MC dropout outperforms the more commonly used bonus based on dynamics prediction errors in both data efficiency and final performance in the Atari 2600 domain. Additionally, agents are able to learn while receiving only intrinsic rewards and no extrinsic rewards from the environment. (An illustrative sketch of these bonuses follows the record below.)
dc.language  eng
dc.publisher  NTNU
dc.subject  Computer Science (2-year), Artificial Intelligence
dc.title  Data Efficient Deep Reinforcement Learning through Model-Based Intrinsic Motivation
dc.type  Master thesis
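
The abstract describes a DDM built from an autoencoder plus a latent-space transition model, and four intrinsic bonuses derived from it. Below is a minimal, illustrative sketch in PyTorch of how such bonuses could be computed; it is not the thesis implementation. The names (ToyDDM, intrinsic_bonuses), layer sizes, one-hot action encoding, and number of MC samples are assumptions made for illustration, and the bootstrapped variant is only indicated in a comment.

```python
import torch
import torch.nn as nn


class ToyDDM(nn.Module):
    """Toy deep dynamics model (DDM): autoencoder + latent transition predictor.

    Dropout layers are included so repeated stochastic forward passes can serve
    as MC-dropout samples for estimating model uncertainty.
    """

    def __init__(self, obs_dim=64, act_dim=4, latent_dim=8, p_drop=0.1):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 32), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(32, latent_dim))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32), nn.ReLU(),
            nn.Linear(32, obs_dim))
        # Predicts the next latent state from the current latent state and a one-hot action.
        self.transition = nn.Sequential(
            nn.Linear(latent_dim + act_dim, 32), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(32, latent_dim))


def intrinsic_bonuses(ddm, obs, action_onehot, next_obs, n_mc=10):
    """Compute three of the intrinsic bonus variants for a batch of transitions."""
    ddm.train()  # keep dropout active so repeated passes behave as MC-dropout samples
    with torch.no_grad():
        z = ddm.encoder(obs)
        z_next = ddm.encoder(next_obs)

        # 1) Autoencoder reconstruction error.
        recon_bonus = (ddm.decoder(z) - obs).pow(2).mean(dim=-1)

        # 2) Latent transition prediction error.
        z_next_pred = ddm.transition(torch.cat([z, action_onehot], dim=-1))
        pred_bonus = (z_next_pred - z_next).pow(2).mean(dim=-1)

        # 3) MC-dropout uncertainty: variance over repeated stochastic forward passes.
        samples = torch.stack([
            ddm.transition(torch.cat([ddm.encoder(obs), action_onehot], dim=-1))
            for _ in range(n_mc)])
        mc_bonus = samples.var(dim=0).mean(dim=-1)

        # 4) The bootstrapped variant would repeat (3) across an ensemble of DDMs
        #    trained on bootstrapped data and combine their uncertainties (omitted here).
    return recon_bonus, pred_bonus, mc_bonus


# Example usage on random data (in practice, observations would be Atari screen features).
ddm = ToyDDM()
obs, next_obs = torch.rand(32, 64), torch.rand(32, 64)
actions = nn.functional.one_hot(torch.randint(0, 4, (32,)), num_classes=4).float()
r_recon, r_pred, r_mc = intrinsic_bonuses(ddm, obs, actions, next_obs)
```

In a training loop, one of these bonuses would typically be scaled and added to (or used in place of) the extrinsic reward before the RL update.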

