Show simple item record

dc.contributor.advisor: Pettersen, Kristin Ytterstad
dc.contributor.advisor: Løvlid, Rikke
dc.contributor.advisor: Engebråten, Sondre
dc.contributor.author: Feiring, Patrick
dc.date.accessioned: 2019-09-11T11:42:37Z
dc.date.created: 2017-06-02
dc.date.issued: 2017
dc.identifier: ntnudaim:16436
dc.identifier.uri: http://hdl.handle.net/11250/2616111
dc.description.abstract: Reinforcement learning is a general framework for optimizing the behavioural policy of an agent in an environment that issues a scalar reward indicating how well the agent is performing. Reinforcement learning algorithms can be coarsely divided into two groups based on whether or not they incorporate models of the state transition dynamics. While models enable a designer to embed prior domain knowledge and thus reduce the sample complexity of the resulting algorithm, the model-free regime provides a conceptually elegant formulation for solving tasks in problem domains where such models are not as easily expressed. The goal of this thesis was to investigate the merits of model-free deep reinforcement learning in continuous action spaces. Behavioural policies were represented by artificial neural networks, a popular class of flexible function approximators. A literature study was performed and references to state-of-the-art algorithms were provided. The advantages and disadvantages of the approach were discussed on the basis of experiments conducted in the MuJoCo simulator with the Trust Region Policy Optimization algorithm. The results showed that efficient utilization of computational resources was important. A novel method for computing Gauss-Newton vector products with reverse-mode automatic differentiation engines was derived. In addition, an efficient batched action sampling scheme was proposed, resulting in a 3-fold reduction in total training time. Variance reduction techniques and sufficiently long time horizons in particular were found to be important for the performance of the policy.
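The abstract mentions computing Gauss-Newton vector products with a reverse-mode automatic differentiation engine. One standard way to do this without forward mode is the "double-backward" trick: the VJP map u ↦ Jᵀu is linear in u, so differentiating it a second time in reverse mode recovers Jv. The sketch below illustrates that general idea in JAX; it is not the thesis's own derivation, and the toy network, loss, and function names are all assumptions for illustration.

```python
import jax
import jax.numpy as jnp

def jvp_via_double_vjp(f, x, v):
    """Jacobian-vector product J(x) v using only reverse-mode autodiff.

    The VJP map g(u) = J^T u is linear in u, so taking its VJP in the
    direction v yields (J^T)^T v = J v (the "double-backward" trick).
    """
    y, vjp_f = jax.vjp(f, x)
    g = lambda u: vjp_f(u)[0]                 # g(u) = J^T u, linear in u
    _, vjp_g = jax.vjp(g, jnp.zeros_like(y))  # linearity: any base point works
    return vjp_g(v)[0]                        # J v

def gauss_newton_vp(f, loss, x, v):
    """Gauss-Newton vector product G v = J^T H_loss (J v), reverse mode only."""
    Jv = jvp_via_double_vjp(f, x, v)
    y, vjp_f = jax.vjp(f, x)
    # Hessian-vector product of the loss w.r.t. the network output,
    # via reverse-over-reverse: grad of <grad(loss)(z), Jv> w.r.t. z.
    HJv = jax.grad(lambda z: jnp.vdot(jax.grad(loss)(z), Jv))(y)
    return vjp_f(HJv)[0]                      # J^T H (J v)

# Toy check: a small nonlinear map with a quadratic loss, so the Hessian
# of the loss is the identity and G v should equal J^T (J v).
f = lambda x: jnp.tanh(x) * jnp.array([1.0, 2.0, 3.0])
loss = lambda y: 0.5 * jnp.sum(y ** 2)
x = jnp.array([0.3, -0.7, 1.1])
v = jnp.array([1.0, 0.5, -2.0])

J = jax.jacobian(f)(x)
expected = J.T @ (J @ v)
print(jnp.allclose(gauss_newton_vp(f, loss, x, v), expected, atol=1e-5))
```

For large policy networks the same structure applies with `x` replaced by the flattened parameter vector; the point of a matrix-free product like this is that `G` is never materialized, which is what makes conjugate-gradient solves inside TRPO-style updates tractable.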
dc.language: eng
dc.publisher: NTNU
dc.subject: Kybernetikk og robotikk
dc.title: Deep Reinforcement Learning for Model-Free Continuous Control with an Emphasis on Trust Region Policy Optimization
dc.type: Master thesis
dc.source.pagenumber: 86
dc.contributor.department: Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi og elektroteknikk, Institutt for teknisk kybernetikk
dc.date.embargoenddate: 10000-01-01

