
dc.contributor.advisor: Ruocco, Massimiliano
dc.contributor.advisor: Castejon, Humberto
dc.contributor.advisor: Chandra, Arjun
dc.contributor.author: Markussen, Olav Bjørnstad
dc.date.accessioned: 2019-09-11T10:55:50Z
dc.date.available: 2019-09-11T10:55:50Z
dc.date.created: 2018-09-27
dc.date.issued: 2018
dc.identifier: ntnudaim:18031
dc.identifier.uri: http://hdl.handle.net/11250/2615802
dc.description.abstract: Reinforcement learning is learning to behave optimally with respect to an external observer through interactions with an environment. An agent repeatedly tries to accomplish a goal, each trial yielding some more information about the environment. Recent work by Bellemare et al. (2017) introduces a technique, C51, that extends the point estimate of future reward to a probability distribution. This opens the door for new action-selection schemes and exploration strategies. It is also a possible source of intrinsic motivation, using uncertainty to generate directed exploration. Recent work by Moerland et al. (2018) presents promising results when using distributions to explore in a deterministic MDP setting by way of Thompson sampling. Their results also prove that this way of representing returns is a valid option to guide exploration. This thesis introduces a novel way of computing intrinsic reward based on distributions from the C51 algorithm. The resulting intrinsic reward enables the agent to quickly explore a new environment, resulting in a performance on par with Moerland et al. (2018) in the randomized Chain environment. [en]
dc.language: eng
dc.publisher: NTNU
dc.subject: Computer Science, Artificial Intelligence [en]
dc.title: Intrinsic Motivation from Distributional Reinforcement Learning [en]
dc.type: Master thesis [en]
dc.source.pagenumber: 63
dc.contributor.department: Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi og elektroteknikk, Institutt for datateknologi og informatikk [nb_NO]

