dc.contributor.advisor | Ruocco, Massimiliano | |
dc.contributor.advisor | Castejon, Humberto | |
dc.contributor.advisor | Chandra, Arjun | |
dc.contributor.author | Markussen, Olav Bjørnstad | |
dc.date.accessioned | 2019-09-11T10:55:50Z | |
dc.date.available | 2019-09-11T10:55:50Z | |
dc.date.created | 2018-09-27 | |
dc.date.issued | 2018 | |
dc.identifier | ntnudaim:18031 | |
dc.identifier.uri | http://hdl.handle.net/11250/2615802 | |
dc.description.abstract | Reinforcement learning is learning to behave optimally with respect to an external observer through interactions with an environment. An agent repeatedly tries to accomplish a goal, each trial yielding more information about the environment. Recent work by Bellemare et al. (2017) introduced a technique, C51, that extends the point estimate of future reward to a probability distribution. This opens the door for new action-selection schemes and exploration strategies. It is also a possible source of intrinsic motivation, using uncertainty to generate directed exploration. Recent work by Moerland et al. (2018) presents promising results when using distributions to explore in a deterministic MDP setting by way of Thompson sampling. Their results also show that this way of representing returns is a valid option for guiding exploration. This thesis introduces a novel way of computing intrinsic reward based on distributions from the C51 algorithm. The resulting intrinsic reward enables the agent to quickly explore a new environment, resulting in performance on par with Moerland et al. (2018) in the randomized Chain environment. | en
dc.language | eng | |
dc.publisher | NTNU | |
dc.subject | Computer Science, Artificial Intelligence | en
dc.title | Intrinsic Motivation from Distributional Reinforcement Learning | en |
dc.type | Master thesis | en |
dc.source.pagenumber | 63 | |
dc.contributor.department | Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi og elektroteknikk, Institutt for datateknologi og informatikk | nb_NO
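
The abstract does not state the thesis's exact intrinsic-reward formula. Below is a minimal sketch of the general idea it describes, assuming the spread (standard deviation) of the C51 categorical return distribution is used as an uncertainty-driven exploration bonus; the hyperparameters N_ATOMS, V_MIN, V_MAX and the scaling factor beta are illustrative assumptions, not values taken from the thesis.

    # Illustrative sketch only, not the thesis's actual method.
    import numpy as np

    N_ATOMS, V_MIN, V_MAX = 51, -10.0, 10.0    # standard C51 hyperparameters
    ATOMS = np.linspace(V_MIN, V_MAX, N_ATOMS)  # fixed support z_1 .. z_51

    def intrinsic_reward(probs: np.ndarray, beta: float = 0.1) -> float:
        """Uncertainty bonus from a categorical return distribution.

        probs: shape (N_ATOMS,), the C51 output p_i(s, a) for one
        state-action pair. Returns beta times the distribution's standard
        deviation, so state-actions with a widely spread return estimate
        (high uncertainty) yield a larger exploration bonus.
        """
        q = float(np.dot(probs, ATOMS))               # mean return, Q(s, a)
        var = float(np.dot(probs, (ATOMS - q) ** 2))  # variance of the return
        return beta * np.sqrt(var)

    # A peaked (certain) distribution yields no bonus; a flat one a large bonus.
    peaked = np.zeros(N_ATOMS); peaked[25] = 1.0
    flat = np.full(N_ATOMS, 1.0 / N_ATOMS)
    print(intrinsic_reward(peaked))  # ~0.0
    print(intrinsic_reward(flat))    # noticeably larger

In a setting like the randomized Chain environment mentioned in the abstract, such a bonus would be added to the external reward during training, steering the agent toward states whose return distribution is still uncertain.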