dc.contributor.advisor | Ruocco, Massimiliano | |
dc.contributor.advisor | Castejon, Humberto | |
dc.contributor.advisor | Chandra, Arjun | |
dc.contributor.author | Markussen, Olav Bjørnstad | |
dc.date.accessioned | 2019-09-11T10:55:50Z | |
dc.date.available | 2019-09-11T10:55:50Z | |
dc.date.created | 2018-09-27 | |
dc.date.issued | 2018 | |
dc.identifier | ntnudaim:18031 | |
dc.identifier.uri | http://hdl.handle.net/11250/2615802 | |
dc.description.abstract | Reinforcement learning is learning to behave optimally with respect to an external observer through interactions with an environment. An agent repeatedly tries to accomplish a goal, each trial yielding more information about the environment. Recent work by Bellemare et al. (2017) introduced a technique, C51, that extends the point estimate of future reward to a probability distribution. This opens the door for new action-selection schemes and exploration strategies. It is also a possible source of intrinsic motivation, using uncertainty to generate directed exploration. Recent work by Moerland et al. (2018) presents promising results when using distributions to explore in a deterministic MDP setting by way of Thompson sampling. Their results also show that this way of representing returns is a valid option for guiding exploration. This thesis introduces a novel way of computing intrinsic reward based on distributions from the C51 algorithm. The resulting intrinsic reward enables the agent to quickly explore a new environment, resulting in performance on par with Moerland et al. (2018) in the randomized Chain environment. | en
dc.language | eng | |
dc.publisher | NTNU | |
dc.subject | Computer Science, Artificial Intelligence | en
dc.title | Intrinsic Motivation from Distributional Reinforcement Learning | en |
dc.type | Master thesis | en |
dc.source.pagenumber | 63 | |
dc.contributor.department | Norges teknisk-naturvitenskapelige universitet, Fakultet for informasjonsteknologi og elektroteknikk, Institutt for datateknologi og informatikk | nb_NO
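
The abstract does not state the thesis's exact intrinsic-reward formula. Below is a minimal sketch of the general idea it describes, assuming the spread (standard deviation) of the C51 categorical return distribution is used as an uncertainty-driven exploration bonus; the hyperparameters N_ATOMS, V_MIN, V_MAX and the scaling factor beta are illustrative assumptions, not values taken from the thesis.

    # Illustrative sketch only, not the thesis's actual method.
    import numpy as np

    N_ATOMS, V_MIN, V_MAX = 51, -10.0, 10.0    # standard C51 hyperparameters
    ATOMS = np.linspace(V_MIN, V_MAX, N_ATOMS)  # fixed support z_1 .. z_51

    def intrinsic_reward(probs: np.ndarray, beta: float = 0.1) -> float:
        """Uncertainty bonus from a categorical return distribution.

        probs: shape (N_ATOMS,), the C51 output p_i(s, a) for one
        state-action pair. Returns beta times the distribution's standard
        deviation, so state-actions with a widely spread return estimate
        (high uncertainty) yield a larger exploration bonus.
        """
        q = float(np.dot(probs, ATOMS))               # mean return, Q(s, a)
        var = float(np.dot(probs, (ATOMS - q) ** 2))  # variance of the return
        return beta * np.sqrt(var)

    # A peaked (certain) distribution yields no bonus; a flat one a large bonus.
    peaked = np.zeros(N_ATOMS); peaked[25] = 1.0
    flat = np.full(N_ATOMS, 1.0 / N_ATOMS)
    print(intrinsic_reward(peaked))  # ~0.0
    print(intrinsic_reward(flat))    # noticeably larger

In a setting like the randomized Chain environment mentioned in the abstract, such a bonus would be added to the external reward during training, steering the agent toward states whose return distribution is still uncertain.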