Show simple item record

dc.contributor.advisor: Schjølberg, Ingrid
dc.contributor.author: Kjærnli, Eirik F.
dc.date.accessioned: 2019-09-11T08:51:47Z
dc.date.created: 2018-06-27
dc.date.issued: 2018
dc.identifier: ntnudaim:19144
dc.identifier.uri: http://hdl.handle.net/11250/2615067
dc.description.abstract: This thesis investigates the possibility of creating a controller for a Remotely Operated Vehicle (ROV) using deep neural networks, optimized with either of the model-free reinforcement learning algorithms Deep Deterministic Policy Gradient (DDPG) and Proximal Policy Optimization (PPO). Due to aging equipment on the Norwegian Continental Shelf, the need for inspection, maintenance, and repair operations is expected to increase considerably in the coming years. ROVs play a significant role in these operations and are today controlled by advanced model-based controllers. These controllers require accurate models of the ROV dynamics, which are both time-consuming and expensive to develop. As a result, there is a need for controllers that are less dependent on such models, in order to reduce cost and increase efficiency. Recent research in the field of reinforcement learning has shown that it is possible to create model-free controllers using deep neural networks, and the goal of this thesis is to investigate whether this approach is applicable to ROV controllers.

The thesis presents the fundamental principles of reinforcement learning and deep neural networks, in addition to the state-of-the-art reinforcement learning algorithms DDPG and PPO. Modifications to improve the original algorithms are also discussed. Based on experimental data, a mathematical model of the BlueROV2 was created and implemented in a simulator using Python. The DDPG and PPO algorithms were then implemented in the same simulator using the machine learning framework TensorFlow. To train the deep neural networks efficiently, a suitable reward function and a training scenario called Randomly Initialized Dynamic Positioning were suggested. Finally, the performance of the trained controllers was verified by applying them to a dynamic positioning scenario and a waypoint tracking scenario.

The results showed that it is possible to create a deep neural network-based controller with both algorithms, under the assumption that the ROV is stable in roll and pitch. A controller was also created for an underactuated model with six degrees of freedom using the PPO algorithm; however, this controller was only able to complete the dynamic positioning scenario. A comparison of the two algorithms showed that PPO outperformed DDPG with respect to consistent convergence to a satisfactory controller. A recurring problem in both controllers was rapid oscillation in the thrust output. The action output of the PPO controller resembled a thrust signal influenced by noise, and since the thrust is sampled from a learned Gaussian distribution, this sampling was concluded to be the most probable cause. Adding a control output filter was therefore suggested (a brief sketch of such a filter follows this record). The output of the DDPG controller showed no clear pattern, and it was suggested that the solution found by this controller exploited a weakness in the simulator. PPO was therefore considered the superior candidate for further research on this topic.
dc.language: eng
dc.publisher: NTNU
dc.subject: Marine technology, Marine cybernetics
dc.title: Deep Reinforcement Learning Based Controllers In Underwater Robotics
dc.type: Master thesis
dc.source.pagenumber: 137
dc.contributor.department: Norges teknisk-naturvitenskapelige universitet, Fakultet for ingeniørvitenskap, Institutt for marin teknikk
dc.date.embargoenddate: 10000-01-01
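
The abstract attributes PPO's oscillating thrust to actions being sampled from a learned Gaussian distribution and proposes a control output filter as a remedy. The sketch below is not code from the thesis: the policy mean mu, standard deviation sigma, and smoothing factor alpha are assumed values, and a simple first-order low-pass (exponential moving average) stands in for whichever filter the author intended.

    import numpy as np

    # Hypothetical illustration, not the thesis code: a PPO-style Gaussian
    # policy samples each thrust command from N(mu, sigma), so consecutive
    # outputs jitter even when the learned mean action is steady.
    rng = np.random.default_rng(0)
    mu, sigma = 0.4, 0.15          # assumed mean thrust and learned std. dev.
    raw_thrust = mu + sigma * rng.standard_normal(200)

    # Suggested remedy: a control output filter. A first-order low-pass
    # filter is one simple, assumed choice.
    alpha = 0.2                    # assumed smoothing factor
    filtered = np.empty_like(raw_thrust)
    filtered[0] = raw_thrust[0]
    for k in range(1, len(raw_thrust)):
        filtered[k] = alpha * raw_thrust[k] + (1.0 - alpha) * filtered[k - 1]

    # The filtered command shows much smaller sample-to-sample changes.
    print(np.abs(np.diff(raw_thrust)).mean())   # mean |change|, raw (noisy)
    print(np.abs(np.diff(filtered)).mean())     # mean |change|, filtered

A smaller alpha smooths more aggressively but adds lag to the thrust response, the usual trade-off when filtering a controller's output.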

