Deep Reinforcement Learning Based Controllers In Underwater Robotics

Kjærnli, Eirik F.
Master thesis
Open
19144_FULLTEXT.pdf (Locked)
19144_COVER.pdf (Locked)
Permanent link
http://hdl.handle.net/11250/2615067
Publication date
2018
Metadata
View full record
Collections
  • Institutt for marin teknikk [2403]
Abstract
This thesis investigates the possibility of creating a controller for a Remotely Operated Vehicle (ROV) using deep neural networks, optimized by one of two model-free reinforcement learning algorithms: Deep Deterministic Policy Gradient (DDPG) or Proximal Policy Optimization (PPO).

Due to aging equipment on the Norwegian Continental Shelf, the need for inspection, maintenance, and repair operations is expected to increase considerably in the coming years. ROVs play a significant role in these operations and are today controlled by advanced model-based controllers. These controllers require accurate models of the ROV dynamics which are both time-consuming and expensive to develop. As a result, there is a need for controllers which are less dependent on models, to reduce cost and increase efficiency. Recent research in the field of reinforcement learning has shown that it is possible to create model-free controllers using deep neural networks, and the goal of this thesis is to investigate if this is applicable to ROV controllers.

The thesis presents the fundamental principles of reinforcement learning and deep neural networks, as well as the state-of-the-art reinforcement learning algorithms DDPG and PPO. Modifications to improve the original algorithms are also discussed. Based on experimental data, a mathematical model of the BlueROV2 was created and implemented in a simulator written in Python. The DDPG and PPO algorithms were then implemented in the same simulator using the machine learning framework TensorFlow. To train the deep neural networks efficiently, a suitable reward function and a training scenario called Randomly Initialized Dynamic Positioning were proposed. Finally, the performance of the trained controllers was verified by applying them to a dynamic positioning scenario and a waypoint tracking scenario.
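The abstract does not give the exact form of the reward function or the random initialization, so the following is only a plausible sketch of what a "Randomly Initialized Dynamic Positioning" training scenario could look like: each episode starts the ROV at a random offset from the setpoint, and a quadratic penalty on position and heading error drives the policy toward station keeping. The bounds, weights, and function names here are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def reset_randomly(bounds=5.0):
    """Sample a random initial ROV state around the setpoint.

    Hypothetical version of the thesis's 'Randomly Initialized
    Dynamic Positioning' scenario: position offset in metres,
    heading offset in radians.
    """
    position = np.random.uniform(-bounds, bounds, size=3)  # x, y, z offset [m]
    heading = np.random.uniform(-np.pi, np.pi)             # yaw offset [rad]
    return position, heading

def reward(position, heading, w_pos=1.0, w_psi=0.5):
    """One plausible quadratic reward: penalize distance from the setpoint."""
    return -(w_pos * np.linalg.norm(position) ** 2 + w_psi * heading ** 2)
```

Randomizing the initial state each episode exposes the policy to a broad region of the state space, which is a common way to make a learned dynamic positioning controller generalize beyond a single starting condition.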

The results showed that it is possible to create a deep-neural-network-based controller with both algorithms, under the assumption that the ROV is stable in roll and pitch. A controller was also created for an underactuated model with six degrees of freedom using the PPO algorithm; however, this controller was only able to complete the dynamic positioning scenario.

A comparison of the two algorithms showed that PPO outperformed DDPG in terms of consistent convergence to a satisfactory controller. A recurring problem in both controllers was rapid oscillations in the thrust output. The action output of the PPO controller resembled a thrust signal corrupted by noise, and since the thrust is sampled from a learned Gaussian distribution, this was concluded to be the most probable cause. Adding a control output filter was therefore suggested. The output of the DDPG controller showed no clear pattern, and it was suggested that the solution found by this controller exploited a weakness in the simulator. PPO was therefore considered the superior candidate for further research on this topic.
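The suggested control output filter is not specified in the abstract; a minimal sketch, assuming a discrete first-order low-pass filter applied to each thrust command, might look like the following. The time constant and sample time are illustrative assumptions.

```python
class FirstOrderFilter:
    """Discrete first-order low-pass filter for smoothing thrust commands.

    A sketch of the suggested control-output filter; tau (time constant)
    and dt (sample time) are illustrative, not values from the thesis.
    """

    def __init__(self, tau=0.5, dt=0.1):
        self.alpha = dt / (tau + dt)  # smoothing factor in (0, 1)
        self.state = None             # last filtered output

    def __call__(self, u):
        # Initialize on first sample, then move a fraction alpha
        # toward each new command, attenuating rapid oscillations.
        if self.state is None:
            self.state = u
        else:
            self.state += self.alpha * (u - self.state)
        return self.state
```

Such a filter trades a small amount of response lag for a smoother thrust signal, which reduces wear on the thrusters when the policy's sampled actions are noisy.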
Publisher
NTNU

Contact us | Give feedback

Privacy statement
DSpace software copyright © 2002-2019 DuraSpace

Provided by Unit