Deep Reinforcement Learning Based Controllers In Underwater Robotics

Kjærnli, Eirik F.
Master thesis
Open
19144_FULLTEXT.pdf (Locked)
19144_COVER.pdf (Locked)
Permanent link
http://hdl.handle.net/11250/2615067
Publication date
2018
Metadata
View full record
Collections
  • Institutt for marin teknikk [2403]
Abstract
This thesis investigates the possibility of creating a controller for a Remotely Operated Vehicle (ROV) using deep neural networks, optimized by one of two model-free reinforcement learning algorithms: Deep Deterministic Policy Gradient (DDPG) or Proximal Policy Optimization (PPO).

Due to aging equipment on the Norwegian Continental Shelf, the need for inspection, maintenance, and repair operations is expected to increase considerably in the coming years. ROVs play a significant role in these operations and are today controlled by advanced model-based controllers. These controllers require accurate models of the ROV dynamics which are both time-consuming and expensive to develop. As a result, there is a need for controllers which are less dependent on models, to reduce cost and increase efficiency. Recent research in the field of reinforcement learning has shown that it is possible to create model-free controllers using deep neural networks, and the goal of this thesis is to investigate if this is applicable to ROV controllers.

The thesis presents the fundamental principles of reinforcement learning and deep neural networks, as well as the state-of-the-art reinforcement learning algorithms DDPG and PPO. Modifications to improve the original algorithms are also discussed. Based on experimental data, a mathematical model of the BlueROV2 was created and implemented in a simulator written in Python. The DDPG and PPO algorithms were then implemented in the same simulator using the machine learning framework TensorFlow. To train the deep neural networks efficiently, a suitable reward function and a training scenario called Randomly Initialized Dynamic Positioning were proposed. Finally, the performance of the trained controllers was verified by applying them to a dynamic positioning scenario and a waypoint tracking scenario.
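The abstract does not give the exact form of the reward function or the random initialization, so the following is only a plausible sketch of what a "Randomly Initialized Dynamic Positioning" training scenario could look like: each episode starts the ROV at a random offset from the setpoint, and a quadratic penalty on position and heading error drives the policy toward station keeping. The bounds, weights, and function names here are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def reset_randomly(bounds=5.0):
    """Sample a random initial ROV state around the setpoint.

    Hypothetical version of the thesis's 'Randomly Initialized
    Dynamic Positioning' scenario: position offset in metres,
    heading offset in radians.
    """
    position = np.random.uniform(-bounds, bounds, size=3)  # x, y, z offset [m]
    heading = np.random.uniform(-np.pi, np.pi)             # yaw offset [rad]
    return position, heading

def reward(position, heading, w_pos=1.0, w_psi=0.5):
    """One plausible quadratic reward: penalize distance from the setpoint."""
    return -(w_pos * np.linalg.norm(position) ** 2 + w_psi * heading ** 2)
```

Randomizing the initial state each episode exposes the policy to a broad region of the state space, which is a common way to make a learned dynamic positioning controller generalize beyond a single starting condition.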

The results showed that it is possible to create a deep-neural-network-based controller with both algorithms, under the assumption that the ROV is stable in roll and pitch. A controller was also created for an underactuated model with six degrees of freedom using the PPO algorithm; however, this controller was only able to complete the dynamic positioning scenario.

A comparison of the two algorithms showed that PPO outperformed DDPG in terms of consistent convergence to a satisfactory controller. A recurring problem in both controllers was rapid oscillations in the thrust output. The action output of the PPO controller resembled a thrust signal corrupted by noise, and since the thrust is sampled from a learned Gaussian distribution, this was concluded to be the most probable cause. Adding a control output filter was therefore suggested. The output of the DDPG controller showed no clear pattern, and it was suggested that the solution found by this controller exploited a weakness in the simulator. PPO was therefore considered the superior candidate for further research on this topic.
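The suggested control output filter is not specified in the abstract; a minimal sketch, assuming a discrete first-order low-pass filter applied to each thrust command, might look like the following. The time constant and sample time are illustrative assumptions.

```python
class FirstOrderFilter:
    """Discrete first-order low-pass filter for smoothing thrust commands.

    A sketch of the suggested control-output filter; tau (time constant)
    and dt (sample time) are illustrative, not values from the thesis.
    """

    def __init__(self, tau=0.5, dt=0.1):
        self.alpha = dt / (tau + dt)  # smoothing factor in (0, 1)
        self.state = None             # last filtered output

    def __call__(self, u):
        # Initialize on first sample, then move a fraction alpha
        # toward each new command, attenuating rapid oscillations.
        if self.state is None:
            self.state = u
        else:
            self.state += self.alpha * (u - self.state)
        return self.state
```

Such a filter trades a small amount of response lag for a smoother thrust signal, which reduces wear on the thrusters when the policy's sampled actions are noisy.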
Publisher
NTNU

Contact us | Give feedback

Privacy statement
DSpace software copyright © 2002-2019 DuraSpace

Provided by Unit