
dc.contributor.advisor: Nguyen, Dong Trong
dc.contributor.author: Øvereng, Simen Sem
dc.date.accessioned: 2021-03-02T15:47:56Z
dc.date.available: 2021-03-02T15:47:56Z
dc.date.issued: 2020
dc.identifier.uri: https://hdl.handle.net/11250/2731248
dc.description.abstract: Classic methods for Dynamic Positioning (DP) of surface vessels often consist of first calculating the desired forces and moments to exert on the vessel using a motion controller, followed by a thrust allocation scheme that translates these forces and moments into individual thruster commands. Common thrust allocation methods usually perform this translation by calculating a pseudoinverse matrix, or by using various formulations of an objective function in an optimization problem. In the optimization setting, several solutions that satisfy the desired forces and moments may exist because the vessel is overactuated, and additional objectives and/or constraints are usually added to find a final solution at each time step. This can require computationally demanding solvers due to the potentially complex configuration of actuators, including azimuth thrusters, variable-pitch propellers, tunnel thrusters, rudders, etc.

This thesis set out to investigate how advances in machine learning could be used to train a model in simulation in order to develop a control scheme that is both positionally accurate and energy efficient, while avoiding heavy computations on board during DP operations. The thesis was written in collaboration with DNVGL, which provided technical advice as well as a simulation platform and a physical ship model on which the algorithms could be tested. The thesis built on a project thesis from 2019 [1], in which it was found that supervised learning for thrust allocation could give adequate results in terms of accurately translating desired forces and moments, but that it was difficult to formulate the constraints and bounds on the azimuth thrusters' force production and angular rates within a supervised learning framework. In this thesis, another type of machine learning, Deep Reinforcement Learning (DRL), was used. It replaced both the motion controller and the thrust allocation, translating a desired vessel pose directly into individual thruster commands. This was achieved by training the DRL model in a simulator while attempting to maximize the return over time from a reward function. The neural networks representing the DRL model were trained on the digital twin of the ReVolt ship model using the Proximal Policy Optimization (PPO) algorithm. The entire training regimen for the DRL method was created, and a (to the author's knowledge) novel reward function, a multivariate Gaussian, was used for positional accuracy.

The DRL method was tested against classic methods consisting of a Proportional-Integral-Derivative (PID) motion controller with feedforward combined with two different thrust allocation methods: a proprietary method provided by DNVGL based on calculating a pseudoinverse, and a method based on Quadratic Programming (QP) implemented by the author. All methods were compared in various test scenarios in the simulator to evaluate accuracy, robustness, and energy efficiency. In addition, a sea trial was performed on the physical ReVolt ship model to evaluate the difference between performance in the simulator and in real life.
The simulation results showed that the DRL method was robust to changes in the desired pose larger than those it had been trained on while performing DP, and achieved better positional accuracy than the classic methods while doing so. Test scenarios with no environmental loads showed that the DRL method was better than the classic methods in terms of positional accuracy, energy usage, and wear and tear. After enabling a constant ocean current, the positional accuracy was similar among the methods, while the energy efficiency was lowest for the DRL method. A station-keeping test with large wind, wave, and current loads showed that the DRL method was able to maintain acceptable positional accuracy in a larger sea state than it had been trained on. In the sea trial, the DRL method's positional accuracy was found to be good, with performance comparable to the corresponding test in simulation. Due to hardware issues with the ship model's bow propeller, the vessel motion was more oscillatory, and the energy usage increased substantially compared to the simulation since the method had to compensate for the hardware issue.

The thesis concluded that DRL's potential for solving continuous-control tasks requiring accuracy and energy efficiency is realizable, both in terms of good performance in the simulated tests and in the transition from simulation to real life. Recommended future work was divided into two branches. The first was to improve real-life performance by retraining the simulation-trained DRL model on board the real ship model using transfer learning, in order to learn details of the hardware that were not modeled in the digital twin. The second was to consider newer methods for providing safety and stability guarantees during training of control policies, in order to develop a notion of the worst-case performance of DRL systems.
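As a concrete illustration of two elements described in the abstract above, the following is a minimal Python sketch of pseudoinverse-based thrust allocation and a multivariate Gaussian positional reward. The thruster configuration matrix, covariance values, and function names are illustrative assumptions only; they do not reproduce the ReVolt thruster layout, DNVGL's proprietary allocator, or the exact reward function used in the thesis.

```python
import numpy as np

# Illustrative 3-DOF thruster configuration matrix B mapping thruster forces
# to generalized forces [Fx, Fy, Mz]. The values are made up for this sketch
# and do not reproduce the ReVolt layout or DNVGL's proprietary allocator.
B = np.array([
    [1.0, 1.0, 0.0],    # surge contribution of each thruster
    [0.0, 0.0, 1.0],    # sway contribution (bow tunnel thruster)
    [-0.5, 0.5, 1.5],   # yaw moment contribution (lever arms in metres)
])

def allocate(tau_desired):
    """Classic allocation: thruster commands u = pinv(B) @ tau_desired."""
    return np.linalg.pinv(B) @ tau_desired

def gaussian_reward(pose_error, sigma):
    """Multivariate Gaussian-shaped positional reward, peaking at zero error.

    A plausible form of the reward named in the abstract; the covariance and
    scaling actually used in the thesis are not given in this record.
    """
    inv_cov = np.linalg.inv(np.diag(np.asarray(sigma) ** 2))
    return float(np.exp(-0.5 * pose_error @ inv_cov @ pose_error))

# Example: 10 N desired surge force and a small yaw moment.
u = allocate(np.array([10.0, 0.0, 0.5]))
# Example: pose error [north, east, heading] of (0.5 m, -0.2 m, 0.05 rad).
r = gaussian_reward(np.array([0.5, -0.2, 0.05]), sigma=[1.0, 1.0, 0.1])
```

For an overactuated vessel with more thrusters than controlled degrees of freedom, B is non-square and the pseudoinverse returns the minimum-norm solution, whereas QP-based allocation, as in the second baseline, can encode additional objectives and constraints explicitly.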
dc.language.iso: eng
dc.publisher: NTNU
dc.title: Dynamic Positioning using Deep Reinforcement Learning
dc.type: Master thesis

