dc.description.abstract | In this master's thesis, it was demonstrated that Deep Reinforcement Learning (DRL) can be used to train a reactive, autonomous vessel, equipped with mounted rangefinder sensors, to navigate unknown waters. This encompasses not only the challenge of avoiding running aground while making progress along the desired path, but also dynamic collision avoidance, i.e. steering strategies that minimize the risk of collision in situations where the vessel is on a collision course with other approaching or crossing ships.
For this purpose, the learning algorithm Proximal Policy Optimization (PPO) was used, which is regarded as a leading DRL method for control applications of a continuous nature. The learning agent, which was guided throughout the training process by a reward function constructed to numerically reflect our preferences for the vessel's steering behaviour, was then evaluated based on its performance in a virtual simulation environment reconstructed from terrain and maritime traffic data from the Trondheim Fjord. | |
dc.description.abstract | In this project, we show that Deep Reinforcement Learning (DRL) is applicable to the
problem of training a reactive, autonomous vessel to navigate unknown waters. This entails
not only the challenge of avoiding running aground while efficiently making progress
along the desired path, but also dynamic obstacle avoidance, i.e. control that mitigates collision
risk in ship encounters. A rangefinder sensor suite attached to the vessel is designed
and implemented in software; its output, which is fed to the agent's control policy network,
is efficiently pre-processed to reduce the dimensionality of the perception vector while
maintaining sensing integrity.
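One common way to reduce the dimensionality of a dense rangefinder scan is to pool the rays into a small number of sectors. The sketch below is a hypothetical illustration of this idea, not the pre-processing scheme actually used in the thesis; the function name, sector count, and sensor range are assumptions.

```python
import numpy as np

def pool_rangefinder_readings(ranges, n_sectors):
    """Reduce a dense array of rangefinder distances to a few sector
    features by taking the minimum (closest obstacle) in each sector.

    The sector-wise minimum preserves the most safety-critical
    information (the nearest obstacle) while shrinking the input
    that is fed to the policy network.
    """
    ranges = np.asarray(ranges, dtype=float)
    sectors = np.array_split(ranges, n_sectors)
    return np.array([sector.min() for sector in sectors])

# 180 simulated rays pooled into 9 sector features (assumed values)
readings = np.full(180, 150.0)   # max sensor range of 150 m everywhere...
readings[40] = 12.0              # ...except one nearby obstacle
features = pool_rangefinder_readings(readings, 9)
```

Min-pooling is a conservative choice: a single close return dominates its sector, so the reduced perception vector cannot "hide" a nearby obstacle the way mean-pooling could.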
The contribution of this work is two-fold. First, we outline the design, implementation
and training of the perception-based guidance agent, with the goal of making it capable
of following a priori known trajectories while avoiding collisions with other vessels. The
reinforcement learning agent is trained to control the vessel's actuators, which include
both thrusters and rudder control surfaces. A carefully constructed reward function,
which balances the prioritization of path adherence against that of collision avoidance
(two competing objectives), is used to guide the agent's learning process.
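A reward function of this kind might, for instance, weigh a path-following term against a proximity penalty. The function below is a minimal hypothetical sketch of such a trade-off, not the reward function designed in the thesis; all names and coefficients are assumptions for illustration.

```python
import numpy as np

def reward(cross_track_error, speed_along_path, min_obstacle_distance,
           lambda_tradeoff=0.5, gamma_e=0.05, gamma_d=0.1):
    """Hypothetical reward balancing path adherence and collision avoidance.

    - Path term: rewards forward progress, discounted by how far the
      vessel has strayed from the desired path (cross-track error, in m).
    - Collision term: penalizes proximity to the nearest obstacle.
    - lambda_tradeoff in [0, 1] weighs the two competing objectives.
    """
    path_reward = speed_along_path * np.exp(-gamma_e * abs(cross_track_error))
    collision_penalty = -np.exp(-gamma_d * min_obstacle_distance)
    return lambda_tradeoff * path_reward + (1.0 - lambda_tradeoff) * collision_penalty
```

With this structure, an agent sailing on the path far from obstacles earns a higher reward than one that is off the path and close to an obstacle, which is exactly the preference ordering the learning process should internalize.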
Second, the state-of-the-art Proximal Policy Optimization (PPO) DRL algorithm is
used to train the agent's policy so that it ultimately yields actions that are optimal with
regard to maximizing the reward the agent receives from the environment over time.
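The core of PPO is its clipped surrogate objective, which limits how far each update can move the new policy away from the one that collected the data. The sketch below computes that objective in NumPy; it is a didactic illustration of the published algorithm, not the training code used in this work.

```python
import numpy as np

def ppo_clip_objective(new_log_probs, old_log_probs, advantages, epsilon=0.2):
    """Clipped surrogate objective at the heart of PPO.

    The probability ratio r = pi_new(a|s) / pi_old(a|s) is clipped to
    [1 - epsilon, 1 + epsilon]; taking the elementwise minimum of the
    clipped and unclipped terms removes the incentive to push the new
    policy far from the old one in a single update.
    """
    ratio = np.exp(new_log_probs - old_log_probs)
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon)
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))
```

When the new and old policies agree, the objective reduces to the mean advantage; when the ratio grows beyond 1 + epsilon with a positive advantage, clipping caps the gain, which is what keeps PPO updates stable enough for continuous-control tasks like vessel steering.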
Finally, we evaluate the trained agent's performance in challenging, dynamic test scenarios,
including ones reconstructed from real-world terrain and maritime traffic data
from the Trondheim Fjord, an inlet of the Norwegian Sea.
Furthermore, the Python simulation framework gym-auv, which was developed to facilitate
this research, has considerable potential to enable further research in the field, and is therefore
covered extensively in this thesis. It provides not only a software foundation that can be
easily extended with new environments, reward function designs and vessel models, but
also high-quality plotting and reporting functionality, as well as real-time
(and recorded) video rendering in both 2D and 3D. | |