dc.description.abstract | In this master's thesis, it was demonstrated that Deep Reinforcement Learning (DRL) can be used to train a reactive, autonomous vessel, equipped with mounted rangefinder sensors, to navigate unknown waters. This encompasses not only the challenge of avoiding running aground while making progress along the desired path, but also dynamic collision avoidance, i.e. steering strategies that minimize the risk of collision in situations where the vessel is on a collision course with other approaching or crossing ships.
For this purpose, the learning algorithm Proximal Policy Optimization (PPO) was used, which is regarded as a leading DRL method for control applications of a continuous nature. The learning agent, which was guided throughout the training process by a reward function constructed to numerically reflect our preferences for the vessel's steering behaviour, was then evaluated based on its performance in a virtual simulation environment reconstructed from terrain and maritime traffic data from the Trondheim Fjord. | |
dc.description.abstract | In this project, we show that Deep Reinforcement Learning (DRL) is applicable to the
problem of training a reactive, autonomous vessel to navigate unknown waters. This entails
not only the challenge of avoiding running aground while efficiently making progress
along the desired path, but also dynamic obstacle avoidance, i.e. control that mitigates collision
risk in ship encounters. A rangefinder sensor suite attached to the vessel is designed
and implemented in software; its output, which is fed to the agent's control policy network,
is efficiently pre-processed to reduce the dimensionality of the perception vector while
maintaining sensing integrity.
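One common way to reduce the dimensionality of a dense rangefinder scan is to pool the rays into a small number of sectors. The sketch below is a hypothetical illustration of this idea, not the pre-processing scheme actually used in the thesis; the function name, sector count, and sensor range are assumptions.

```python
import numpy as np

def pool_rangefinder_readings(ranges, n_sectors):
    """Reduce a dense array of rangefinder distances to a few sector
    features by taking the minimum (closest obstacle) in each sector.

    The sector-wise minimum preserves the most safety-critical
    information (the nearest obstacle) while shrinking the input
    that is fed to the policy network.
    """
    ranges = np.asarray(ranges, dtype=float)
    sectors = np.array_split(ranges, n_sectors)
    return np.array([sector.min() for sector in sectors])

# 180 simulated rays pooled into 9 sector features (assumed values)
readings = np.full(180, 150.0)   # max sensor range of 150 m everywhere...
readings[40] = 12.0              # ...except one nearby obstacle
features = pool_rangefinder_readings(readings, 9)
```

Min-pooling is a conservative choice: a single close return dominates its sector, so the reduced perception vector cannot "hide" a nearby obstacle the way mean-pooling could.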
The contribution of this work is two-fold. First, we outline the design, implementation
and training of the perception-based guidance agent, with the goal of making it capable
of following a priori known trajectories while avoiding collisions with other vessels. The
reinforcement learning agent is trained to control the vessel's actuators, which include
both thrusters and rudder control surfaces. A carefully constructed reward function,
which balances the prioritization of path adherence against that of collision avoidance
(two competing objectives), is used to guide the agent's learning process.
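A reward function of this kind might, for instance, weigh a path-following term against a proximity penalty. The function below is a minimal hypothetical sketch of such a trade-off, not the reward function designed in the thesis; all names and coefficients are assumptions for illustration.

```python
import numpy as np

def reward(cross_track_error, speed_along_path, min_obstacle_distance,
           lambda_tradeoff=0.5, gamma_e=0.05, gamma_d=0.1):
    """Hypothetical reward balancing path adherence and collision avoidance.

    - Path term: rewards forward progress, discounted by how far the
      vessel has strayed from the desired path (cross-track error, in m).
    - Collision term: penalizes proximity to the nearest obstacle.
    - lambda_tradeoff in [0, 1] weighs the two competing objectives.
    """
    path_reward = speed_along_path * np.exp(-gamma_e * abs(cross_track_error))
    collision_penalty = -np.exp(-gamma_d * min_obstacle_distance)
    return lambda_tradeoff * path_reward + (1.0 - lambda_tradeoff) * collision_penalty
```

With this structure, an agent sailing on the path far from obstacles earns a higher reward than one that is off the path and close to an obstacle, which is exactly the preference ordering the learning process should internalize.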
Second, the state-of-the-art Proximal Policy Optimization (PPO) DRL algorithm is
used to train the agent's policy so that it ultimately yields actions that are optimal with
regard to maximizing the reward the agent receives from the environment over time.
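The core of PPO is its clipped surrogate objective, which limits how far each update can move the new policy away from the one that collected the data. The sketch below computes that objective in NumPy; it is a didactic illustration of the published algorithm, not the training code used in this work.

```python
import numpy as np

def ppo_clip_objective(new_log_probs, old_log_probs, advantages, epsilon=0.2):
    """Clipped surrogate objective at the heart of PPO.

    The probability ratio r = pi_new(a|s) / pi_old(a|s) is clipped to
    [1 - epsilon, 1 + epsilon]; taking the elementwise minimum of the
    clipped and unclipped terms removes the incentive to push the new
    policy far from the old one in a single update.
    """
    ratio = np.exp(new_log_probs - old_log_probs)
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon)
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))
```

When the new and old policies agree, the objective reduces to the mean advantage; when the ratio grows beyond 1 + epsilon with a positive advantage, clipping caps the gain, which is what keeps PPO updates stable enough for continuous-control tasks like vessel steering.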
Finally, we evaluate the trained agent's performance in challenging, dynamic test scenarios,
including ones reconstructed from real-world terrain and maritime traffic data
from the Trondheim Fjord, an inlet of the Norwegian Sea.
Furthermore, the Python simulation framework gym-auv, which was developed to facilitate
this research, has considerable potential to enable further research in the field, and is therefore
covered extensively in this thesis. It provides not only a software foundation that can be
easily extended with new environments, reward function designs and vessel models, but
also high-quality plotting and reporting functionality, as well as real-time
(and recorded) video rendering in both 2D and 3D. | |