
dc.contributor.advisor: Rasheed, Adil
dc.contributor.author: Meyer, Eivind
dc.date.accessioned: 2021-09-23T17:59:10Z
dc.date.available: 2021-09-23T17:59:10Z
dc.date.issued: 2020
dc.identifier: no.ntnu:inspera:56990118:18201685
dc.identifier.uri: https://hdl.handle.net/11250/2780874
dc.description.abstract: In this master's thesis, it was demonstrated that Deep Reinforcement Learning (DRL) can be used to train a reactive, autonomous vessel equipped with mounted rangefinder sensors to navigate unknown waters. This encompasses not only the challenge of avoiding running aground while making progress along the desired path, but also dynamic collision avoidance, i.e. steering strategies that minimize the risk of collision in situations where the vessel is on a collision course with other oncoming or crossing ships. For this purpose, the Proximal Policy Optimization (PPO) learning algorithm was used, which is regarded as a leading DRL method for continuous control applications. The learning agent, guided throughout the training process by a reward function constructed to numerically reflect our preferences for the vessel's steering behaviour, was then evaluated based on its performance in a virtual simulation environment reconstructed from terrain and maritime traffic data from the Trondheim Fjord.
dc.description.abstract: In this project, we show that Deep Reinforcement Learning (DRL) is applicable to the problem of training a reactive, autonomous vessel to navigate unknown waters, which entails not only the challenge of avoiding running aground while efficiently making progress along the desired path, but also dynamic obstacle avoidance, i.e. control that mitigates collision risk upon ship encounters. A rangefinder sensor suite attached to the vessel is designed and implemented in software; its output, which is fed to the agent's control policy network, is efficiently pre-processed to reduce the dimensionality of the perception vector while maintaining sensing integrity. The contribution of this work is two-fold. First, we outline the design, implementation and training of the perception-based guidance agent, with the goal of making it capable of following a priori known trajectories while avoiding collisions with other vessels. The reinforcement learning agent is trained to control the vessel's actuators, which include both thrusters and rudder control surfaces. A carefully constructed reward function, which balances the prioritization of path adherence against that of collision avoidance (two competing objectives), is used to guide the agent's learning process; an illustrative sketch of this trade-off, and of PPO training, follows the record below. The state-of-the-art Proximal Policy Optimization (PPO) DRL algorithm is then used to train the agent's policy so that it ultimately yields actions that maximize the reward the agent receives from the environment over time. Finally, we evaluate the trained agent's performance in challenging, dynamic test scenarios, including ones reconstructed from real-world terrain and maritime traffic data from the Trondheim Fjord, an inlet of the Norwegian Sea. Furthermore, the Python simulation framework gym-auv, which was developed to facilitate this research, has great potential to enable further research in the field and is therefore covered extensively in this thesis. It provides not only a software foundation that can easily be extended with new environments, reward function designs and vessel models, but also high-quality plotting and reporting functionality as well as real-time (and recorded) video rendering in both 2D and 3D.
dc.language:
dc.publisher: NTNU
dc.title: On Course Towards Model-Free Guidance: A Self-Learning Approach To Dynamic Collision Avoidance for Autonomous Surface Vehicles
dc.type: Master thesis
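
The abstracts describe a reward function that weighs path adherence against collision avoidance as competing objectives. Below is a minimal sketch of such a weighted-sum reward; the function name, the trade-off weight lambda_tradeoff, the constants and the exact penalty shapes are illustrative assumptions, not the thesis's actual definitions.

```python
import numpy as np

def reward(cross_track_error, speed_along_path, sensor_distances,
           lambda_tradeoff=0.5, gamma_e=0.05, gamma_x=0.1):
    """Illustrative reward balancing path adherence and collision avoidance.

    The weighted-sum structure mirrors the competing-objectives idea from
    the abstract; all terms and constants here are assumptions.
    """
    # Path-adherence term: reward forward progress along the desired path,
    # attenuated by the cross-track error (distance off the path).
    r_path = speed_along_path * np.exp(-gamma_e * abs(cross_track_error))

    # Collision-avoidance term: penalize proximity to obstacles as seen by
    # the rangefinder suite (closer readings incur a larger penalty).
    r_colav = -np.mean(1.0 / (gamma_x * np.maximum(sensor_distances, 1e-3)) ** 2)

    # The trade-off weight balances the two competing objectives.
    return lambda_tradeoff * r_path + (1.0 - lambda_tradeoff) * r_colav
```

Pushing the trade-off weight toward 1 favours tight path following, while pushing it toward 0 favours cautious obstacle avoidance.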
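The English abstract also states that the agent's policy is trained with PPO inside the gym-auv simulation framework. The following is a hedged sketch of what such a training run could look like, assuming gym-auv registers standard Gym environments on import; the environment id "PathColav-v0" is hypothetical, and stable_baselines3's PPO is used here as a generic stand-in for the PPO implementation actually used in the thesis.

```python
import gym
import gym_auv  # assumed to register the gym-auv environments on import
from stable_baselines3 import PPO  # stand-in PPO implementation

# Hypothetical environment id; consult gym_auv's registration code
# for the ids it actually exposes.
env = gym.make("PathColav-v0")

# "MlpPolicy": a feed-forward policy network over the pre-processed
# perception vector plus navigation states described in the abstract.
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=1_000_000)

# Roll out the trained policy for one episode.
obs = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
env.close()
```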

