On Course Towards Model-Free Guidance: A Self-Learning Approach To Dynamic Collision Avoidance for Autonomous Surface Vehicles
Master thesis
Permanent link: https://hdl.handle.net/11250/2780874
Publication date: 2020
Abstract
This master's thesis demonstrates that Deep Reinforcement Learning (DRL) can be used to train a reactive, autonomous vessel equipped with onboard rangefinder sensors to navigate unknown waters. This entails not only the challenge of avoiding grounding while making progress along the desired path, but also dynamic collision avoidance, i.e. steering strategies that minimize the risk of collision in situations where the vessel is on a collision course with other oncoming or crossing ships.
For this purpose, the Proximal Policy Optimization (PPO) learning algorithm was used, which is regarded as a leading DRL method for continuous control applications. The learning agent, which during training was guided by a reward function constructed to numerically reflect our preferences for the vessel's steering behaviour, was then evaluated based on its performance in a virtual simulation environment reconstructed from terrain and maritime traffic data from the Trondheim Fjord.

In this project, we show that Deep Reinforcement Learning (DRL) is applicable to the problem of training a reactive, autonomous vessel to navigate unknown waters, which entails not only the challenge of avoiding running ashore while efficiently making progress along the desired path, but also dynamic obstacle avoidance, i.e. control that mitigates collision risk upon ship encounters. A rangefinder sensor suite attached to the vessel, whose output is fed to the agent's control policy network, is designed, implemented in software and efficiently pre-processed to reduce the dimensionality of the perception vector while maintaining sensing integrity.
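The abstract does not specify the pre-processing scheme for the rangefinder output. One common way to shrink such a perception vector while keeping sensing integrity (an illustrative assumption here, not the thesis's documented method) is to partition the sensor array into angular sectors and keep only the minimum reading per sector, since the nearest obstacle in each sector dominates collision risk. A minimal NumPy sketch:

```python
import numpy as np

def pool_rangefinder(readings: np.ndarray, n_sectors: int) -> np.ndarray:
    """Reduce a dense array of rangefinder distances to one value per
    angular sector by taking the sector-wise minimum (nearest obstacle).

    Assumes len(readings) is divisible by n_sectors. Both the function
    name and the min-pooling choice are illustrative assumptions.
    """
    sectors = readings.reshape(n_sectors, -1)
    return sectors.min(axis=1)

# 8 raw readings pooled into 2 sectors: the nearest obstacle per sector survives
dense = np.array([90.0, 12.5, 80.0, 75.0, 60.0, 5.0, 55.0, 70.0])
pooled = pool_rangefinder(dense, 2)  # sector-wise minima: 12.5 and 5.0
```

Min-pooling is conservative by construction: downsampling can never hide an obstacle behind a larger averaged distance, which is why it is a natural fit for collision avoidance.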
The contribution of this work is two-fold: First, we outline the design, implementation and training of the perception-based guidance agent, with the goal of making it capable of following a priori known trajectories while avoiding collisions with other vessels. The reinforcement learning agent is trained to control the vessel's actuators, which include both thrusters and rudder control surfaces. A carefully constructed reward function, which balances the prioritization of path adherence against that of collision avoidance (which can be considered competing objectives), is used to guide the agent's learning process. Then, the state-of-the-art Proximal Policy Optimization (PPO) DRL algorithm is utilized for training the agent's policy such that it, in the end, yields optimal actions with regard to maximizing the reward that the agent receives from the environment over time. Finally, we evaluate the trained agent's performance in challenging, dynamic test scenarios, including ones that are reconstructed from real-world terrain and maritime traffic data from the Trondheim Fjord, an inlet of the Norwegian Sea.
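The thesis's actual reward function is not given in the abstract. The sketch below only illustrates the described trade-off between the two competing objectives: a weight `lam` (a name introduced here, along with the shape of both terms) blends a path-adherence term against a collision-avoidance penalty:

```python
import math

def reward(cross_track_error: float,
           speed_along_path: float,
           obstacle_distances: list[float],
           lam: float = 0.5) -> float:
    """Illustrative trade-off between path following and collision
    avoidance. The term shapes and the weight `lam` are assumptions,
    not the thesis's actual reward function.
    """
    # Path term: reward progress along the path, exponentially
    # discounted by the deviation from it.
    path_term = speed_along_path * math.exp(-abs(cross_track_error))
    # Collision term: grows sharply as any sensed obstacle gets close.
    colav_term = -sum(1.0 / max(d, 1e-3) for d in obstacle_distances)
    return lam * path_term + (1.0 - lam) * colav_term
```

With `lam` close to 1 the agent is pushed toward strict path adherence; close to 0 it favours obstacle clearance, which is exactly the prioritization balance the reward design has to strike.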
Furthermore, the Python simulation framework gym-auv, which was developed to facilitate this research, has a vast potential to enable further research in the field, and is thus covered extensively in this thesis. It provides not only a software foundation that can be easily expanded with new environments, reward function designs and vessel models, but also access to high-quality plotting and reporting functionality as well as real-time (and recorded) video rendering in both 2D and 3D.
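Being an OpenAI Gym-style framework, gym-auv lets new environments plug in through the standard reset/step contract. The stub below sketches only that contract; the class name, the 1-D "vessel" and its dynamics are invented for illustration and are not gym-auv code:

```python
import numpy as np

class ToyVesselEnv:
    """Minimal Gym-style environment skeleton of the kind a framework
    like gym-auv can be extended with. The trivial 1-D dynamics are
    placeholders; only the reset/step interface mirrors Gym.
    """

    def __init__(self, goal: float = 10.0):
        self.goal = goal
        self.position = 0.0

    def reset(self) -> np.ndarray:
        self.position = 0.0
        return self._observe()

    def step(self, action: float):
        # Clip the actuator command, then advance the toy dynamics.
        self.position += float(np.clip(action, -1.0, 1.0))
        obs = self._observe()
        reward = -abs(self.goal - self.position)  # closer to goal is better
        done = self.position >= self.goal
        return obs, reward, done, {}

    def _observe(self) -> np.ndarray:
        return np.array([self.position, self.goal - self.position])

# Standard Gym-style rollout loop
env = ToyVesselEnv()
obs = env.reset()
done = False
while not done:
    obs, rew, done, info = env.step(1.0)  # constant full-ahead action
```

Because this contract is shared with the wider Gym ecosystem, off-the-shelf DRL implementations such as PPO can be trained against any environment that implements it.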