On Course Towards Model-Free Guidance: A Self-Learning Approach To Dynamic Collision Avoidance for Autonomous Surface Vehicles
Master thesis
Permanent link: https://hdl.handle.net/11250/2780874
Publication date: 2020
Abstract
This master's thesis demonstrates that Deep Reinforcement Learning (DRL) can be used to train a reactive, autonomous vessel equipped with onboard rangefinder sensors to navigate unknown waters. This entails not only the challenge of avoiding grounding while making progress along the desired path, but also dynamic collision avoidance, i.e. steering strategies that minimize the risk of collision in situations where the vessel is on a collision course with other oncoming or crossing ships.
For this purpose, the Proximal Policy Optimization (PPO) learning algorithm was used, which is regarded as a leading DRL method for continuous control applications. The learning agent, which during training was guided by a reward function constructed to numerically reflect our preferences for the vessel's steering behaviour, was then evaluated based on its performance in a virtual simulation environment reconstructed from terrain and maritime traffic data from the Trondheim Fjord.

In this project, we show that Deep Reinforcement Learning (DRL) is applicable to the problem of training a reactive, autonomous vessel to navigate unknown waters, which entails not only the challenge of avoiding running ashore while efficiently making progress along the desired path, but also dynamic obstacle avoidance, i.e. control that mitigates collision risk upon ship encounters. A rangefinder sensor suite attached to the vessel, whose output is fed to the agent's control policy network, is designed, implemented in software and efficiently pre-processed to reduce the dimensionality of the perception vector while maintaining sensing integrity.
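The abstract does not specify the pre-processing scheme for the rangefinder output. One common way to shrink such a perception vector while keeping sensing integrity (an illustrative assumption here, not the thesis's documented method) is to partition the sensor array into angular sectors and keep only the minimum reading per sector, since the nearest obstacle in each sector dominates collision risk. A minimal NumPy sketch:

```python
import numpy as np

def pool_rangefinder(readings: np.ndarray, n_sectors: int) -> np.ndarray:
    """Reduce a dense array of rangefinder distances to one value per
    angular sector by taking the sector-wise minimum (nearest obstacle).

    Assumes len(readings) is divisible by n_sectors. Both the function
    name and the min-pooling choice are illustrative assumptions.
    """
    sectors = readings.reshape(n_sectors, -1)
    return sectors.min(axis=1)

# 8 raw readings pooled into 2 sectors: the nearest obstacle per sector survives
dense = np.array([90.0, 12.5, 80.0, 75.0, 60.0, 5.0, 55.0, 70.0])
pooled = pool_rangefinder(dense, 2)  # sector-wise minima: 12.5 and 5.0
```

Min-pooling is conservative by construction: downsampling can never hide an obstacle behind a larger averaged distance, which is why it is a natural fit for collision avoidance.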
The contribution of this work is two-fold: First, we outline the design, implementation and training of the perception-based guidance agent, with the goal of making it capable of following a priori known trajectories while avoiding collisions with other vessels. The reinforcement learning agent is trained to control the vessel's actuators, which include both thrusters and rudder control surfaces. A carefully constructed reward function, which balances the prioritization of path adherence against that of collision avoidance (which can be considered competing objectives), is used to guide the agent's learning process. Then, the state-of-the-art Proximal Policy Optimization (PPO) DRL algorithm is utilized for training the agent's policy such that it, in the end, yields optimal actions with regard to maximizing the reward that the agent receives from the environment over time. Finally, we evaluate the trained agent's performance in challenging, dynamic test scenarios, including ones that are reconstructed from real-world terrain and maritime traffic data from the Trondheim Fjord, an inlet of the Norwegian Sea.
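The thesis's actual reward function is not given in the abstract. The sketch below only illustrates the described trade-off between the two competing objectives: a weight `lam` (a name introduced here, along with the shape of both terms) blends a path-adherence term against a collision-avoidance penalty:

```python
import math

def reward(cross_track_error: float,
           speed_along_path: float,
           obstacle_distances: list[float],
           lam: float = 0.5) -> float:
    """Illustrative trade-off between path following and collision
    avoidance. The term shapes and the weight `lam` are assumptions,
    not the thesis's actual reward function.
    """
    # Path term: reward progress along the path, exponentially
    # discounted by the deviation from it.
    path_term = speed_along_path * math.exp(-abs(cross_track_error))
    # Collision term: grows sharply as any sensed obstacle gets close.
    colav_term = -sum(1.0 / max(d, 1e-3) for d in obstacle_distances)
    return lam * path_term + (1.0 - lam) * colav_term
```

With `lam` close to 1 the agent is pushed toward strict path adherence; close to 0 it favours obstacle clearance, which is exactly the prioritization balance the reward design has to strike.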
Furthermore, the Python simulation framework gym-auv, which was developed to facilitate this research, has a vast potential to enable further research in the field, and is thus covered extensively in this thesis. It provides not only a software foundation that can be easily expanded with new environments, reward function designs and vessel models, but also access to high-quality plotting and reporting functionality as well as real-time (and recorded) video rendering in both 2D and 3D.
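Being an OpenAI Gym-style framework, gym-auv lets new environments plug in through the standard reset/step contract. The stub below sketches only that contract; the class name, the 1-D "vessel" and its dynamics are invented for illustration and are not gym-auv code:

```python
import numpy as np

class ToyVesselEnv:
    """Minimal Gym-style environment skeleton of the kind a framework
    like gym-auv can be extended with. The trivial 1-D dynamics are
    placeholders; only the reset/step interface mirrors Gym.
    """

    def __init__(self, goal: float = 10.0):
        self.goal = goal
        self.position = 0.0

    def reset(self) -> np.ndarray:
        self.position = 0.0
        return self._observe()

    def step(self, action: float):
        # Clip the actuator command, then advance the toy dynamics.
        self.position += float(np.clip(action, -1.0, 1.0))
        obs = self._observe()
        reward = -abs(self.goal - self.position)  # closer to goal is better
        done = self.position >= self.goal
        return obs, reward, done, {}

    def _observe(self) -> np.ndarray:
        return np.array([self.position, self.goal - self.position])

# Standard Gym-style rollout loop
env = ToyVesselEnv()
obs = env.reset()
done = False
while not done:
    obs, rew, done, info = env.step(1.0)  # constant full-ahead action
```

Because this contract is shared with the wider Gym ecosystem, off-the-shelf DRL implementations such as PPO can be trained against any environment that implements it.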