Abstract
The emergence of artificial intelligence (AI) in recent years has attracted great attention in academic research and led to successful applications in industry. Deep reinforcement learning (DRL), an AI approach in which an agent senses its environment, makes decisions from its learned policy, performs actions on the environment, and iteratively improves its policy through this process, has proven to be one of the most promising solutions to many problems, compared with traditional methods that require precise modelling and complex parameter-identification procedures and that do not generalize to complex situations.
Unmanned surface vessels (USVs) are among the areas expected to benefit most from such technologies, since complex water-surface traffic can lead to great losses from careless human operation, and since the limits of traditional PID or MPC methods are evident in this setting. In this thesis, DRL-based collision-avoidance methods are studied for the autonomous navigation of surface vessels, and a new alternative algorithm and method is developed and validated. Existing DRL-based methods remain limited to academic research and are not ready for industrial application; this gap is the focus of this thesis. The first reason is their lack of explainability: end users cannot easily visualize the decision process, establish liability, or put the methods to use. The second is that such methods cannot be hard-coded with rules that strictly prevent collision, which means they may, without warning, produce a decision that leads to one.
In this thesis, to address the interpretability/explainability of AI decisions for end users, a method is proposed that outputs a selected number of waypoints for the own ship to perform collision avoidance; the focus is on two-waypoint collision-avoidance navigation. The theoretical foundations include policy-based reinforcement learning, Proximal Policy Optimization (PPO), and explainable artificial intelligence in DRL. To guide the exploration of the DRL agent, a 'rule-based' DRL concept is also presented, in which rules are introduced to limit the action space, fix waypoints that are not currently being tracked, and perform static planning. This keeps the agent from exploring unwanted regions of the action space and lets it focus on the required task. The proposed method also investigates two kinds of input: grid maps providing the real-time positions and movements of ships together with ENC data, and vectors containing the relevant information of the scenario. The latter is more straightforward, but it may fall short in providing features that the DRL agent could otherwise extract from the grid maps as aggregated/implicit features. The thesis details the implementation steps, from simulation-environment setup to visualization. Preliminary results indicate successful validation of the algorithm, with the trained agent optimizing rewards through policy updates. With the designed framework, the agent learned within a short training time to turn away from obstacle ships by maximizing the rewards and following the introduced rules, and the resulting navigation is smooth and interpretable to humans.
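The rule-based restriction of the action space described above can be illustrated with a minimal sketch. The function below is not the thesis implementation: the limits (`max_turn_deg`, leg-length bounds) and the geometry convention are illustrative assumptions. It shows the general idea of clipping an agent's proposed waypoints so that each leg stays within a bounded turn from the current heading and within bounded leg lengths.

```python
import math

def constrain_waypoints(own_pos, heading_deg, waypoints,
                        max_turn_deg=60.0, min_leg=50.0, max_leg=500.0):
    """Clip proposed waypoints to a rule-based action space (illustrative).

    Each leg's bearing may deviate at most max_turn_deg from the current
    heading, and leg lengths are bounded to [min_leg, max_leg]. All limits
    and the planar x/y geometry are assumptions, not the thesis's values.
    """
    constrained = []
    x, y = own_pos
    h = heading_deg
    for wx, wy in waypoints:
        dx, dy = wx - x, wy - y
        bearing = math.degrees(math.atan2(dy, dx))
        # wrap the bearing difference into [-180, 180) before clipping
        diff = (bearing - h + 180.0) % 360.0 - 180.0
        diff = max(-max_turn_deg, min(max_turn_deg, diff))
        # bound the leg length of this segment
        leg = max(min_leg, min(max_leg, math.hypot(dx, dy)))
        h = h + diff
        x = x + leg * math.cos(math.radians(h))
        y = y + leg * math.sin(math.radians(h))
        constrained.append((x, y))
    return constrained
```

In a two-waypoint setting, the PPO policy would output two raw waypoints and a projection of this kind would keep them inside the admissible region, so the agent never wastes exploration on manoeuvres the rules forbid.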