Deep Reinforcement Learning based tracking behavior for underwater vehicles
- Institutt for marin teknikk 
This thesis introduces the use of Machine Learning, specifically Reinforcement Learning (RL), to create a model-free tracking capability for Remotely Operated Vehicles (ROVs). Specifically, the ROV is trained by an RL algorithm to track an ArUco marker, using an online implementation of a Computer Vision (CV) algorithm for detection. The main motivation behind this work is to contribute to increased autonomy in underwater operations by introducing model-free autonomous tracking behavior to underwater vehicles. This approach requires minimal human intervention during operation, while significantly reducing the prior human effort spent on control programming.

First, simulator-based training of the ROV's tracking behavior was carried out prior to physical experiments with a real ROV in the MC-laboratory at NTNU. The ROV used for the experimental tests is a BlueROV2, which is highly customizable and well suited for R&D purposes.

The theory presented in this thesis lays the groundwork for much of the reasoning in this project, including the choice of RL method. The RL algorithm chosen for training the tracking behavior is an online Python implementation of the Proximal Policy Optimization (PPO) algorithm. The tracking behavior is trained in a simulator, implemented as a Python script based on OpenAI's typical simulator architecture. The resulting tracking performance is then evaluated by studying the evolution of accumulated rewards and the ROV's trajectory plots. While the resulting performance showed some weaknesses, it was sufficient to justify testing the trained model in a real-world setting.

However, the real-world experiments did not yield positive tracking results: the ROV moved in a seemingly random manner instead of favorably moving towards the ArUco marker. Several challenges described in the theory section proved prevalent during the lab experiments, which caused the disruption of the real-world tracking performance.
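For reference, PPO trains the policy by maximizing the clipped surrogate objective. The form below is the standard one from the PPO literature, given here for context rather than reproduced from the thesis:

```latex
L^{\mathrm{CLIP}}(\theta)
  = \hat{\mathbb{E}}_t\!\left[
      \min\!\left(
        r_t(\theta)\,\hat{A}_t,\;
        \operatorname{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t
      \right)
    \right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
```

Here $\hat{A}_t$ is an advantage estimate and $\epsilon$ the clipping parameter; the clipping keeps each policy update close to the previous policy, which is the property that makes PPO a common choice for online training.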
Nonetheless, based on the experience gained from both the simulations and the real-world experiments, various proposals for further work were devised and highlighted. In particular, the importance of appropriate reward function design is underlined.
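To make the emphasized points concrete, the sketch below shows a minimal environment in the OpenAI-Gym-style `reset`/`step` convention that the thesis's simulator follows, with a dense, distance-based reward for marker tracking. All class names, parameters, and dynamics here are hypothetical illustrations, not the thesis's actual implementation:

```python
import math
import random

class MarkerTrackingEnv:
    """Illustrative Gym-style environment (hypothetical): a planar agent
    must move toward a fixed marker at the origin. The observation is the
    marker's position relative to the agent, and the reward is the negative
    distance to the marker, so approaching it is favored."""

    def __init__(self, step_size=0.1, goal_radius=0.2, max_steps=200):
        self.step_size = step_size
        self.goal_radius = goal_radius
        self.max_steps = max_steps
        self.reset()

    def reset(self):
        # Random agent start position; the marker sits at the origin.
        self.agent = [random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0)]
        self.steps = 0
        return self._obs()

    def _obs(self):
        # Marker position relative to the agent (what a CV detector would give).
        return (-self.agent[0], -self.agent[1])

    def step(self, action):
        # Discrete thruster-like commands: 0=+x, 1=-x, 2=+y, 3=-y.
        dx, dy = [(self.step_size, 0.0), (-self.step_size, 0.0),
                  (0.0, self.step_size), (0.0, -self.step_size)][action]
        self.agent[0] += dx
        self.agent[1] += dy
        self.steps += 1
        dist = math.hypot(self.agent[0], self.agent[1])
        reward = -dist  # dense, distance-based reward: closer is better
        done = dist < self.goal_radius or self.steps >= self.max_steps
        return self._obs(), reward, done, {}
```

A dense reward like this gives the learner a gradient toward the marker on every step; a sparse reward (e.g. only on reaching the goal) is a common cause of the kind of slow or erratic learning the thesis's further-work discussion warns about.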