Deep Reinforcement Learning based tracking behavior for underwater vehicles
- Institutt for marin teknikk 
This thesis introduces the use of Machine Learning, specifically Reinforcement Learning (RL), to create a model-free tracking capability for Remotely Operated Vehicles (ROVs). Specifically, the ROV is trained by an RL algorithm to track an ArUco marker, using an online implementation of a Computer Vision (CV) algorithm for detection. The main motivation behind this work is to contribute to increased autonomy in underwater operations by introducing model-free autonomous tracking behavior to underwater vehicles. This approach requires minimal human intervention during operation, while significantly reducing the prior human effort spent on control programming.

First, simulator-based training of the ROV's tracking behavior was carried out prior to physical experiments with a real ROV in the MC-laboratory at NTNU. The ROV used for the experimental tests is a BlueROV2, which is highly customizable and well suited for R&D purposes.

The theory presented in this thesis lays the groundwork for much of the reasoning in this project, including the choice of RL method. The RL algorithm chosen for training the tracking behavior is an online Python implementation of the Proximal Policy Optimization (PPO) algorithm. The tracking behavior is trained in a simulator, implemented as a Python script based on OpenAI's typical simulator architecture. The resulting tracking performance is then evaluated by studying the evolution of accumulated rewards and the ROV's trajectory plots. While the resulting performance showed some weaknesses, it was sufficient to justify testing the trained model in a real-world setting.

However, the real-world experiments did not yield positive tracking results: the ROV moved in a seemingly random manner instead of favorably moving towards the ArUco marker. Several challenges described in the theory section proved prevalent during the lab experiments, which caused the disruption of the real-world tracking performance.
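For reference, PPO trains the policy by maximizing the clipped surrogate objective. The form below is the standard one from the PPO literature, given here for context rather than reproduced from the thesis:

```latex
L^{\mathrm{CLIP}}(\theta)
  = \hat{\mathbb{E}}_t\!\left[
      \min\!\left(
        r_t(\theta)\,\hat{A}_t,\;
        \operatorname{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t
      \right)
    \right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
```

Here $\hat{A}_t$ is an advantage estimate and $\epsilon$ the clipping parameter; the clipping keeps each policy update close to the previous policy, which is the property that makes PPO a common choice for online training.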
Nonetheless, based on the experience gained from both the simulations and the real-world experiments, various proposals for further work were devised and highlighted. In particular, the importance of appropriate reward function design is underlined.
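To make the emphasized points concrete, the sketch below shows a minimal environment in the OpenAI-Gym-style `reset`/`step` convention that the thesis's simulator follows, with a dense, distance-based reward for marker tracking. All class names, parameters, and dynamics here are hypothetical illustrations, not the thesis's actual implementation:

```python
import math
import random

class MarkerTrackingEnv:
    """Illustrative Gym-style environment (hypothetical): a planar agent
    must move toward a fixed marker at the origin. The observation is the
    marker's position relative to the agent, and the reward is the negative
    distance to the marker, so approaching it is favored."""

    def __init__(self, step_size=0.1, goal_radius=0.2, max_steps=200):
        self.step_size = step_size
        self.goal_radius = goal_radius
        self.max_steps = max_steps
        self.reset()

    def reset(self):
        # Random agent start position; the marker sits at the origin.
        self.agent = [random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0)]
        self.steps = 0
        return self._obs()

    def _obs(self):
        # Marker position relative to the agent (what a CV detector would give).
        return (-self.agent[0], -self.agent[1])

    def step(self, action):
        # Discrete thruster-like commands: 0=+x, 1=-x, 2=+y, 3=-y.
        dx, dy = [(self.step_size, 0.0), (-self.step_size, 0.0),
                  (0.0, self.step_size), (0.0, -self.step_size)][action]
        self.agent[0] += dx
        self.agent[1] += dy
        self.steps += 1
        dist = math.hypot(self.agent[0], self.agent[1])
        reward = -dist  # dense, distance-based reward: closer is better
        done = dist < self.goal_radius or self.steps >= self.max_steps
        return self._obs(), reward, done, {}
```

A dense reward like this gives the learner a gradient toward the marker on every step; a sparse reward (e.g. only on reaching the goal) is a common cause of the kind of slow or erratic learning the thesis's further-work discussion warns about.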