dc.description.abstract | This thesis introduces the use of Machine Learning, specifically Reinforcement
Learning, to create a model-free tracking property for Remotely Operated Vehicles
(ROVs). Specifically, the ROV is trained by an RL algorithm to track an ArUco
marker, using an online implementation of a Computer Vision (CV) algorithm for
detection. The main motivation behind this work is to contribute
to increased autonomy in underwater operations, by introducing model-free
autonomous tracking behavior to underwater vehicles. This approach
requires minimal human intervention during operation, while significantly
reducing the manual control programming effort required beforehand. The tracking
behavior was first trained in a simulator, prior to conducting physical experiments
with a real ROV in the MC-laboratory at NTNU. The ROV used for the experimental
tests is a BlueROV2, which is highly customizable and well suited for R&D
purposes.
The theory presented in this thesis lays the groundwork for the decisions
made over the course of this project, including the choice of RL method. The RL algorithm
chosen for training the tracking behavior is an online Python implementation of
the Proximal Policy Optimization (PPO) algorithm. The tracking behavior is
trained on a simulator, which is a Python script based on OpenAI's typical simulator
architecture. The resulting tracking performance is then evaluated by studying the
evolution of accumulated rewards and the ROV's trajectory plots. While the resulting
performance showed some weaknesses, it was good enough to justify
testing the trained model in a real-world setting.
However, the real-world experiments did not yield positive tracking results, as
the ROV moved in a seemingly random manner instead of moving towards
the ArUco marker. Several challenges described in the theory section proved to be
prevalent during the lab experiments, which disrupted the real-world
tracking performance. Nonetheless, based on experience gained from both the simulations
and real-world experiments, various proposals for further work were devised
and highlighted. In particular, the importance of appropriate reward function design
is underlined. | |