Visual Estimation of Motion for ROVs - Increasing Accuracy for ROV Navigation
- Institutt for marin teknikk 
This project thesis presents how a remotely operated vehicle (ROV) can increase its situational awareness and positioning accuracy using visual input from a stereo camera and vision-based ego-motion estimation. The aim is to raise the level of autonomy and to improve the capability, repeatability and efficiency of ROV operations. The ROV in question is equipped with a stereo camera pair. Based on a pinhole model of the camera, the 3D camera-frame positions of objects in the image can be computed and tracked across image frames. This is used to estimate the motion of the camera, and thus the motion of the ROV. Objects in the image are detected as features and described using the Binary Robust Invariant Scalable Keypoints (BRISK) method. Each feature is described by a vector, which is compared with the descriptors in another frame to match features. The feature matching is implemented as a circle matching procedure: features in the previous left frame are matched with features in the previous right frame, then the current right frame, then the current left frame, and finally back to the previous left frame. This ensures that all features used in the motion estimation algorithms are matched both spatially, for stereo vision, and temporally, across two consecutive frames. The 3D camera-frame coordinates are computed from the locations of the features in the image frame and the intrinsic camera parameters. With a stereo pair of cameras, the depth of the features in the image can also be computed. The camera-frame coordinates of the features from the previous frame are then reprojected onto the current image frame using the intrinsic camera parameters. There will be an error between the locations of the features in the current image frame and the reprojected camera-frame coordinates of the same features. This reprojection error is minimised using Gauss-Newton optimisation in order to estimate the motion of the camera from the previous frame to the current frame.
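The triangulation, reprojection and Gauss-Newton steps described above can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes a rectified stereo pair, and the names `f`, `cx`, `cy` and `baseline` (focal length in pixels, principal point, camera separation) are chosen here for illustration. To stay short, the optimisation estimates translation only and assumes zero rotation, whereas the thesis estimates the full camera motion.

```python
def triangulate(u_left, v, u_right, f, cx, cy, baseline):
    """Back-project a rectified stereo match to 3D camera-frame coordinates."""
    disparity = u_left - u_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    z = f * baseline / disparity          # depth from disparity
    return ((u_left - cx) * z / f, (v - cy) * z / f, z)


def project(point, f, cx, cy):
    """Pinhole projection of a 3D camera-frame point to pixel coordinates."""
    x, y, z = point
    return (f * x / z + cx, f * y / z + cy)


def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def estimate_translation(prev_points, curr_pixels, f, cx, cy, iterations=10):
    """Gauss-Newton on the reprojection error (translation only, an assumption)."""
    t = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        jtj = [[0.0] * 3 for _ in range(3)]
        jtr = [0.0] * 3
        for (px, py, pz), (u_obs, v_obs) in zip(prev_points, curr_pixels):
            x, y, z = px + t[0], py + t[1], pz + t[2]
            u, v = project((x, y, z), f, cx, cy)
            residuals = (u - u_obs, v - v_obs)
            # Jacobian of (u, v) with respect to the translation (tx, ty, tz)
            rows = ((f / z, 0.0, -f * x / z ** 2),
                    (0.0, f / z, -f * y / z ** 2))
            for row, res in zip(rows, residuals):
                for i in range(3):
                    jtr[i] += row[i] * res
                    for j in range(3):
                        jtj[i][j] += row[i] * row[j]
        # Normal equations: (J^T J) step = -J^T r
        step = solve3(jtj, [-g for g in jtr])
        t = [ti + si for ti, si in zip(t, step)]
    return t
```

Given at least three triangulated points from the previous frame and their matched pixel locations in the current frame, `estimate_translation` returns the translation that minimises the summed squared reprojection error under the stated assumptions.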
The optimisation is initialised N times, each time with three randomly selected points, giving N estimates of the camera motion. These estimates are then compared and adjusted to obtain a final best estimate of the camera motion. A Kalman filter is also applied to smooth the signal and to predict the motion when features are poorly detected or matched. The vision-based ego-motion estimation is tested on an image set from a previous mission and compared with the navigation data from that mission. The results show that the algorithm estimates the overall motion of the ROV, with some smaller oscillations around the measured value. The oscillations are most likely caused by the optimisation algorithm, which sometimes overestimates the rotations relative to the translations. The visual motion estimation (VME) output is merged with the measurements from the other sensors on board the ROV in a Kalman filter for state estimation. The objective is to improve underwater localisation and manoeuvring close to the seabed or to man-made installations. To validate the results, the system has been simulated on the image set mentioned above. The simulations show that the VME algorithm outputs velocity estimates at an average update rate of 2 Hz, meaning that it processes approximately 2 frames per second. The simulation results show a slightly improved motion estimate, especially for the velocities. The performance of the Kalman filter for position estimates was harder to evaluate, because the transponder measurements contained many wild points and an offset from the true position, and the Kalman filter had not converged when running the simulated scenarios. This project shows how computer vision techniques can improve underwater navigation by using a stereo camera rig to estimate the ROV motion. The results from the VME implementation show that, with a feature-based method of estimating the camera-frame motion, the output corresponds to the actual ROV motion.
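The role of the Kalman filter in smoothing the velocity estimate and bridging frames with poor feature matches can be sketched as follows. This is a simplified illustration, not the filter used in the thesis: it assumes a single scalar velocity state with a random-walk model, and the class name `ScalarKalman` and the parameters `q` (process noise variance) and `r` (measurement noise variance) are hypothetical.

```python
class ScalarKalman:
    """Scalar Kalman filter with a random-walk state model (an assumption)."""

    def __init__(self, x0, p0, q, r):
        self.x = x0   # state estimate, e.g. one velocity component
        self.p = p0   # variance of the state estimate
        self.q = q    # process noise variance
        self.r = r    # measurement noise variance

    def step(self, z=None):
        """Advance one frame; pass z=None when no usable measurement exists."""
        # Predict: the state is carried over and its uncertainty grows.
        self.p += self.q
        if z is not None:
            # Update: blend prediction and measurement using the Kalman gain.
            gain = self.p / (self.p + self.r)
            self.x += gain * (z - self.x)
            self.p *= (1.0 - gain)
        return self.x
```

Calling `step(z)` with a visual velocity measurement smooths it against the running estimate, while calling `step()` on a frame where matching failed keeps the previous estimate and increases its variance, so later measurements are weighted more heavily.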
The feature detection, with spatial and temporal matching of consecutive stereo image pairs, was implemented using the BRISK algorithm, which reduces the computational effort compared with other algorithms. Compared with the SURF method, BRISK showed comparable accuracy at a considerably reduced computational time. The next step of development could be to investigate a SLAM approach, in which the captured images are used to update a reconstructed map of the current surroundings. The ROV would then recognise its position when returning to a previously visited location, while continuously updating the map. This would also considerably improve the position estimate mentioned above. There are also some areas for improvement in the algorithm developed for this project, both in terms of accuracy and computational time.
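The spatial and temporal circle matching of binary descriptors summarised above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the thesis code: descriptors are represented as plain Python integers compared by Hamming distance, and the function names and the `max_distance` threshold are hypothetical. In practice, a library such as OpenCV (`cv2.BRISK_create` with a `cv2.BFMatcher` using `cv2.NORM_HAMMING`) would typically supply the descriptors and the nearest-neighbour search.

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def nearest(query, candidates, max_distance):
    """Index of the closest candidate descriptor, or None if none is close enough."""
    if not candidates:
        return None
    best = min(range(len(candidates)), key=lambda i: hamming(query, candidates[i]))
    return best if hamming(query, candidates[best]) <= max_distance else None


def circle_match(prev_left, prev_right, curr_right, curr_left, max_distance=2):
    """Keep only features whose match chain returns to its starting feature.

    Chain: previous left -> previous right -> current right -> current left
    -> previous left. Returns a list of index tuples (i, j, k, l).
    """
    matches = []
    for i, descriptor in enumerate(prev_left):
        j = nearest(descriptor, prev_right, max_distance)
        if j is None:
            continue
        k = nearest(prev_right[j], curr_right, max_distance)
        if k is None:
            continue
        l = nearest(curr_right[k], curr_left, max_distance)
        if l is None:
            continue
        # Close the circle: the chain must end at the feature it started from.
        if nearest(curr_left[l], prev_left, max_distance) == i:
            matches.append((i, j, k, l))
    return matches
```

Each returned tuple identifies one feature that is consistent across both cameras and both time steps, which is the set of features the motion estimation would then use.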