Abstract
Edge detection, the fundamental and longstanding task of detecting visually salient edges in images, remains one of the core and challenging problems in computer vision and image processing. Previous approaches either rely on user knowledge on top of classical non-deep-learning algorithms, or employ fully automatic deep learning-based edge detectors that do not exploit the valuable domain knowledge the user can provide. Consequently, these methods cannot fully meet user expectations, and considerable room for improvement remains.
We approach this problem by introducing user interactions into deep learning models. User interactions in multiple modes are encoded into different types of 2-channel maps by corresponding encoding strategies, where the thick encoding strategies make it easier for users to interact. We then retrofit an existing state-of-the-art edge detector architecture so that it accepts the user interaction channels and previous predictions as additional inputs at both shallow and deep layers of the network; this prevents the additional inputs from vanishing in the hidden layers while keeping the number of model parameters within the same order of magnitude. We also propose a robust algorithm for generating simulated user interactions in real time during training, which is crucial for obtaining an efficient and effective interactive deep model.
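As a rough illustration of the interaction encoding described above, the sketch below rasterizes positive and negative user clicks into a 2-channel map, thickening each click into a small disk; the function name, click format, and disk radius are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def encode_interactions(pos_clicks, neg_clicks, height, width, radius=3):
    """Encode user clicks as a 2-channel map (hypothetical sketch).

    Channel 0 marks positive (edge) interactions, channel 1 negative
    (non-edge) ones. Each click is "thickened" into a disk of the given
    radius, in the spirit of the thick encoding that makes imprecise
    interactions easier for the user.
    """
    channels = np.zeros((2, height, width), dtype=np.float32)
    yy, xx = np.mgrid[0:height, 0:width]  # pixel coordinate grids
    for ch, clicks in enumerate((pos_clicks, neg_clicks)):
        for (y, x) in clicks:
            # Mark all pixels within `radius` of the click position.
            mask = (yy - y) ** 2 + (xx - x) ** 2 <= radius ** 2
            channels[ch][mask] = 1.0
    return channels
```

The resulting map can be concatenated with the RGB image (and a previous prediction) along the channel axis before being fed to the network.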
The proposed architecture is trained end-to-end from scratch, without any pre-trained weights, on the Barcelona Images for Perceptual Edge Detection dataset, and evaluated on another dataset with different scenarios. Quantitative and qualitative results on both seen and unseen scenarios demonstrate that our deep learning-based interactive edge detector generalizes well and improves significantly over state-of-the-art fully automatic methods.