Correspondence matching plays a crucial role in numerous
robotics applications. Compared with conventional
hand-crafted methods and recent data-driven approaches,
plug-and-play algorithms, which exploit pre-trained backbone
networks for multi-scale feature extraction and apply
hierarchical refinement strategies to produce
correspondences, have attracted significant interest.
This paper addresses the limitations of deep feature
matching (DFM), a state-of-the-art (SoTA) plug-and-play
correspondence matching approach. First, we eliminate the
pre-defined threshold used in the hierarchical refinement
process of DFM by adopting a more flexible nearest neighbor
search strategy, which prevents repetitive yet valid matches
from being discarded at early stages. Second, we integrate a
patch descriptor that extends DFM to a wide range of
backbone networks pre-trained on diverse computer vision
tasks, including image classification, semantic
segmentation, and stereo matching.
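To illustrate why a fixed threshold can discard repetitive matches, the sketch below compares mutual nearest neighbor (MNN) search with and without a ratio test. It is a minimal illustration under stated assumptions: we assume the pre-defined threshold behaves like a Lowe-style ratio test, and we use plain MNN search as a stand-in for the threshold-free strategy. The function `mutual_nn` and its `ratio` parameter are hypothetical and do not reproduce the exact search used in DFM or in this work.

```python
import numpy as np


def mutual_nn(desc_a, desc_b, ratio=None):
    """Return (i, j) index pairs where desc_a[i] and desc_b[j] are mutual nearest neighbors.

    desc_a: (N, D) array, desc_b: (M, D) array of descriptors.
    ratio: optional threshold for a ratio test (hypothetical stand-in for
    a pre-defined threshold); None means plain, threshold-free MNN search.
    """
    # Pairwise Euclidean distances, shape (N, M).
    dist = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=-1)
    nn_ab = dist.argmin(axis=1)  # nearest descriptor in B for each row of A
    nn_ba = dist.argmin(axis=0)  # nearest descriptor in A for each column of B
    matches = []
    for i, j in enumerate(nn_ab):
        if nn_ba[j] != i:
            continue  # not mutual, discard
        if ratio is not None:
            # Ratio test in both directions: reject if the best distance is
            # not clearly smaller than the second best (needs >= 2 candidates).
            row = np.sort(dist[i])
            col = np.sort(dist[:, j])
            if (len(row) > 1 and row[0] > ratio * row[1]) or (
                len(col) > 1 and col[0] > ratio * col[1]
            ):
                continue
        matches.append((i, int(j)))
    return matches


# Toy example: B[0] and B[1] are near-duplicates (a repeated pattern).
A = np.array([[1.0, 0.0], [0.0, 1.0]])
B = np.array([[0.95, 0.05], [0.94, 0.06], [0.0, 1.0]])
print(mutual_nn(A, B))             # [(0, 0), (1, 2)]
print(mutual_nn(A, B, ratio=0.8))  # [(1, 2)]
```

In this toy case, the correct match for `A[0]` survives plain MNN search but is rejected once `ratio=0.8` is applied, because its best and second-best distances (to the two near-duplicate descriptors) are too similar.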
To make our method practical for real-world robotics
applications, we also propose a novel patch descriptor
distillation strategy that further reduces the computational
complexity of correspondence matching. Extensive experiments
on three public datasets demonstrate the superior
performance of our method. Specifically, on the HPatches
dataset it achieves overall mean matching accuracies (MMA)
of 0.68, 0.92, and 0.95 at tolerances of 1, 3, and 5
pixels, respectively, outperforming all other SoTA
algorithms.
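The MMA figures above can be read as follows. Under the HPatches protocol commonly used in the literature, each predicted match is projected with the ground-truth homography and counted as correct if its reprojection error is at most the pixel tolerance; MMA then averages this per-pair accuracy over all image pairs. The sketch below assumes that protocol. `pair_mma` and `overall_mma` are hypothetical helpers, and details such as the handling of pairs without matches may differ from the evaluation code used in this work.

```python
import numpy as np


def pair_mma(kpts_a, kpts_b, H, thresholds=(1, 3, 5)):
    """Fraction of matches whose reprojection error is within each tolerance.

    kpts_a, kpts_b: (K, 2) matched pixel coordinates (x, y), row k matched to row k.
    H: 3x3 ground-truth homography mapping image A to image B.
    """
    kpts_a = np.asarray(kpts_a, dtype=float)
    kpts_b = np.asarray(kpts_b, dtype=float)
    if len(kpts_a) == 0:
        return [0.0 for _ in thresholds]  # assumption: no matches -> zero accuracy
    # Project A's keypoints into B using homogeneous coordinates.
    homog = np.hstack([kpts_a, np.ones((len(kpts_a), 1))]) @ H.T
    proj = homog[:, :2] / homog[:, 2:3]
    err = np.linalg.norm(proj - kpts_b, axis=1)
    return [float(np.mean(err <= t)) for t in thresholds]


def overall_mma(pairs, thresholds=(1, 3, 5)):
    """Average per-pair accuracies over a list of (kpts_a, kpts_b, H) tuples."""
    per_pair = np.array([pair_mma(a, b, H, thresholds) for a, b, H in pairs])
    return per_pair.mean(axis=0).tolist()


# Toy example: H shifts x by 2 px; reprojection errors are 0.5, 2, 0, and 4 px.
H = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
a = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 5.0], [3.0, 3.0]])
b = np.array([[2.0, 0.5], [14.0, 10.0], [22.0, 5.0], [9.0, 3.0]])
print(pair_mma(a, b, H))                                 # [0.5, 0.75, 1.0]
print(overall_mma([(a, b, H), (a, a + [2.0, 0.0], H)]))  # [0.75, 0.875, 1.0]
```

Averaging per-pair accuracies, rather than pooling all matches, keeps image pairs with many matches from dominating the overall score.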