DCPI-Depth: Explicitly Infusing Dense Correspondence Prior to Unsupervised Monocular Depth Estimation

Tongji University
IEEE Transactions on Image Processing (T-IP), 2025

Abstract

There has been a recent surge of interest in learning to perceive depth from monocular videos in an unsupervised fashion. A key challenge in this field is achieving robust and accurate depth estimation in regions with weak textures or where dynamic objects are present. This study delves into dense correspondence priors to provide existing frameworks with explicit geometric constraints, and makes three major contributions. The first is a contextual-geometric depth consistency loss, which employs depth maps triangulated from dense correspondences based on estimated ego-motion to guide the learning of depth perception from contextual information, since explicitly triangulated depth maps capture accurate relative distances among pixels. The second arises from the observation that there exists an explicit, deducible relationship between optical flow divergence and depth gradient. A differential property correlation loss is therefore designed to refine depth estimation with a specific emphasis on local variations. The third is a bidirectional stream co-adjustment strategy that enhances the interaction between rigid and optical flows, encouraging the former towards more accurate correspondence and making the latter more adaptable across various scenarios under the static scene hypothesis. DCPI-Depth, a framework that incorporates all these innovative components and couples the two bidirectional, collaborative streams, achieves state-of-the-art accuracy and generalizability across multiple public datasets, outperforming all prior arts. In particular, it produces accurate depth estimates in texture-less and dynamic regions and exhibits more reasonable depth smoothness. Our source code is publicly available at mias.group/DCPI-Depth.

Methodology

The overall architecture of our proposed DCPI-Depth framework, which consists of two collaborative and bidirectional streams: PCG and CPG. The input image pairs, the PoseNet, and the estimated ego-motion are depicted separately within each stream.
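To make the geometric prior used by the framework concrete, the sketch below shows how per-pixel depth can be triangulated from a dense correspondence and an estimated relative pose via standard DLT triangulation. This is a minimal NumPy illustration of the general technique; the function and variable names (triangulate_depth, K, R, t) are assumptions of ours, not taken from the released code, and the paper's actual implementation may differ.

```python
import numpy as np

def triangulate_depth(p1, p2, K, R, t):
    """Triangulate per-pixel depth in the first camera frame from a dense
    correspondence (p1 -> p2) and an estimated relative pose (R, t).

    p1, p2 : (N, 2) corresponding pixel coordinates in the two views
    K      : (3, 3) camera intrinsic matrix
    R, t   : rotation (3, 3) and translation (3,) mapping frame-1 points
             into frame-2 coordinates (X2 = R @ X1 + t)
    Returns: (N,) depth values along the first camera's optical axis.
    """
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # projection matrix of view 1
    P2 = K @ np.hstack([R, t.reshape(3, 1)])             # projection matrix of view 2

    depths = np.empty(len(p1))
    for i, ((x1, y1), (x2, y2)) in enumerate(zip(p1, p2)):
        # Standard DLT: each observation contributes two linear constraints on X.
        A = np.stack([
            x1 * P1[2] - P1[0],
            y1 * P1[2] - P1[1],
            x2 * P2[2] - P2[0],
            y2 * P2[2] - P2[1],
        ])
        _, _, Vt = np.linalg.svd(A)
        X = Vt[-1]
        X = X[:3] / X[3]          # dehomogenize
        depths[i] = X[2]          # z-coordinate in the first camera frame
    return depths
```

Depth maps triangulated in this spirit serve as the explicit geometric reference in the contextual-geometric depth consistency loss, guiding the depth predicted from contextual information.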

An illustration of optical flow divergence: (a) optical flow divergence for pixels with similar intensities that are nonetheless spatially discontinuous; (b) optical flow divergence for pixels with significantly different intensities that are nonetheless spatially continuous.
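Both quantities involved in this relationship can be approximated with finite differences. The following sketch, with illustrative function names (flow_divergence, depth_gradient_magnitude) that are not from the released code, shows one way to compute them; it does not reproduce the exact differential property correlation loss derived in the paper.

```python
import numpy as np

def flow_divergence(flow):
    """Finite-difference divergence of a dense optical flow field.

    flow : (H, W, 2) array with per-pixel displacements (u, v).
    Returns: (H, W) array of du/dx + dv/dy.
    """
    du_dx = np.gradient(flow[..., 0], axis=1)   # horizontal derivative of u
    dv_dy = np.gradient(flow[..., 1], axis=0)   # vertical derivative of v
    return du_dx + dv_dy

def depth_gradient_magnitude(depth):
    """Finite-difference magnitude of the depth gradient.

    depth : (H, W) depth map.
    Returns: (H, W) array of |grad D|.
    """
    dz_dy, dz_dx = np.gradient(depth)
    return np.sqrt(dz_dx ** 2 + dz_dy ** 2)
```

The differential property correlation loss then exploits the deducible relationship between these two local quantities to refine depth estimation with an emphasis on local variations.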

An illustration of the interaction between the PCG and CPG streams through the proposed BSCA strategy, which addresses the challenges posed by dynamic objects.
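The rigid flow that the two streams exchange can be obtained by backprojecting the predicted depth, transforming the points with the estimated ego-motion, and reprojecting them into the second view. The sketch below illustrates this standard operation under assumed names (rigid_flow_from_depth, K, R, t); the co-adjustment rules themselves, i.e., how the rigid and optical flows correct each other in BSCA, are specific to the paper and not reproduced here.

```python
import numpy as np

def rigid_flow_from_depth(depth, K, R, t):
    """Rigid (ego-motion-induced) flow from a depth map and a relative pose.

    depth : (H, W) depth map of the first view
    K     : (3, 3) intrinsics; R (3, 3), t (3,) map view-1 points into view 2.
    Returns: (H, W, 2) flow field mapping view-1 pixels to their rigid
             reprojections in view 2.
    """
    H, W = depth.shape
    xs, ys = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T  # (3, H*W)

    # Backproject to 3-D in the first camera frame, then transform and reproject.
    pts1 = np.linalg.inv(K) @ pix * depth.reshape(1, -1)
    pts2 = R @ pts1 + t.reshape(3, 1)
    proj = K @ pts2
    proj = proj[:2] / proj[2:3]

    flow = (proj - pix[:2]).T.reshape(H, W, 2)
    return flow
```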

Experimental Results

BibTeX

@article{zhang2024dcpi,
  title={{DCPI-Depth}: Explicitly infusing dense correspondence prior to unsupervised monocular depth estimation},
  author={Zhang, Mengtan and Feng, Yi and Chen, Qijun and Fan, Rui},
  journal={IEEE Transactions on Image Processing},
  year={2025},
  volume={34},
  pages={4258--4272},
  publisher={IEEE}
}