Abstract

Recent advances have markedly improved the cross-scene generalization of relative depth estimation, yet its practical applicability remains limited by the absence of metric scale, local inconsistencies, and low computational efficiency. To address these issues, we present Midas Touch for Depth (MTD), a mathematically interpretable approach that converts relative depth into metric depth using only extremely sparse 3D data. To eliminate local scale inconsistencies, MTD applies a segment-wise recovery strategy via sparse graph optimization, followed by a pixel-wise refinement strategy using a discontinuity-aware geodesic cost. MTD exhibits strong generalization and achieves substantial accuracy improvements over previous depth completion and depth estimation methods. Moreover, its lightweight, plug-and-play design facilitates deployment and integration into diverse downstream 3D tasks.

MTD Overall Framework

MTD takes relative depth, sparse 3D seeds, and a superpixel segment set as inputs and outputs reliable metric depth. (a) In the overall framework, segment-wise recovery followed by pixel-wise refinement forms a coarse-to-fine pipeline. (b) Per-segment calibration first recovers scale for segments containing projected 3D seeds; we then propagate these calibration parameters to unseeded segments via an optimization on a segment graph. (c) Based on the coarse depth, we derive pixel-wise discontinuities to guide the paths of depth propagation. We formulate the geodesic problem as a path-integral optimization and solve it efficiently via dynamic programming, progressing from local updates to a global solution.
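To make step (b) more concrete, the following is a minimal sketch of per-segment calibration, not the MTD implementation. It assumes an affine model (metric depth ≈ a · relative depth + b) fitted by ordinary least squares on the seeds that fall inside each segment; the function names, the `min_seeds` threshold, and the affine form are all illustrative assumptions. Unseeded segments here simply receive a caller-supplied fallback, which is a much cruder stand-in for the segment-graph optimization described above.

```python
import numpy as np

def calibrate_segments(rel_depth, segments, seed_rc, seed_z, min_seeds=2):
    """Fit metric ~= a * relative + b for each segment that contains seeds.

    rel_depth: (H, W) relative depth map
    segments:  (H, W) integer superpixel labels
    seed_rc:   (N, 2) integer (row, col) pixels of the projected 3D seeds
    seed_z:    (N,)   metric depth of each seed
    Returns {label: (a, b)}; the affine model is an assumption of this sketch.
    """
    rows, cols = seed_rc[:, 0], seed_rc[:, 1]
    seed_rel = rel_depth[rows, cols]   # relative depth sampled at the seeds
    seed_lab = segments[rows, cols]    # segment label of each seed
    params = {}
    for lab in np.unique(seed_lab):
        mask = seed_lab == lab
        if mask.sum() < min_seeds:     # too few seeds to fit two unknowns
            continue
        # Design matrix [relative depth, 1] for the least-squares fit.
        A = np.stack([seed_rel[mask], np.ones(mask.sum())], axis=1)
        (a, b), *_ = np.linalg.lstsq(A, seed_z[mask], rcond=None)
        params[int(lab)] = (float(a), float(b))
    return params

def apply_calibration(rel_depth, segments, params, fallback=(1.0, 0.0)):
    """Build a coarse metric map; unseeded segments use the fallback (a, b).

    The fallback replaces the graph-based propagation, purely for illustration.
    """
    metric = fallback[0] * rel_depth + fallback[1]
    for lab, (a, b) in params.items():
        mask = segments == lab
        metric[mask] = a * rel_depth[mask] + b
    return metric
```

A typical call would pass the seeds' pixel coordinates and depths to `calibrate_segments` and then feed the resulting dictionary to `apply_calibration`; the coarse map it returns plays the role of the input to the pixel-wise refinement in step (c).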

Demo video.

Watch on YouTube · Watch on bilibili

BibTeX

@article{ma2026mtd,
  author    = {Ma, Yu and Guo, Zizhan and Xiong, Zuyi and Zhang, Haoran and Feng, Yi and Zhao, Hongbo and Wang, Hanli and Fan, Rui},
  title     = {The Midas Touch for Metric Depth},
  year      = {2026},
}