D2NT: A High-Performing Depth-to-Normal Translator
Yi Feng
Bohuan Xue
Ming Liu
Qijun Chen
Rui Fan
[Paper]
[GitHub]

Abstract

Surface normals hold significant importance in visual environmental perception, serving as a source of rich geometric information. However, state-of-the-art (SoTA) surface normal estimators (SNEs) generally suffer from an unsatisfactory trade-off between efficiency and accuracy. To resolve this dilemma, this paper first presents a superfast depth-to-normal translator (D2NT), which can directly translate depth images into surface normal maps without calculating 3D coordinates. We then propose a discontinuity-aware gradient (DAG) filter, which adaptively generates gradient convolution kernels to improve depth gradient estimation. Finally, we propose a surface normal refinement module that can easily be integrated into any depth-to-normal SNE, substantially improving surface normal estimation accuracy. Our proposed algorithm demonstrates the best accuracy among all existing real-time SNEs and achieves the SoTA trade-off between efficiency and accuracy.

Methodology

Illustration of our proposed D2NT, DAG filter, and MNR (surface normal refinement) module. D2NT translates depth images into surface normal maps in an end-to-end fashion; the DAG filter adaptively generates smoothness-guided direction weights to improve depth gradient estimation in and around discontinuities; the MNR module further refines the estimated surface normals based on the smoothness of neighboring pixels.
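For readers who want a concrete picture of the translation step, below is a minimal NumPy sketch of the basic depth-to-normal mapping described above (i.e., D2NT without the DAG filter or MNR refinement). The function name `d2nt_basic` and the use of simple central-difference gradients are illustrative assumptions rather than the released implementation; please refer to the GitHub repository for the official code.

```python
import numpy as np

def d2nt_basic(depth, fx, fy, u0, v0):
    """Minimal depth-to-normal translation sketch (no DAG filter / MNR).

    depth : (H, W) float array of depth values.
    fx, fy, u0, v0 : pinhole camera intrinsics.
    Returns an (H, W, 3) array of unit surface normals.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))  # pixel coordinates

    # Depth gradients via central differences; the DAG filter would
    # replace these fixed kernels with discontinuity-aware ones.
    dz_du = np.gradient(depth, axis=1)
    dz_dv = np.gradient(depth, axis=0)

    # Closed-form normal from depth gradients (up to scale),
    # so no per-pixel 3D point cloud is ever constructed.
    nx = fx * dz_du
    ny = fy * dz_dv
    nz = -(depth + (u - u0) * dz_du + (v - v0) * dz_dv)

    n = np.stack([nx, ny, nz], axis=-1)
    norm = np.linalg.norm(n, axis=-1, keepdims=True)
    return n / np.clip(norm, 1e-8, None)
```

In this sketch the only per-pixel work is two image gradients and a normalization, which is what makes the translation fast; swapping the fixed gradient kernels for adaptive ones is where the DAG filter would plug in.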

Experimental Results

Our proposed D2NT series demonstrates superior performance compared to all other SoTA SNEs. D2NT achieves the highest computational efficiency and the best trade-off between speed and accuracy, while D2NT V3 achieves the highest accuracy among all real-time SoTA SNEs.
Speed, accuracy, and trade-off comparisons among SoTA geometry-based SNEs on the 3F2N dataset.
Comparison of our proposed SNE with other SoTA geometry-based SNEs on the 3F2N dataset: (a) depth maps and ground-truth surface normal maps; (b) error maps obtained using 3F2N (median filter); (c) error maps obtained using CP2TV; (d) error maps obtained using our proposed D2NT V3.


Video


[Bilibili Video Link]
[YouTube Video Link]

Citation

@inproceedings{feng2023d2nt,
  title={D2NT: A High-Performing Depth-to-Normal Translator},
  author={Feng, Yi and Xue, Bohuan and Liu, Ming and Chen, Qijun and Fan, Rui},
  booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
  year={2023}
}


Acknowledgements

This work was supported by the National Key R&D Program of China under Grant 2020AAA0108100, the National Natural Science Foundation of China under Grant 62233013, the Science and Technology Commission of Shanghai Municipality under Grant 22511104500, the Fundamental Research Funds for the Central Universities under Grants 22120220184 and 22120220214, and the Shanghai Municipal Science and Technology Major Project under Grant 2021SHZDZX0100.