Collision-free space detection is of utmost importance for autonomous robot perception and navigation.
State-of-the-art (SoTA) approaches generally extract features from RGB images and an additional
modality of 3-D information, such as depth or disparity images, using a pair of independent encoders.
The extracted features are subsequently fused and decoded to yield semantic predictions of collision-free
spaces. Such feature-fusion approaches become infeasible in scenarios where the sensor for 3-D information
acquisition is unavailable, or when multi-sensor calibration falls short of the necessary precision.
To overcome these limitations, this paper introduces a novel end-to-end collision-free space detection
network, referred to as SG-RoadSeg, built upon our previous work SNE-RoadSeg. A key contribution of this
paper is a strategy for sharing encoder representations that are co-learned through both semantic
segmentation and unsupervised stereo matching tasks, enabling the features extracted from RGB images
to encode both semantic and spatial geometric information. The unsupervised deep stereo matching branch
serves as an auxiliary function, generating accurate disparity maps that can be used by other perception
tasks requiring depth-related data. Comprehensive experimental results on the KITTI road and semantics
datasets validate the effectiveness of our proposed architecture and encoder representation sharing strategy.
SG-RoadSeg also achieves superior performance compared to other SoTA collision-free space detection approaches.
Our source code, demo video, and supplement are publicly available at
https://mias.group/SG-RoadSeg/.
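
As a rough illustration of the encoder-sharing idea described above (not the authors' actual implementation), the following PyTorch sketch applies one shared RGB encoder to both stereo views and decodes its features with two heads: a segmentation head for collision-free space prediction and a disparity head for stereo matching. All module names, channel widths, and the single-scale disparity regression are illustrative assumptions; in an unsupervised setting, the disparity head would be trained with a photometric reconstruction loss rather than ground-truth disparity.

```python
# Minimal sketch of a shared encoder co-trained by a semantic segmentation
# head and a stereo matching head. Module names, channel sizes, and the
# disparity regression scheme are illustrative assumptions, not details
# taken from the SG-RoadSeg paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedEncoder(nn.Module):
    """Lightweight convolutional encoder applied to both stereo views."""
    def __init__(self, in_ch=3, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 2 * ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(2 * ch, 4 * ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)  # 1/8-resolution feature map


class SegHead(nn.Module):
    """Decodes shared features into per-pixel free-space logits."""
    def __init__(self, feat_ch=128, num_classes=2):
        super().__init__()
        self.head = nn.Conv2d(feat_ch, num_classes, 1)

    def forward(self, feat, out_size):
        return F.interpolate(self.head(feat), size=out_size,
                             mode="bilinear", align_corners=False)


class DisparityHead(nn.Module):
    """Regresses a disparity map from concatenated left/right features.

    Unsupervised training would warp the right view with this disparity and
    penalize the photometric error against the left view (omitted here).
    """
    def __init__(self, feat_ch=128, max_disp=64):
        super().__init__()
        self.max_disp = max_disp
        self.head = nn.Sequential(
            nn.Conv2d(2 * feat_ch, feat_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, 1, 1), nn.Sigmoid(),  # normalized disparity in [0, 1]
        )

    def forward(self, feat_l, feat_r, out_size):
        disp = self.head(torch.cat([feat_l, feat_r], dim=1)) * self.max_disp
        return F.interpolate(disp, size=out_size,
                             mode="bilinear", align_corners=False)


class SharedEncoderNet(nn.Module):
    """One shared encoder feeding both task heads."""
    def __init__(self):
        super().__init__()
        self.encoder = SharedEncoder()
        self.seg_head = SegHead()
        self.disp_head = DisparityHead()

    def forward(self, left, right):
        out_size = left.shape[-2:]
        feat_l, feat_r = self.encoder(left), self.encoder(right)
        seg_logits = self.seg_head(feat_l, out_size)
        disparity = self.disp_head(feat_l, feat_r, out_size)
        return seg_logits, disparity


if __name__ == "__main__":
    net = SharedEncoderNet()
    left = torch.randn(1, 3, 256, 512)
    right = torch.randn(1, 3, 256, 512)
    seg_logits, disparity = net(left, right)
    print(seg_logits.shape, disparity.shape)  # (1, 2, 256, 512), (1, 1, 256, 512)
```

Because both heads consume the output of the same encoder, gradients from segmentation and from stereo matching jointly shape the RGB features, which is the property the encoder representation sharing strategy exploits.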