Panoptic occupancy prediction aims to jointly infer voxel-wise semantics and instance identities within a unified 3D scene representation. However, progress in this field remains constrained by the absence of high-quality 3D mesh resources, instance-level annotations, and physically consistent occupancy datasets. Existing benchmarks typically provide incomplete, low-resolution geometry without instance-level labels, limiting the development of models capable of precise geometric reconstruction, reliable occlusion reasoning, and holistic 3D understanding. To address these challenges, this paper presents an instance-centric benchmark for 3D panoptic occupancy prediction. Specifically, we introduce ADMesh, the first unified 3D mesh library tailored for autonomous driving, which integrates over 15K high-quality 3D models with diverse textures and rich semantic annotations. Building upon ADMesh, we construct CarlaOcc, a large-scale, physically consistent panoptic occupancy dataset generated with the CARLA simulator. It contains 100K frames with fine-grained, instance-level occupancy ground truth at voxel resolutions as fine as 0.05 m. Furthermore, we introduce standardized evaluation metrics to quantify the quality of existing occupancy datasets. Finally, we establish a systematic benchmark of representative models on the proposed dataset, providing a unified platform for fair comparison and reproducible research in 3D panoptic perception.
Overview of the proposed ADMesh library and the CarlaOcc generation pipeline. The ADMesh library is constructed by extracting and organizing diverse 3D assets from multiple sources, which are subsequently used to reconstruct dynamic scenes with both static structures and temporally aligned non-rigid motions. The resulting unified scene meshes are then used to rectify sensor artifacts and further processed with a topology-aware mesh permutation strategy to produce non-overlapping panoptic occupancy labels.
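The pipeline above produces non-overlapping panoptic occupancy labels, i.e., each voxel carries both a semantic class and an instance identity. The exact label format is not specified in the text; the sketch below assumes a common "semantic id times offset plus instance id" encoding (the `INSTANCE_OFFSET` constant and both function names are hypothetical) to illustrate how such labels can be fused and recovered losslessly.

```python
import numpy as np

# Hypothetical encoding scheme: the dataset's actual label layout is not
# described in the text, so this sketch assumes a widely used
# "semantic * offset + instance" panoptic id convention.
INSTANCE_OFFSET = 1000  # assumed upper bound on instances per class

def encode_panoptic(semantic: np.ndarray, instance: np.ndarray) -> np.ndarray:
    """Fuse per-voxel semantic class ids and instance ids into one panoptic id."""
    return semantic.astype(np.int64) * INSTANCE_OFFSET + instance.astype(np.int64)

def decode_panoptic(panoptic: np.ndarray):
    """Recover (semantic, instance) grids from a fused panoptic id grid."""
    return panoptic // INSTANCE_OFFSET, panoptic % INSTANCE_OFFSET

# Toy 2x2 voxel slice: class 1 with two distinct instances, class 4, and empty space.
semantic = np.array([[1, 1], [4, 0]])
instance = np.array([[1, 2], [1, 0]])
panoptic = encode_panoptic(semantic, instance)
sem_back, inst_back = decode_panoptic(panoptic)
```

Because every voxel maps to exactly one fused id, the encoding guarantees the non-overlap property by construction: a voxel cannot belong to two instances at once.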
Qualitative comparison of occupancy ground truth across different public datasets. Our ground truth provides instance-level annotations with much finer occupancy resolution and fewer broken boundaries, demonstrating superior physical consistency and spatiotemporal continuity.
Comparison of occupancy ground truth in different voxel sizes. Our mesh-based occupancy generation pipeline supports arbitrary voxel resolutions as fine as 0.05 m.
Car
Bus
Truck
Motorcycle
Pedestrian
Bus Stop
Building
Tree
Chair
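Because occupancy is derived from watertight meshes rather than sparse sensor points, the voxel size is a free parameter of label generation. The sketch below is a simplified stand-in for that idea (the `voxelize` helper is illustrative, not the paper's pipeline): it quantizes densely sampled surface points into occupied voxel indices, so halving the voxel size yields a strictly finer grid over the same geometry.

```python
import numpy as np

def voxelize(points: np.ndarray, voxel_size: float = 0.05) -> np.ndarray:
    """Quantize 3D surface points into unique occupied voxel indices.

    Illustrative approximation of mesh-based occupancy generation: a real
    pipeline rasterizes watertight meshes, which we mimic here by snapping
    densely sampled surface points onto a grid of the requested resolution.
    """
    indices = np.floor(points / voxel_size).astype(np.int64)
    return np.unique(indices, axis=0)  # one row per occupied voxel

points = np.array([
    [0.01, 0.02, 0.03],
    [0.04, 0.01, 0.04],  # shares a 0.05 m voxel with the first point
    [0.30, 0.40, 0.50],
])
occupied = voxelize(points, voxel_size=0.05)        # coarse grid: 2 voxels
occupied_fine = voxelize(points, voxel_size=0.025)  # finer grid: 3 voxels
```

At 0.05 m the first two points collapse into one voxel; at 0.025 m they separate, which is exactly the effect a finer label resolution has on thin structures and object boundaries.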
@InProceedings{feng2026carlaocc,
title = {An Instance-Centric Panoptic Occupancy Prediction Benchmark for Autonomous Driving},
author = {Yi Feng and Junwu E and Zizhan Guo and Yu Ma and Hanli Wang and Rui Fan},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2026}
}