Panoptic occupancy prediction aims to jointly infer voxel-wise semantics and instance identities within a unified 3D scene representation. However, progress in this field remains constrained by the absence of high-quality 3D mesh resources, instance-level annotations, and physically consistent occupancy datasets. Existing benchmarks typically provide incomplete, low-resolution geometry without instance-level labels, limiting the development of models capable of precise geometric reconstruction, reliable occlusion reasoning, and holistic 3D understanding. To address these challenges, this paper presents an instance-centric benchmark for 3D panoptic occupancy prediction. Specifically, we introduce ADMesh, the first unified 3D mesh library tailored for autonomous driving, which integrates over 15K high-quality 3D models with diverse textures and rich semantic annotations. Building upon ADMesh, we construct CarlaOcc, a large-scale, physically consistent panoptic occupancy dataset generated with the CARLA simulator. It contains 100K frames with fine-grained, instance-level occupancy ground truth at voxel resolutions as fine as 0.05 m. Furthermore, we introduce standardized evaluation metrics to quantify the quality of existing occupancy datasets. Finally, we establish a systematic benchmark of representative models on the proposed dataset, providing a unified platform for fair comparison and reproducible research in 3D panoptic perception.
Overview of the proposed ADMesh library and the CarlaOcc generation pipeline. The ADMesh library is constructed by extracting and organizing diverse 3D assets from multiple sources, which are subsequently used to reconstruct dynamic scenes with both static structures and temporally aligned non-rigid motions. The resulting unified scene meshes are then used to rectify sensor artifacts and further processed with a topology-aware mesh permutation strategy to produce non-overlapping panoptic occupancy labels.
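The pipeline above produces non-overlapping panoptic occupancy labels, i.e., each voxel carries both a semantic class and an instance identity. The exact label format is not specified in the text; the sketch below assumes a common "semantic id times offset plus instance id" encoding (the `INSTANCE_OFFSET` constant and both function names are hypothetical) to illustrate how such labels can be fused and recovered losslessly.

```python
import numpy as np

# Hypothetical encoding scheme: the dataset's actual label layout is not
# described in the text, so this sketch assumes a widely used
# "semantic * offset + instance" panoptic id convention.
INSTANCE_OFFSET = 1000  # assumed upper bound on instances per class

def encode_panoptic(semantic: np.ndarray, instance: np.ndarray) -> np.ndarray:
    """Fuse per-voxel semantic class ids and instance ids into one panoptic id."""
    return semantic.astype(np.int64) * INSTANCE_OFFSET + instance.astype(np.int64)

def decode_panoptic(panoptic: np.ndarray):
    """Recover (semantic, instance) grids from a fused panoptic id grid."""
    return panoptic // INSTANCE_OFFSET, panoptic % INSTANCE_OFFSET

# Toy 2x2 voxel slice: class 1 with two distinct instances, class 4, and empty space.
semantic = np.array([[1, 1], [4, 0]])
instance = np.array([[1, 2], [1, 0]])
panoptic = encode_panoptic(semantic, instance)
sem_back, inst_back = decode_panoptic(panoptic)
```

Because every voxel maps to exactly one fused id, the encoding guarantees the non-overlap property by construction: a voxel cannot belong to two instances at once.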
Qualitative comparison of occupancy ground truth across different public datasets. Our ground truth provides instance-level annotations with much finer occupancy resolution and fewer broken boundaries, demonstrating superior physical consistency and spatiotemporal continuity.
Comparison of occupancy ground truth in different voxel sizes. Our mesh-based occupancy generation pipeline supports arbitrary voxel resolutions as fine as 0.05 m.
Car
Bus
Truck
Motorcycle
Pedestrian
Bus Stop
Building
Tree
Chair
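Because occupancy is derived from watertight meshes rather than sparse sensor points, the voxel size is a free parameter of label generation. The sketch below is a simplified stand-in for that idea (the `voxelize` helper is illustrative, not the paper's pipeline): it quantizes densely sampled surface points into occupied voxel indices, so halving the voxel size yields a strictly finer grid over the same geometry.

```python
import numpy as np

def voxelize(points: np.ndarray, voxel_size: float = 0.05) -> np.ndarray:
    """Quantize 3D surface points into unique occupied voxel indices.

    Illustrative approximation of mesh-based occupancy generation: a real
    pipeline rasterizes watertight meshes, which we mimic here by snapping
    densely sampled surface points onto a grid of the requested resolution.
    """
    indices = np.floor(points / voxel_size).astype(np.int64)
    return np.unique(indices, axis=0)  # one row per occupied voxel

points = np.array([
    [0.01, 0.02, 0.03],
    [0.04, 0.01, 0.04],  # shares a 0.05 m voxel with the first point
    [0.30, 0.40, 0.50],
])
occupied = voxelize(points, voxel_size=0.05)        # coarse grid: 2 voxels
occupied_fine = voxelize(points, voxel_size=0.025)  # finer grid: 3 voxels
```

At 0.05 m the first two points collapse into one voxel; at 0.025 m they separate, which is exactly the effect a finer label resolution has on thin structures and object boundaries.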
@InProceedings{feng2026carlaocc,
title = {An Instance-Centric Panoptic Occupancy Prediction Benchmark for Autonomous Driving},
author = {Yi Feng and Junwu E and Zizhan Guo and Yu Ma and Hanli Wang and Rui Fan},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2026}
}