Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/115534
Title: 3DGeoDet : general-purpose geometry-aware image-based 3D object detection
Authors: Zhang, Y 
Wang, Y 
Cui, Y 
Chau, LP 
Issue Date: 2025
Source: IEEE transactions on multimedia, 2025, v. 27, p. 6235-6247
Abstract: This paper proposes 3DGeoDet, a novel geometry-aware 3D object detection approach that effectively handles single- and multi-view RGB images in indoor and outdoor environments, showcasing its general-purpose applicability. The key challenge for image-based 3D object detection tasks is the lack of 3D geometric cues, which leads to ambiguity in establishing correspondences between images and 3D representations. To tackle this problem, 3DGeoDet generates efficient 3D geometric representations in both explicit and implicit manners based on predicted depth information. Specifically, we utilize the predicted depth to learn voxel occupancy and optimize the voxelized 3D feature volume explicitly through the proposed voxel occupancy attention. To further enhance 3D awareness, the feature volume is integrated with an implicit 3D representation, the truncated signed distance function (TSDF). Without requiring supervision from 3D signals, we significantly improve the model’s comprehension of 3D geometry by leveraging intermediate 3D representations and achieve end-to-end training. Our approach surpasses the performance of state-of-the-art image-based methods on both single- and multi-view benchmark datasets across diverse environments, achieving a 9.3 mAP@0.5 improvement on the SUN RGB-D dataset, a 3.3 mAP@0.5 improvement on the ScanNetV2 dataset, and a 0.19 AP3D@0.7 improvement on the KITTI dataset.
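Illustration: the abstract names two depth-derived quantities, voxel occupancy and the truncated signed distance function (TSDF). The sketch below is not taken from the paper. It assumes the common projective definition of a TSDF along a single camera ray, and it uses a hand-written soft occupancy as a stand-in for the occupancy that 3DGeoDet learns. The function names `projective_tsdf` and `soft_occupancy`, the truncation distance, and the sample depths are all hypothetical.

```python
def projective_tsdf(voxel_depth, surface_depth, truncation):
    """Truncated signed distance of one voxel along a camera ray.

    Assumption: the generic projective TSDF, not the paper's formulation.
    Positive in front of the predicted surface, negative behind it,
    scaled by the truncation distance and clamped to [-1, 1].
    """
    if truncation <= 0:
        raise ValueError("truncation must be positive")
    signed = (surface_depth - voxel_depth) / truncation
    return max(-1.0, min(1.0, signed))


def soft_occupancy(tsdf_value):
    """Hypothetical stand-in for learned occupancy.

    Returns 1.0 at the predicted surface and falls linearly to 0.0
    at the truncation limit.
    """
    return max(0.0, 1.0 - abs(tsdf_value))


# Voxel centres sampled every 0.5 m along one ray (illustrative values).
voxel_depths = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
predicted_depth = 2.0  # depth predicted for this pixel, in metres
truncation = 1.0       # truncation distance, in metres

tsdf = [projective_tsdf(z, predicted_depth, truncation) for z in voxel_depths]
occupancy = [soft_occupancy(t) for t in tsdf]

print(tsdf)       # [1.0, 1.0, 0.5, 0.0, -0.5, -1.0]
print(occupancy)  # [0.0, 0.0, 0.5, 1.0, 0.5, 0.0]
```

In this toy setting the occupancy peaks at the voxel whose depth matches the predicted depth, which is the kind of weighting a feature volume could use to emphasise voxels near surfaces.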
Keywords: 3D geometry
Monocular 3D object detection
Multi-view 3D object detection
Voxel occupancy
Publisher: Institute of Electrical and Electronics Engineers
Journal: IEEE transactions on multimedia 
ISSN: 1520-9210
EISSN: 1941-0077
DOI: 10.1109/TMM.2025.3581780
Research Data: https://cindy0725.github.io/3DGeoDet/
Rights: © 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
The following publication Y. Zhang, Y. Wang, Y. Cui and L. -P. Chau, '3DGeoDet: General-Purpose Geometry-Aware Image-Based 3D Object Detection,' in IEEE Transactions on Multimedia, vol. 27, pp. 6235-6247, 2025 is available at https://doi.org/10.1109/TMM.2025.3581780.
Appears in Collections: Journal/Magazine Article

Files in This Item:
File: Zhang_3DGeoDet_General_Purpose.pdf
Description: Pre-Published version
Size: 3.52 MB
Format: Adobe PDF
Open Access Information
Status: open access
File Version: Final Accepted Manuscript