Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/115534
DC Field | Value | Language
dc.contributor | Department of Electrical and Electronic Engineering | en_US
dc.creator | Zhang, Y | en_US
dc.creator | Wang, Y | en_US
dc.creator | Cui, Y | en_US
dc.creator | Chau, LP | en_US
dc.date.accessioned | 2025-10-06T03:10:43Z | -
dc.date.available | 2025-10-06T03:10:43Z | -
dc.identifier.issn | 1520-9210 | en_US
dc.identifier.uri | http://hdl.handle.net/10397/115534 | -
dc.language.iso | en | en_US
dc.publisher | Institute of Electrical and Electronics Engineers | en_US
dc.rights | © 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. | en_US
dc.rights | The following publication Y. Zhang, Y. Wang, Y. Cui and L. -P. Chau, '3DGeoDet: General-Purpose Geometry-Aware Image-Based 3D Object Detection,' in IEEE Transactions on Multimedia, vol. 27, pp. 6235-6247, 2025 is available at https://doi.org/10.1109/TMM.2025.3581780. | en_US
dc.subject | 3D geometry | en_US
dc.subject | Monocular 3D object detection | en_US
dc.subject | Multi-view 3D object detection | en_US
dc.subject | Voxel occupancy | en_US
dc.title | 3DGeoDet : general-purpose geometry-aware image-based 3D object detection | en_US
dc.type | Journal/Magazine Article | en_US
dc.identifier.spage | 6235 | en_US
dc.identifier.epage | 6247 | en_US
dc.identifier.volume | 27 | en_US
dc.identifier.doi | 10.1109/TMM.2025.3581780 | en_US
dcterms.abstract | This paper proposes 3DGeoDet, a novel geometry-aware 3D object detection approach that effectively handles single- and multi-view RGB images in indoor and outdoor environments, showcasing its general-purpose applicability. The key challenge for image-based 3D object detection tasks is the lack of 3D geometric cues, which leads to ambiguity in establishing correspondences between images and 3D representations. To tackle this problem, 3DGeoDet generates efficient 3D geometric representations in both explicit and implicit manners based on predicted depth information. Specifically, we utilize the predicted depth to learn voxel occupancy and optimize the voxelized 3D feature volume explicitly through the proposed voxel occupancy attention. To further enhance 3D awareness, the feature volume is integrated with an implicit 3D representation, the truncated signed distance function (TSDF). Without requiring supervision from 3D signals, we significantly improve the model’s comprehension of 3D geometry by leveraging intermediate 3D representations and achieve end-to-end training. Our approach surpasses the performance of state-of-the-art image-based methods on both single- and multi-view benchmark datasets across diverse environments, achieving a 9.3 mAP@0.5 improvement on the SUN RGB-D dataset, a 3.3 mAP@0.5 improvement on the ScanNetV2 dataset, and a 0.19 AP3D@0.7 improvement on the KITTI dataset. | en_US
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | IEEE transactions on multimedia, 2025, v. 27, p. 6235-6247 | en_US
dcterms.isPartOf | IEEE transactions on multimedia | en_US
dcterms.issued | 2025 | -
dc.identifier.scopus | 2-s2.0-105008802041 | -
dc.identifier.eissn | 1941-0077 | en_US
dc.description.validate | 202510 bcch | en_US
dc.description.oa | Accepted Manuscript | en_US
dc.identifier.SubFormID | G000212/2025-07 | -
dc.description.fundingSource | Others | en_US
dc.description.fundingText | This work was conducted at the JC STEM Lab of Machine Learning and Computer Vision funded by The Hong Kong Jockey Club Charities Trust. | en_US
dc.description.pubStatus | Published | en_US
dc.description.oaCategory | Green (AAM) | en_US
dc.relation.rdata | https://cindy0725.github.io/3DGeoDet/ | en_US
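Note on the abstract: the truncated signed distance function (TSDF) it mentions can be illustrated with a minimal sketch. This is not the authors' implementation. It only shows the commonly used definition along a single camera ray, assuming a sign convention in which values are positive in front of the surface and negative behind it, and assuming the result is normalized to [-1, 1]. The function names `tsdf_value` and `tsdf_along_ray` and the parameter `truncation` are hypothetical, chosen for illustration.

```python
from typing import List


def tsdf_value(surface_depth: float, voxel_depth: float, truncation: float) -> float:
    """Truncated signed distance of one voxel along a camera ray.

    Assumed convention (not taken from the paper): positive in front of
    the surface, negative behind it, scaled by the truncation distance
    and clipped to [-1, 1].
    """
    if truncation <= 0:
        raise ValueError("truncation must be positive")
    signed = (surface_depth - voxel_depth) / truncation
    return max(-1.0, min(1.0, signed))


def tsdf_along_ray(surface_depth: float, voxel_depths: List[float],
                   truncation: float) -> List[float]:
    """Apply tsdf_value to every voxel depth sampled along the same ray."""
    return [tsdf_value(surface_depth, d, truncation) for d in voxel_depths]


# A surface predicted at depth 2.0 with truncation distance 0.5:
# voxels far in front saturate at 1.0, voxels far behind at -1.0.
values = tsdf_along_ray(2.0, [1.0, 1.75, 2.0, 2.25, 3.0], 0.5)
print(values)  # [1.0, 0.5, 0.0, -0.5, -1.0]
```

In this sketch the depth would come from a depth prediction for the pixel whose ray passes through the voxels. How the paper combines such values with the feature volume is not described in this record.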
Appears in Collections: Journal/Magazine Article
Files in This Item:
File | Description | Size | Format
Zhang_3DGeoDet_General_Purpose.pdf | Pre-Published version | 3.52 MB | Adobe PDF
Open Access Information
Status: open access
File Version: Final Accepted Manuscript
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.