Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/113968
DC Field | Value | Language
dc.contributor | Department of Industrial and Systems Engineering | en_US
dc.creator | Zhang, M | en_US
dc.creator | Zhang, Y | en_US
dc.creator | Sun, J | en_US
dc.creator | Yung, KL | en_US
dc.creator | Yang, L | en_US
dc.date.accessioned | 2025-07-04T08:34:59Z | -
dc.date.available | 2025-07-04T08:34:59Z | -
dc.identifier.issn | 2640-4567 | en_US
dc.identifier.uri | http://hdl.handle.net/10397/113968 | -
dc.language.iso | en | en_US
dc.publisher | Advanced intelligent systems | en_US
dc.rights | © 2025 The Author(s). Advanced Intelligent Systems published by Wiley-VCH GmbH. This is an open access article under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited. | en_US
dc.rights | The following publication Zhang, M., Zhang, Y., Sun, J., Yung, K.-L. and Yang, L. (2025), Geometry-Guided Transformer for Monocular 3D Object Detection. Adv. Intell. Syst. 2500003 is available at https://doi.org/10.1002/aisy.202500003. | en_US
dc.subject | 3D object detection | en_US
dc.subject | Deep learning | en_US
dc.subject | Single-view geometry | en_US
dc.subject | Transformer | en_US
dc.title | Geometry-guided transformer for monocular 3D object detection | en_US
dc.type | Journal/Magazine Article | en_US
dc.identifier.doi | 10.1002/aisy.202500003 | en_US
dcterms.abstract | Monocular 3D object detection aims to identify objects’ 3D positions and poses with low hardware and computation costs, which is crucial for scenarios such as autonomous driving and deep space exploration. Although the corresponding research has developed rapidly with the integration of transformer structures, 3D features are still simply transformed from visual features, resulting in a mismatch between detection results and reality. Moreover, most existing methods suffer from slow convergence. To address these issues, a framework named geometry-guided monocular detection with transformer (GG-Mono) is proposed. It consists of three main components: 1) the mix-feature encoder module, which incorporates pretrained depth estimation models to improve convergence speed and accuracy; 2) the geometry encoding module, which supplements the hybrid encoding with global geometry data; and 3) the GG decoder module, which uses geometry queries to guide the decoding process (see the illustrative sketch after this record). Extensive experiments show that the model outperforms all existing methods in detection accuracy, achieving 26.88% and 30.65% average precision of the 3D detection box (AP3D) on the validation and test datasets, respectively, 1.88% and 1.81% higher than the baseline, while significantly improving convergence speed (from 184 to 90 epochs). These results demonstrate the advantages of the proposed method for monocular 3D object detection. | en_US
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | Advanced intelligent systems, First published: 11 May 2025, Early View, 2500003, https://doi.org/10.1002/aisy.202500003 | en_US
dcterms.isPartOf | 2640-4567 | en_US
dcterms.issued | 2025 | -
dc.identifier.scopus | 2-s2.0-105004687700 | -
dc.identifier.artn | 2500003 | en_US
dc.description.validate | 202507 bcch | en_US
dc.description.oa | Version of Record | en_US
dc.identifier.FolderNumber | a3820-n02, OA_TA | -
dc.description.fundingSource | Others | en_US
dc.description.fundingText | The Research Centre for Deep Space Explorations (RCDSE) of the Hong Kong Polytechnic University (project no. BBDW) | en_US
dc.description.fundingText | The Research Institute for Advanced Manufacturing (RIAM) of the Hong Kong Polytechnic University (project no. CD9F) | en_US
dc.description.pubStatus | Early release | en_US
dc.description.TA | Wiley (2025) | en_US
dc.description.oaCategory | TA | en_US
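Illustrative sketch: the abstract describes a three-stage pipeline (a mix-feature encoder fusing visual features with pretrained depth features, a geometry encoding module injecting global geometry data, and a GG decoder driven by geometry queries). The following minimal PyTorch sketch shows how such a flow could be wired together; every module name, dimension, and interface here is an assumption made for illustration, not the authors' released implementation.

# Hypothetical sketch of a GG-Mono-style three-stage pipeline (not the authors' code).
import torch
import torch.nn as nn

class MixFeatureEncoder(nn.Module):
    """Fuses visual tokens with tokens from a (frozen) pretrained depth estimator."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.visual_proj = nn.Conv2d(3, dim, kernel_size=16, stride=16)  # stand-in image backbone
        self.depth_proj = nn.Conv2d(1, dim, kernel_size=16, stride=16)   # stand-in depth branch
        self.mix = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)

    def forward(self, image: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        v = self.visual_proj(image).flatten(2).transpose(1, 2)  # (B, N, dim)
        d = self.depth_proj(depth).flatten(2).transpose(1, 2)   # (B, N, dim)
        return self.mix(v + d)                                  # mixed visual/depth tokens

class GeometryEncoder(nn.Module):
    """Injects a global single-view geometry descriptor (here: camera intrinsics) into the tokens."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.geo_embed = nn.Linear(4, dim)  # toy descriptor: fx, fy, cx, cy

    def forward(self, tokens: torch.Tensor, intrinsics: torch.Tensor) -> torch.Tensor:
        return tokens + self.geo_embed(intrinsics).unsqueeze(1)

class GGDecoder(nn.Module):
    """Transformer decoder whose learnable geometry queries attend to the encoded tokens."""
    def __init__(self, dim: int = 256, num_queries: int = 50):
        super().__init__()
        self.geometry_queries = nn.Parameter(torch.randn(num_queries, dim))
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=3)
        self.box_head = nn.Linear(dim, 7)   # (x, y, z, w, h, l, yaw) per query

    def forward(self, memory: torch.Tensor) -> torch.Tensor:
        q = self.geometry_queries.unsqueeze(0).expand(memory.size(0), -1, -1)
        return self.box_head(self.decoder(q, memory))

if __name__ == "__main__":
    image = torch.randn(2, 3, 224, 224)
    depth = torch.randn(2, 1, 224, 224)       # e.g. output of a pretrained depth model
    intrinsics = torch.randn(2, 4)
    tokens = MixFeatureEncoder()(image, depth)
    tokens = GeometryEncoder()(tokens, intrinsics)
    boxes = GGDecoder()(tokens)               # (2, 50, 7) candidate 3D boxes
    print(boxes.shape)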
Appears in Collections: Journal/Magazine Article
Files in This Item:
File | Description | Size | Format
Zhang_Geometry_Guided_Transformer.pdf | | 2.8 MB | Adobe PDF
Open Access Information
Status: open access
File Version: Version of Record
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.