Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/116349
DC Field | Value | Language
dc.contributor | Department of Electrical and Electronic Engineering | -
dc.creator | Meng, S | -
dc.creator | Wang, Y | -
dc.creator | Xu, H | -
dc.creator | Chau, LP | -
dc.date.accessioned | 2025-12-18T06:42:05Z | -
dc.date.available | 2025-12-18T06:42:05Z | -
dc.identifier.issn | 1566-2535 | -
dc.identifier.uri | http://hdl.handle.net/10397/116349 | -
dc.language.iso | en | en_US
dc.publisher | Elsevier | en_US
dc.subject | Contrastive learning | en_US
dc.subject | Cross-modality | en_US
dc.subject | Descriptor representation | en_US
dc.subject | Place recognition | en_US
dc.title | Contrastive learning-based place descriptor representation for cross-modality place recognition | en_US
dc.type | Journal/Magazine Article | en_US
dc.identifier.volume | 124 | -
dc.identifier.doi | 10.1016/j.inffus.2025.103351 | -
dcterms.abstract | Place recognition in LiDAR maps plays a vital role in assisting localization, especially in GPS-denied circumstances. While many efforts have been made toward pure LiDAR-based place recognition, these approaches are often hindered by high computational costs and the operational burden placed on the driving agent. To alleviate these limitations, we explore an alternative approach for large-scale cross-modal localization by matching real-time RGB images to pre-existing LiDAR 3D point cloud maps. Specifically, we present a unified place descriptor representation learning method across modalities using a Siamese architecture, which reformulates place recognition as a similarity-based retrieval task. To address the inherent modality differences between visual images and point clouds, we first transform unordered point clouds into a range-view representation, facilitating effective cross-modal metric learning. Subsequently, we introduce a Transformer-Mamba Mixer module that integrates selective scanning and attention mechanisms to capture both intra-context and inter-context embeddings, enabling the generation of place descriptors. To further enrich the global location descriptors, we propose a semantic-promoted descriptor enhancer grounded in semantic distribution estimation. Finally, a contrastive learning paradigm is employed to perform cross-modal place recognition, identifying the most similar descriptors across modalities. Extensive experiments demonstrate the superiority of our proposed method over state-of-the-art methods. The details are available at https://github.com/emilyemliyM/Cross-PRNet. | -
dcterms.accessRights | embargoed access | en_US
dcterms.bibliographicCitation | Information fusion, Dec. 2025, v. 124, 103351 | -
dcterms.isPartOf | Information fusion | -
dcterms.issued | 2025-12 | -
dc.identifier.scopus | 2-s2.0-105007426191 | -
dc.identifier.eissn | 1872-6305 | -
dc.identifier.artn | 103351 | -
dc.description.validate | 202512 bcjz | -
dc.description.oa | Not applicable | en_US
dc.identifier.SubFormID | G000429/2025-11 | en_US
dc.description.fundingSource | RGC | en_US
dc.description.fundingSource | Others | en_US
dc.description.fundingText | The research work was conducted in the JC STEM Lab of Machine Learning and Computer Vision funded by The Hong Kong Jockey Club Charities Trust. It was partially supported by the Research Grants Council of the Hong Kong SAR, China (Project No. PolyU 15215824). | en_US
dc.description.pubStatus | Published | en_US
dc.date.embargo | 2027-12-31 | en_US
dc.description.oaCategory | Green (AAM) | en_US
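The abstract describes a Siamese, contrastive cross-modal retrieval pipeline: an RGB query and the range-view projection of the LiDAR map are encoded by two branches, and L2-normalized place descriptors are matched by similarity. Below is a minimal sketch of that general idea only, assuming PyTorch and dummy data; SmallEncoder, info_nce, the temperature tau, and all tensor shapes are illustrative assumptions, not the authors' Cross-PRNet implementation (the Transformer-Mamba Mixer and semantic-promoted enhancer are not modeled here).

```python
# Minimal sketch of cross-modal descriptor learning with a Siamese-style setup
# and an InfoNCE-style contrastive loss, assuming paired RGB images and
# LiDAR range images. All module and variable names are illustrative
# assumptions, not the paper's actual code (see the linked Cross-PRNet repo).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallEncoder(nn.Module):
    """Toy convolutional branch mapping an input image to a global descriptor."""
    def __init__(self, in_channels: int, dim: int = 256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(64, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.backbone(x).flatten(1)
        # L2-normalize so dot products between descriptors are cosine similarities.
        return F.normalize(self.proj(feat), dim=-1)

def info_nce(img_desc: torch.Tensor, pc_desc: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss: matching image/range pairs are positives,
    all other pairs in the batch serve as negatives."""
    logits = img_desc @ pc_desc.t() / tau            # (B, B) similarity matrix
    targets = torch.arange(img_desc.size(0), device=img_desc.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

if __name__ == "__main__":
    img_enc = SmallEncoder(in_channels=3)            # RGB query branch
    rng_enc = SmallEncoder(in_channels=1)            # range-view (projected LiDAR) branch
    images = torch.randn(8, 3, 128, 256)             # dummy camera batch
    ranges = torch.randn(8, 1, 64, 512)              # dummy paired range-image batch
    loss = info_nce(img_enc(images), rng_enc(ranges))
    loss.backward()

    # Retrieval at test time: pick the map descriptor most similar to the query.
    with torch.no_grad():
        query = img_enc(images[:1])                  # (1, D)
        database = rng_enc(ranges)                   # (N, D)
        best_match = (query @ database.t()).argmax(dim=1)
    print(loss.item(), best_match.item())
```

Because both branches emit unit-norm descriptors, the same cosine similarity drives the InfoNCE logits during training and the nearest-descriptor lookup at retrieval time, keeping the two stages consistent.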
Appears in Collections: Journal/Magazine Article
Open Access Information
Status: embargoed access
Embargo End Date: 2027-12-31