Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/116349
| DC Field | Value | Language |
|---|---|---|
| dc.contributor | Department of Electrical and Electronic Engineering | - |
| dc.creator | Meng, S | - |
| dc.creator | Wang, Y | - |
| dc.creator | Xu, H | - |
| dc.creator | Chau, LP | - |
| dc.date.accessioned | 2025-12-18T06:42:05Z | - |
| dc.date.available | 2025-12-18T06:42:05Z | - |
| dc.identifier.issn | 1566-2535 | - |
| dc.identifier.uri | http://hdl.handle.net/10397/116349 | - |
| dc.language.iso | en | en_US |
| dc.publisher | Elsevier | en_US |
| dc.subject | Contrastive learning | en_US |
| dc.subject | Cross-modality | en_US |
| dc.subject | Descriptor representation | en_US |
| dc.subject | Place recognition | en_US |
| dc.title | Contrastive learning-based place descriptor representation for cross-modality place recognition | en_US |
| dc.type | Journal/Magazine Article | en_US |
| dc.identifier.volume | 124 | - |
| dc.identifier.doi | 10.1016/j.inffus.2025.103351 | - |
| dcterms.abstract | Place recognition in LiDAR maps plays a vital role in assisting localization, especially in GPS-denied circumstances. While many efforts have been made toward pure LiDAR-based place recognition, these approaches are often hindered by high computational costs and the operational burden they place on the driving agent. To alleviate these limitations, we explore an alternative approach for large-scale cross-modal localization by matching real-time RGB images to pre-existing LiDAR 3D point cloud maps. Specifically, we present a unified place descriptor representation learning method across modalities using a Siamese architecture, which reformulates place recognition as a similarity-based retrieval task. To address the inherent modality differences between visual images and point clouds, we first transform unordered point clouds into a range-view representation, facilitating effective cross-modal metric learning. Subsequently, we introduce a Transformer-Mamba Mixer module that integrates selective scanning and attention mechanisms to capture both intra-context and inter-context embeddings, enabling the generation of place descriptors. To further enrich the global location descriptors, we propose a semantic-promoted descriptor enhancer grounded in semantic distribution estimation. Finally, a contrastive learning paradigm is employed to perform cross-modal place recognition, identifying the most similar descriptors across modalities. Extensive experiments demonstrate the superiority of our proposed method in comparison to state-of-the-art methods. The details are available at https://github.com/emilyemliyM/Cross-PRNet. | - |
| dcterms.accessRights | embargoed access | en_US |
| dcterms.bibliographicCitation | Information fusion, Dec. 2025, v. 124, 103351 | - |
| dcterms.isPartOf | Information fusion | - |
| dcterms.issued | 2025-12 | - |
| dc.identifier.scopus | 2-s2.0-105007426191 | - |
| dc.identifier.eissn | 1872-6305 | - |
| dc.identifier.artn | 103351 | - |
| dc.description.validate | 202512 bcjz | - |
| dc.description.oa | Not applicable | en_US |
| dc.identifier.SubFormID | G000429/2025-11 | en_US |
| dc.description.fundingSource | RGC | en_US |
| dc.description.fundingSource | Others | en_US |
| dc.description.fundingText | The research work was conducted in the JC STEM Lab of Machine Learning and Computer Vision funded by The Hong Kong Jockey Club Charities Trust. It was partially supported by the Research Grants Council of the Hong Kong SAR, China (Project No. PolyU 15215824). | en_US |
| dc.description.pubStatus | Published | en_US |
| dc.date.embargo | 2027-12-31 | en_US |
| dc.description.oaCategory | Green (AAM) | en_US |
Appears in Collections: Journal/Magazine Article
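
The abstract describes transforming unordered point clouds into a range-view representation before cross-modal metric learning. Below is a minimal sketch of such a spherical (range-view) projection; the image resolution and field-of-view values are illustrative assumptions for a roughly 64-beam sensor, not parameters taken from the paper.

```python
import numpy as np

def range_view_projection(points, h=64, w=900, fov_up=3.0, fov_down=-25.0):
    """Project an (N, 3) LiDAR point cloud onto an (h, w) range image.

    fov_up / fov_down are vertical field-of-view bounds in degrees.
    These defaults are illustrative, not taken from the paper.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points, axis=1)          # range of each point

    yaw = np.arctan2(y, x)                          # azimuth in [-pi, pi]
    pitch = np.arcsin(z / np.maximum(depth, 1e-8))  # elevation angle

    fov_up_r = np.radians(fov_up)
    fov_down_r = np.radians(fov_down)
    fov = fov_up_r - fov_down_r

    # Map azimuth to columns and elevation to rows of the range image.
    u = 0.5 * (1.0 - yaw / np.pi) * w
    v = (1.0 - (pitch - fov_down_r) / fov) * h

    u = np.clip(np.floor(u), 0, w - 1).astype(np.int64)
    v = np.clip(np.floor(v), 0, h - 1).astype(np.int64)

    # Write points far-to-near so the closest point wins each pixel.
    order = np.argsort(depth)[::-1]
    image = np.zeros((h, w), dtype=np.float32)
    image[v[order], u[order]] = depth[order]
    return image
```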
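The abstract further states that a contrastive learning paradigm matches descriptors across modalities, casting place recognition as similarity-based retrieval. The sketch below shows a symmetric InfoNCE-style objective over paired image and range-view descriptors, plus a nearest-descriptor retrieval step. The function names, the temperature value, and the batch-diagonal positive assumption are illustrative conventions, not the authors' implementation (which lives at https://github.com/emilyemliyM/Cross-PRNet).

```python
import torch
import torch.nn.functional as F

def cross_modal_infonce(img_desc, pc_desc, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired descriptors.

    img_desc, pc_desc: (B, D) global descriptors from the image and
    range-view branches; row i of each tensor describes the same place.
    The temperature is a common default, not taken from the paper.
    """
    img = F.normalize(img_desc, dim=1)
    pc = F.normalize(pc_desc, dim=1)

    logits = img @ pc.t() / temperature      # (B, B) cosine similarities
    targets = torch.arange(img.size(0), device=img.device)

    # Matching pairs sit on the diagonal (positives); all other
    # entries in each row/column act as negatives.
    loss_i2p = F.cross_entropy(logits, targets)
    loss_p2i = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_i2p + loss_p2i)

def retrieve(query_img_desc, map_pc_descs):
    """Rank map descriptors by cosine similarity to each query image."""
    q = F.normalize(query_img_desc, dim=1)
    m = F.normalize(map_pc_descs, dim=1)
    return (q @ m.t()).argsort(dim=1, descending=True)
```

At query time, `retrieve` returns the indices of the pre-built LiDAR map descriptors ordered by similarity; the top-ranked index is the recognized place.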