Leveraging pretrained diffusion model for semantic 3-D reconstruction from monocular remote sensing image

Xu, X; Deng, R; Cao, Q; Guo, Z; Chen, Y; Yan, J

doi:10.1109/TGRS.2026.3653117

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/119106

DC Field	Value	Language
dc.contributor	Department of Building Environment and Energy Engineering	en_US
dc.contributor	International Centre of Urban Energy Nexus	en_US
dc.contributor	Research Institute for Smart Energy	en_US
dc.creator	Xu, X	en_US
dc.creator	Deng, R	en_US
dc.creator	Cao, Q	en_US
dc.creator	Guo, Z	en_US
dc.creator	Chen, Y	en_US
dc.creator	Yan, J	en_US
dc.date.accessioned	2026-06-03T08:48:26Z	-
dc.date.available	2026-06-03T08:48:26Z	-
dc.identifier.issn	0196-2892	en_US
dc.identifier.uri	http://hdl.handle.net/10397/119106	-
dc.language.iso	en	en_US
dc.publisher	Institute of Electrical and Electronics Engineers	en_US
dc.rights	© 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	en_US
dc.rights	The following publication X. Xu, R. Deng, Q. Cao, Z. Guo, Y. Chen and J. Yan, 'Leveraging Pretrained Diffusion Model for Semantic 3-D Reconstruction From Monocular Remote Sensing Image,' in IEEE Transactions on Geoscience and Remote Sensing, vol. 64, pp. 1-16, 2026, Art no. 5603516 is available at https://doi.org/10.1109/TGRS.2026.3653117.	en_US
dc.subject	Low-rank adaptation (LoRA)	en_US
dc.subject	Pretrained diffusion model (PDM)	en_US
dc.subject	Semantic 3-D reconstruction	en_US
dc.subject	Task adaptation	en_US
dc.subject	Visual foundation models	en_US
dc.title	Leveraging pretrained diffusion model for semantic 3-D reconstruction from monocular remote sensing image	en_US
dc.type	Journal/Magazine Article	en_US
dc.identifier.volume	64	en_US
dc.identifier.doi	10.1109/TGRS.2026.3653117	en_US
dcterms.abstract	Semantic 3D reconstruction from monocular imagery serves as a cost-effective tool for many urban applications, such as energy system modeling, resilience analysis, and urban planning. However, the generalization of task-specific models for semantic 3D reconstruction remains limited by the available data scale and diversity. In contrast, visual foundation models (VFMs) are trained on large-scale, diverse datasets, enabling stronger adaptability and richer visual knowledge across different tasks. Unlike most VFMs that focus on discrimination or feature extraction, pretrained diffusion models (PDMs) are generative, combining high-level semantic understanding with the ability to produce high-fidelity details and textures. Building upon these advantages, this study proposes a novel task-adaptive framework that harnesses PDMs for semantic 3D reconstruction from monocular remote sensing images. Our framework employs low-rank adaptation to efficiently fine-tune the denoising network, effectively modeling the high-dimensional features required for semantic 3D reconstruction while only training a minimal fraction of parameters. We further design a lightweight, task-specific decoder to map these features into target elevation and semantic maps. In addition, we introduce an evidential height regression method, which incorporates uncertainty awareness into height estimation without introducing additional computational overhead. Experiments on the public US3D JAX and Open Data DC datasets demonstrate that our framework significantly outperforms other existing methods in both subtasks of height estimation and semantic segmentation, achieving high-fidelity semantic 3D reconstruction of remote sensing scenes. This technology holds significant potential for advancing urban modeling, enabling more accurate and efficient large-scale geographic analysis.	en_US
dcterms.accessRights	open access	en_US
dcterms.bibliographicCitation	IEEE transactions on geoscience and remote sensing, 2026, v. 64, 5603516	en_US
dcterms.isPartOf	IEEE transactions on geoscience and remote sensing	en_US
dcterms.issued	2026	-
dc.identifier.scopus	2-s2.0-105027545682	-
dc.identifier.eissn	1558-0644	en_US
dc.identifier.artn	5603516	en_US
dc.description.validate	202606 bcjz	en_US
dc.description.oa	Accepted Manuscript	en_US
dc.identifier.SubFormID	G001752/2026-02	-
dc.description.fundingSource	Others	en_US
dc.description.fundingText	This work was supported in part by the International Center of Urban Energy Nexus under Project P0047700; in part by Research Institute for Sustainable Urban Development (RISUD): Cutting-Edge Solar Synergies Integrated with 3-D Urban Environments toward a Carbon-Neutral City under Project P0052733; in part by Ministry of Science and Technology (MOST) National Key Research and Development Program: Urban Photovoltaic System Planning Method Considering Carbon Footprint and Environmental Benefits under Project P0052743; in part by the Research Institute for Climate-Resilient Infrastructure (RICRI)-Intelligent Platform and Toolbox for Urban Infrastructure Resilience (IPT4U): Intelligent Platform and Toolbox for Urban Infrastructure Resilience under Project P0056532; and in part by the High Performance Computing Centers at Ningbo Institute of Digital Twin, Ningbo.	en_US
dc.description.pubStatus	Published	en_US
dc.description.oaCategory	Green (AAM)	en_US
Appears in Collections:	Journal/Magazine Article

Files in This Item:

File	Description	Size	Format
Xu_Leveraging_Pretrained_Diffusion.pdf	Pre-Published version	30.65 MB	Adobe PDF

Open Access Information

Status	open access
File Version	Final Accepted Manuscript

