Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/117401
DC Field | Value | Language
dc.contributor | Department of Building Environment and Energy Engineering | en_US
dc.creator | Zheng, H | en_US
dc.creator | Huang, X | en_US
dc.date.accessioned | 2026-02-23T03:56:00Z | -
dc.date.available | 2026-02-23T03:56:00Z | -
dc.identifier.issn | 2096-0433 | en_US
dc.identifier.uri | http://hdl.handle.net/10397/117401 | -
dc.language.iso | en | en_US
dc.publisher | Tsinghua University Press | en_US
dc.rights | © The Author(s) 2026. | en_US
dc.rights | Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. | en_US
dc.rights | The following publication H. Zheng and X. Huang, "Photorealistic fire scene video generation via multimodal large language model and pre-trained video diffusion model," in Computational Visual Media is available at https://doi.org/10.26599/CVM.2025.9450511. | en_US
dc.subject | Diffusion models | en_US
dc.subject | Fire | en_US
dc.subject | Physicality | en_US
dc.subject | Text-to-Video (T2V) | en_US
dc.subject | Video | en_US
dc.title | Photorealistic fire scene video generation via multimodal large language model and pre-trained video diffusion model | en_US
dc.type | Journal/Magazine Article | en_US
dc.identifier.doi | 10.26599/CVM.2025.9450511 | en_US
dcterms.abstract | Text-to-video diffusion models have made significant progress. However, there is still a lack of dedicated research on generating fire scene videos with physical realism and visual fidelity. To address this gap, we propose text-to-video fire (T2VFire) scene generation. T2VFire uses GPT-4o as the core engine, integrated with an external fire-related knowledge base and a retrieval-augmented generation (RAG) mechanism that can be dynamically updated based on prompts. With the support of this knowledge, the system first expands the user's initial text description and generates a keyframe image. Then, through iterative prompt optimization, it guides a pre-trained video diffusion model to generate fire scene videos with physical consistency. Experimental results show that T2VFire improves upon the physical consistency and visual realism of fire scene videos generated by current video generation models. This method provides a solid foundation for future smart firefighting and digital twin systems in building fire safety management. | en_US
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | Computational visual media, Date of Publication: 27 January 2026, Early Access, https://doi.org/10.26599/CVM.2025.9450511 | en_US
dcterms.isPartOf | Computational visual media | en_US
dcterms.issued | 2026 | -
dc.identifier.eissn | 2096-0662 | en_US
dc.description.validate | 202602 bcch | en_US
dc.description.oa | Version of Record | en_US
dc.identifier.FolderNumber | a4316 | -
dc.identifier.SubFormID | 52581 | -
dc.description.fundingSource | RGC | en_US
dc.description.pubStatus | Early release | en_US
dc.description.oaCategory | CC | en_US
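The abstract describes a pipeline in which GPT-4o, backed by a fire-knowledge base and RAG, expands the user's prompt and iteratively refines it before conditioning a pre-trained video diffusion model. The following is a minimal toy sketch of that control loop only; every name here (FIRE_KB, retrieve, expand_prompt, score_physical_consistency, optimize_prompt) is a hypothetical stand-in, not the authors' code, and the LLM, keyframe, and diffusion stages are replaced by trivial string operations.

```python
# Toy sketch of a T2VFire-style prompt-refinement loop (hypothetical stand-ins
# for GPT-4o, the fire-knowledge RAG store, and the T2V diffusion model).

FIRE_KB = {  # stand-in for the external fire-related knowledge base
    "sofa": "polyurethane foam ignites quickly and yields thick black smoke",
    "kitchen": "grease fires flare orange-yellow with dense smoke near the ceiling",
}

def retrieve(prompt: str) -> list[str]:
    """Stand-in RAG step: return snippets whose keys appear in the prompt."""
    return [v for k, v in FIRE_KB.items() if k in prompt.lower()]

def expand_prompt(prompt: str, snippets: list[str]) -> str:
    """Stand-in for the GPT-4o prompt-expansion step."""
    return prompt + " | physics: " + "; ".join(snippets)

def score_physical_consistency(prompt: str) -> int:
    """Toy critic: reward prompts carrying more retrieved physics detail."""
    return prompt.count(";") + int("physics:" in prompt)

def optimize_prompt(prompt: str, rounds: int = 3) -> str:
    """Greedy iterative refinement, mirroring the abstract's optimization loop."""
    best = prompt
    for _ in range(rounds):
        candidate = expand_prompt(best, retrieve(best))
        if score_physical_consistency(candidate) <= score_physical_consistency(best):
            break  # no improvement; stop refining
        best = candidate
    return best

user_prompt = "A sofa fire spreading in a kitchen"
final_prompt = optimize_prompt(user_prompt)
# final_prompt would then condition a pre-trained text-to-video diffusion model.
```

In the actual system the critic, expansion, and keyframe steps would be LLM- and model-driven; the sketch only illustrates the retrieve-expand-score-iterate shape the abstract outlines.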
Appears in Collections: Journal/Magazine Article
Files in This Item:
File | Size | Format
Zheng_Photorealistic_Fire_Scene.pdf | 20.9 MB | Adobe PDF
Open Access Information
Status: open access
File Version: Version of Record


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.