Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/117401
| DC Field | Value | Language |
|---|---|---|
| dc.contributor | Department of Building Environment and Energy Engineering | en_US |
| dc.creator | Zheng, H | en_US |
| dc.creator | Huang, X | en_US |
| dc.date.accessioned | 2026-02-23T03:56:00Z | - |
| dc.date.available | 2026-02-23T03:56:00Z | - |
| dc.identifier.issn | 2096-0433 | en_US |
| dc.identifier.uri | http://hdl.handle.net/10397/117401 | - |
| dc.language.iso | en | en_US |
| dc.publisher | Tsinghua University Press | en_US |
| dc.rights | © The Author(s) 2026. | en_US |
| dc.rights | Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. | en_US |
| dc.rights | The following publication H. Zheng and X. Huang, "Photorealistic fire scene video generation via multimodal large language model and pre-trained video diffusion model," in Computational Visual Media is available at https://doi.org/10.26599/CVM.2025.9450511. | en_US |
| dc.subject | Diffusion models | en_US |
| dc.subject | Fire | en_US |
| dc.subject | Physicality | en_US |
| dc.subject | Text-to-Video (T2V) | en_US |
| dc.subject | Video | en_US |
| dc.title | Photorealistic fire scene video generation via multimodal large language model and pre-trained video diffusion model | en_US |
| dc.type | Journal/Magazine Article | en_US |
| dc.identifier.doi | 10.26599/CVM.2025.9450511 | en_US |
| dcterms.abstract | Text-to-video diffusion models have made significant progress. However, there is still a lack of dedicated research on generating fire scene videos with physical realism and visual fidelity. To address this gap, we propose text-to-video fire (T2VFire) scene generation. T2VFire uses GPT-4o as the core engine, which is integrated with an external fire-related knowledge base and a retrieval-augmented generation (RAG) mechanism that can be dynamically updated based on prompts. With the support of this knowledge, the system first expands the user's initial text description and generates a keyframe image. Then, through iterative prompt optimization, it guides a pretrained video diffusion model to generate fire scene videos with physical consistency. Experimental results show that T2VFire improves upon the physical consistency and visual realism of fire scene videos generated by current video generation models. This method provides a solid foundation for future smart firefighting and digital twin systems in building fire safety management. | en_US |
| dcterms.accessRights | open access | en_US |
| dcterms.bibliographicCitation | Computational Visual Media, Date of Publication: 27 January 2026, Early Access, https://doi.org/10.26599/CVM.2025.9450511 | en_US |
| dcterms.isPartOf | Computational Visual Media | en_US |
| dcterms.issued | 2026 | - |
| dc.identifier.eissn | 2096-0662 | en_US |
| dc.description.validate | 202602 bcch | en_US |
| dc.description.oa | Version of Record | en_US |
| dc.identifier.FolderNumber | a4316 | - |
| dc.identifier.SubFormID | 52581 | - |
| dc.description.fundingSource | RGC | en_US |
| dc.description.pubStatus | Early release | en_US |
| dc.description.oaCategory | CC | en_US |

Appears in Collections: Journal/Magazine Article

Files in This Item:
| File | Description | Size | Format |
|---|---|---|---|
| Zheng_Photorealistic_Fire_Scene.pdf | | 20.9 MB | Adobe PDF |