Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/117401
DC Field | Value | Language
dc.contributor | Department of Building Environment and Energy Engineering | en_US
dc.creator | Zheng, H | en_US
dc.creator | Huang, X | en_US
dc.date.accessioned | 2026-02-23T03:56:00Z | -
dc.date.available | 2026-02-23T03:56:00Z | -
dc.identifier.issn | 2096-0433 | en_US
dc.identifier.uri | http://hdl.handle.net/10397/117401 | -
dc.language.iso | en | en_US
dc.publisher | Tsinghua University Press | en_US
dc.rights | © The Author(s) 2026. | en_US
dc.rights | Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. | en_US
dc.rights | The following publication H. Zheng and X. Huang, "Photorealistic fire scene video generation via multimodal large language model and pre-trained video diffusion model," in Computational Visual Media is available at https://doi.org/10.26599/CVM.2025.9450511. | en_US
dc.subject | Diffusion models | en_US
dc.subject | Fire | en_US
dc.subject | Physicality | en_US
dc.subject | Text-to-Video (T2V) | en_US
dc.subject | Video | en_US
dc.title | Photorealistic fire scene video generation via multimodal large language model and pre-trained video diffusion model | en_US
dc.type | Journal/Magazine Article | en_US
dc.identifier.doi | 10.26599/CVM.2025.9450511 | en_US
dcterms.abstract | Text-to-video diffusion models have made significant progress. However, there is still a lack of dedicated research on generating fire scene videos with physical realism and visual fidelity. To address this gap, we propose text-to-video fire (T2VFire) scene generation. T2VFire uses GPT-4o as the core engine, integrated with an external fire-related knowledge base and a retrieval-augmented generation (RAG) mechanism that can be dynamically updated based on prompts. With the support of this knowledge, the system first expands the user's initial text description and generates a keyframe image. Then, through iterative prompt optimization, it guides a pre-trained video diffusion model to generate fire scene videos with physical consistency. Experimental results show that T2VFire improves upon the physical consistency and visual realism of fire scene videos generated by current video generation models. This method provides a solid foundation for future smart firefighting and digital twin systems in building fire safety management. | en_US
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | Computational visual media, Date of Publication: 27 January 2026, Early Access, https://doi.org/10.26599/CVM.2025.9450511 | en_US
dcterms.isPartOf | Computational visual media | en_US
dcterms.issued | 2026 | -
dc.identifier.eissn | 2096-0662 | en_US
dc.description.validate | 202602 bcch | en_US
dc.description.oa | Version of Record | en_US
dc.identifier.FolderNumber | a4316 | -
dc.identifier.SubFormID | 52581 | -
dc.description.fundingSource | RGC | en_US
dc.description.pubStatus | Early release | en_US
dc.description.oaCategory | CC | en_US
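The abstract describes a pipeline in which GPT-4o, backed by a fire-knowledge base and RAG, expands the user's prompt and iteratively refines it before conditioning a pre-trained video diffusion model. The following is a minimal toy sketch of that control loop only; every name here (FIRE_KB, retrieve, expand_prompt, score_physical_consistency, optimize_prompt) is a hypothetical stand-in, not the authors' code, and the LLM, keyframe, and diffusion stages are replaced by trivial string operations.

```python
# Toy sketch of a T2VFire-style prompt-refinement loop (hypothetical stand-ins
# for GPT-4o, the fire-knowledge RAG store, and the T2V diffusion model).

FIRE_KB = {  # stand-in for the external fire-related knowledge base
    "sofa": "polyurethane foam ignites quickly and yields thick black smoke",
    "kitchen": "grease fires flare orange-yellow with dense smoke near the ceiling",
}

def retrieve(prompt: str) -> list[str]:
    """Stand-in RAG step: return snippets whose keys appear in the prompt."""
    return [v for k, v in FIRE_KB.items() if k in prompt.lower()]

def expand_prompt(prompt: str, snippets: list[str]) -> str:
    """Stand-in for the GPT-4o prompt-expansion step."""
    return prompt + " | physics: " + "; ".join(snippets)

def score_physical_consistency(prompt: str) -> int:
    """Toy critic: reward prompts carrying more retrieved physics detail."""
    return prompt.count(";") + int("physics:" in prompt)

def optimize_prompt(prompt: str, rounds: int = 3) -> str:
    """Greedy iterative refinement, mirroring the abstract's optimization loop."""
    best = prompt
    for _ in range(rounds):
        candidate = expand_prompt(best, retrieve(best))
        if score_physical_consistency(candidate) <= score_physical_consistency(best):
            break  # no improvement; stop refining
        best = candidate
    return best

user_prompt = "A sofa fire spreading in a kitchen"
final_prompt = optimize_prompt(user_prompt)
# final_prompt would then condition a pre-trained text-to-video diffusion model.
```

In the actual system the critic, expansion, and keyframe steps would be LLM- and model-driven; the sketch only illustrates the retrieve-expand-score-iterate shape the abstract outlines.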
Appears in Collections: Journal/Magazine Article
Files in This Item:
File | Size | Format
Zheng_Photorealistic_Fire_Scene.pdf | 20.9 MB | Adobe PDF
Open Access Information
Status: open access
File Version: Version of Record


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.