Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/119322
DC Field | Value | Language
dc.contributor | Department of Data Science and Artificial Intelligence | en_US
dc.creator | Wang, Z | en_US
dc.creator | Zheng, B | en_US
dc.creator | Yang, X | en_US
dc.creator | Tan, Z | en_US
dc.creator | Xu, Y | en_US
dc.creator | Wang, X | en_US
dc.date.accessioned | 2026-06-15T09:02:01Z | -
dc.date.available | 2026-06-15T09:02:01Z | -
dc.identifier.isbn | 1-57735-906-2 | en_US
dc.identifier.isbn | 978-1-57735-906-7 | en_US
dc.identifier.uri | http://hdl.handle.net/10397/119322 | -
dc.description | The 40th AAAI Conference on Artificial Intelligence, January 20-27, 2026, Singapore | en_US
dc.language.iso | en | en_US
dc.publisher | AAAI Press | en_US
dc.rights | Copyright © 2026, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. | en_US
dc.rights | The following publication Wang, Z., Zheng, B., Yang, X., Tan, Z., Xu, Y., & Wang, X. (2026). Minute-Long Videos with Dual Parallelisms. Proceedings of the AAAI Conference on Artificial Intelligence, 40(12), 10358–10366 is available at https://dx.doi.org/10.1609/aaai.v40i12.38006. | en_US
dc.title | Minute-long videos with dual parallelisms | en_US
dc.type | Conference Paper | en_US
dc.identifier.spage | 10358 | en_US
dc.identifier.epage | 10366 | en_US
dc.identifier.volume | 40 | en_US
dc.identifier.issue | 12 | en_US
dc.identifier.doi | 10.1609/aaai.v40i12.38006 | en_US
dcterms.abstract | Diffusion Transformer (DiT)-based video diffusion models generate high-quality videos at scale but incur prohibitive processing latency and memory costs for long videos. To address this, we propose a novel distributed inference strategy, termed DualParal. The core idea is that, instead of generating an entire video on a single GPU, we parallelize computation by partitioning both video frames and model layers across multiple GPUs. However, a naive parallel implementation is not feasible. Because all frames need to share the same noise level, they can't be processed independently. Instead, every step must wait for all others to finish, which cancels out the speed benefits of parallel processing. We overcome this obstacle with a block-wise denoising scheme. Namely, we segment the video into sequential blocks, each with a different noise level. As a result, we process them in a pipeline across the GPUs. Each GPU, holding a subset of the model layers, processes a specific block of frames and passes the results to the next GPU, enabling asynchronous computation and communication. To further optimize performance, we incorporate two key enhancements. Firstly, each GPU uses a feature cache technique to reduce the overhead of smooth transitions by reusing only features involved in cross-frame computation from the prior block, minimizing inter-GPU communication and redundant computation. Secondly, we employ a coordinated noise initialization strategy, ensuring globally consistent temporal dynamics by sharing initial noise patterns across GPUs. Together, these enable fast, artifact-free, and infinitely long video generation. Applied to the latest diffusion transformer video generator, our method efficiently produces 1,025-frame videos with up to 6.54x lower latency and 1.48x lower memory cost on 8xRTX 4090 GPUs. | en_US
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | In S Koenig, C Jenkins, & ME Taylor (Eds.), Proceedings of the 40th Annual AAAI Conference on Artificial Intelligence, p. 10358-10366. Washington, DC: Association for the Advancement of Artificial Intelligence, 2026 | en_US
dcterms.issued | 2026 | -
dc.relation.ispartofbook | Proceedings of the 40th Annual AAAI Conference on Artificial Intelligence | en_US
dc.relation.conference | Conference on Artificial Intelligence [AAAI] | en_US
dc.publisher.place | Washington, DC | en_US
dc.description.validate | 202606 bcch | en_US
dc.description.oaVersion | Version of Record | en_US
dc.identifier.FolderNumber | a4498 | -
dc.identifier.SubFormID | 52971 | -
dc.description.fundingSource | Others | en_US
dc.description.fundingText | This project is supported by the Ministry of Education, Singapore, under its Academic Research Fund Tier 2 (Award Number: MOE-T2EP20122-0006), and by the Presidential Young Scholars Scheme (Project ID: IDP0058232) from The Hong Kong Polytechnic University. | en_US
dc.description.pubStatus | Published | en_US
dc.description.oaCategory | Publisher permission | en_US
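
Illustrative note on the abstract: the scheduling argument (a per-step barrier stalls the layer pipeline, while decoupled blocks keep every GPU busy) can be made concrete with a toy tick count. The sketch below is not the authors' implementation. It assumes every (block, stage) unit of work costs one tick, ignores communication, the feature cache, and noise initialization, and uses hypothetical names (`pipeline_schedule`, `barrier_ticks`, `dependencies_ok`). In this toy the blocks drift at most one denoising step apart, which is a simplification of the per-block noise levels described in the abstract.

```python
from typing import List, Optional, Tuple

Pass = Tuple[int, int]  # (block index, denoising step)


def pipeline_schedule(num_blocks: int, num_steps: int,
                      num_stages: int) -> List[List[Optional[Pass]]]:
    """Return schedule[tick][stage] = (block, step), or None when idle.

    Stages stand in for GPUs that each hold a slice of the model layers.
    Passes are issued round-robin over blocks, one new pass per tick.
    """
    if num_blocks < num_stages:
        raise ValueError("toy model needs num_blocks >= num_stages")
    passes = [(b, t) for t in range(num_steps) for b in range(num_blocks)]
    total_ticks = len(passes) + num_stages - 1
    schedule = []
    for tick in range(total_ticks):
        row: List[Optional[Pass]] = []
        for stage in range(num_stages):
            k = tick - stage  # pass that reaches this stage at this tick
            row.append(passes[k] if 0 <= k < len(passes) else None)
        schedule.append(row)
    return schedule


def barrier_ticks(num_blocks: int, num_steps: int, num_stages: int) -> int:
    """Tick count when the pipeline is drained after every denoising step."""
    return num_steps * (num_blocks + num_stages - 1)


def dependencies_ok(schedule: List[List[Optional[Pass]]],
                    num_stages: int) -> bool:
    """Check that step t+1 of a block starts only after its step t finished."""
    finished = {}  # pass -> tick at which it left the last stage
    for tick, row in enumerate(schedule):
        for stage, item in enumerate(row):
            if item is None:
                continue
            block, step = item
            if stage == 0 and step > 0:
                done = finished.get((block, step - 1))
                if done is None or done >= tick:
                    return False
            if stage == num_stages - 1:
                finished[item] = tick
    return True


if __name__ == "__main__":
    blocks, steps, stages = 4, 3, 4
    sched = pipeline_schedule(blocks, steps, stages)
    print("decoupled blocks:", len(sched), "ticks")
    print("per-step barrier:", barrier_ticks(blocks, steps, stages), "ticks")
    print("tick 4 holds:", sched[4])
```

With 4 blocks, 3 steps, and 4 stages, the toy needs 15 ticks without a barrier and 21 with one; the gap grows with the number of denoising steps, since the barrier pays the pipeline fill and drain cost on every step. These counts describe only the toy model and say nothing about the latency figures reported in the abstract.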
Appears in Collections: Conference Paper
Files in This Item:
File | Description | Size | Format
Wang_Minute_Long_Videos.pdf | - | 6.25 MB | Adobe PDF
Open Access Information
Status: open access
File Version: Version of Record
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.