SparseD : sparse attention for diffusion language models

Wang, Z; Fang, G; Ma, X; Yang, X; Wang, X

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/119380

DC Field	Value	Language
dc.contributor	Department of Data Science and Artificial Intelligence	en_US
dc.creator	Wang, Z	en_US
dc.creator	Fang, G	en_US
dc.creator	Ma, X	en_US
dc.creator	Yang, X	en_US
dc.creator	Wang, X	en_US
dc.date.accessioned	2026-06-18T03:02:48Z	-
dc.date.available	2026-06-18T03:02:48Z	-
dc.identifier.uri	http://hdl.handle.net/10397/119380	-
dc.description	The Fourteenth International Conference on Learning Representations, Rio de Janeiro, Brazil, Apr 23rd - 27th 2026	en_US
dc.language.iso	en	en_US
dc.publisher	OpenReview.net	en_US
dc.rights	CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)	en_US
dc.rights	The following publication Wang, Z., Fang, G., Ma, X., Yang, X., & Wang, X. (2025). SparseD: Sparse attention for diffusion language models. In The Fourteenth International Conference on Learning Representations is available at https://openreview.net/forum?id=dwbrZtYP04.	en_US
dc.title	SparseD : sparse attention for diffusion language models	en_US
dc.type	Conference Paper	en_US
dcterms.abstract	While diffusion language models (DLMs) offer a promising alternative to autoregressive models (ARs), existing open-source DLMs suffer from high inference latency. This bottleneck is mainly due to the attention’s quadratic complexity with respect to context length in computing all query–key pairs. Intuitively, to reduce this complexity, a natural strategy is to restrict attention to sparse patterns that retain only the most relevant connections. Such approaches are well-established in ARs, where attention follows fixed and clearly defined sparse patterns. However, in DLMs, we observe distinct sparsity behaviors: (1) attention patterns vary across heads, (2) attention patterns in each head remain highly similar across denoising steps, and (3) early denoising steps are critical for generation. These findings render sparse attention methods designed for ARs largely incompatible with DLMs, as they fail to capture head-specific structures and risk degrading generation when applied in early denoising steps. To address these challenges, we propose SparseD, a novel sparse attention method for DLMs. Leveraging the observations, SparseD only requires pre-computing head-specific sparse patterns one time, and reuses them across all steps. This prevents recomputing sparse patterns at each denoising step. Meanwhile, SparseD uses full attention in the early steps, then switches to sparse attention later to maintain generation quality. Together, these establish SparseD as a practical and efficient solution for deploying DLMs in long-context applications. Experimental results demonstrate that SparseD achieves lossless acceleration, delivering up to 1.50x speedup over FlashAttention at a 64k context length with 1,024 denoising steps. Code is available at https://github.com/INV-WZQ/SparseD.	en_US
dcterms.accessRights	open access	en_US
dcterms.bibliographicCitation	The Fourteenth International Conference on Learning Representations, ICLR 2026, Rio de Janeiro, Brazil, Apr 23rd - 27th 2026, https://openreview.net/forum?id=dwbrZtYP04	en_US
dcterms.issued	2026	-
dc.relation.conference	International Conference on Learning Representations [ICLR]	en_US
dc.description.validate	202606 bcch	en_US
dc.description.oa	Version of Record	en_US
dc.identifier.FolderNumber	a4535a	-
dc.identifier.SubFormID	53065	-
dc.description.fundingSource	Others	en_US
dc.description.fundingText	This project is supported by the Ministry of Education, Singapore, under its Academic Research Fund Tier 2 (Award Number: MOE-T2EP20122-0006) and the Hong Kong Polytechnic University under the Presidential Young Scholars Scheme (Project ID: P0058232).	en_US
dc.description.pubStatus	Unpublish	en_US
dc.description.oaCategory	CC	en_US
Appears in Collections:	Conference Paper
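
The abstract above describes two mechanisms: full attention in the early denoising steps, and a head-specific sparse pattern that is computed once and reused in later steps. The following is a minimal pure-Python sketch of that schedule for a single head, not the authors' implementation (see the linked repository for that). The function names, the top-k selection rule, the `full_steps` and `keep` parameters, and the choice to compute the pattern at the switch step are all assumptions made for illustration.

```python
import math

def attention_scores(q, k):
    """Scaled dot-product scores for one head; q and k are lists of vectors."""
    scale = 1.0 / math.sqrt(len(q[0]))
    return [[scale * sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]

def topk_pattern(scores, keep):
    """Assumed selection rule: per query, keep the indices of the `keep` largest scores."""
    return [sorted(sorted(range(len(row)), key=lambda j: -row[j])[:keep])
            for row in scores]

def attend(q, k, v, pattern=None):
    """Softmax attention for one head. If `pattern` is given, query i only
    attends to the key indices in pattern[i]; otherwise it attends to all keys."""
    scores = attention_scores(q, k)
    out = []
    for i, row in enumerate(scores):
        idx = list(pattern[i]) if pattern is not None else list(range(len(row)))
        m = max(row[j] for j in idx)               # subtract max for stability
        w = [math.exp(row[j] - m) for j in idx]
        z = sum(w)
        out.append([sum(wj * v[j][d] for wj, j in zip(w, idx)) / z
                    for d in range(len(v[0]))])
    return out

def run_head(qkv_per_step, full_steps, keep):
    """qkv_per_step holds one (q, k, v) triple per denoising step for one head.
    Steps before `full_steps` use full attention. The sparse pattern is computed
    once at the first sparse step (an assumption) and reused afterwards."""
    pattern = None
    outputs, modes = [], []
    for step, (q, k, v) in enumerate(qkv_per_step):
        if step < full_steps:
            outputs.append(attend(q, k, v))
            modes.append("full")
        else:
            if pattern is None:
                pattern = topk_pattern(attention_scores(q, k), keep)
            outputs.append(attend(q, k, v, pattern))
            modes.append("sparse")
    return outputs, modes, pattern

# Toy usage: 2 queries, 3 keys, 4 denoising steps with identical inputs.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
outputs, modes, pattern = run_head([(q, k, v)] * 4, full_steps=2, keep=2)
```

Because the pattern is stored per head, running `run_head` once per head yields head-specific patterns, and the cost of selecting the pattern is paid once rather than at every denoising step.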

Files in This Item:

File	Description	Size	Format
SparseD_Sparse_Attention.pdf		5.04 MB	Adobe PDF

Open Access Information

Status	open access
File Version	Version of Record
