Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/113671
DC Field    Value    Language
dc.contributor    Department of Data Science and Artificial Intelligence    en_US
dc.creator    Chou, Y    en_US
dc.creator    Yao, M    en_US
dc.creator    Wang, K    en_US
dc.creator    Pan, Y    en_US
dc.creator    Zhu, RJ    en_US
dc.creator    Wu, J    en_US
dc.creator    Zhong, Y    en_US
dc.creator    Qiao, Y    en_US
dc.creator    Xu, B    en_US
dc.creator    Li, G    en_US
dc.date.accessioned    2025-06-17T07:40:46Z    -
dc.date.available    2025-06-17T07:40:46Z    -
dc.identifier.isbn    979-8-3313-1438-5    en_US
dc.identifier.issn    1049-5258    en_US
dc.identifier.uri    http://hdl.handle.net/10397/113671    -
dc.description    38th Conference on Neural Information Processing Systems (NeurIPS 2024), 10-15 December 2024, Vancouver, Canada    en_US
dc.language.iso    en    en_US
dc.publisher    Neural Information Processing Systems Foundation, Inc. (NeurIPS)    en_US
dc.rights    Posted with permission of the author.    en_US
dc.rights    The following publication Chou, Y., Yao, M., Wang, K., Pan, Y., Zhu, R. J., Wu, J., ... & Li, G. (2024). MetaLA: Unified optimal linear approximation to softmax attention map. Advances in Neural Information Processing Systems, 37 is available at https://papers.nips.cc/paper_files/paper/2024/hash/8329a45669017898bb0cc09d27f8d2bb-Abstract-Conference.html.    en_US
dc.title    MetaLA: unified optimal linear approximation to softmax attention map    en_US
dc.type    Conference Paper    en_US
dc.identifier.volume    37    en_US
dcterms.abstract    Various linear complexity models, such as Linear Transformer (LinFormer), State Space Model (SSM), and Linear RNN (LinRNN), have been proposed to replace the conventional softmax attention in Transformer structures. However, the optimal design of these linear models is still an open question. In this work, we attempt to answer this question by finding the best linear approximation to softmax attention from a theoretical perspective. We start by unifying existing linear complexity models under a common linear attention form and then identify three conditions for the optimal linear attention design: (1) Dynamic memory ability; (2) Static approximation ability; (3) Least parameter approximation. We find that none of the current linear models meet all three conditions, resulting in suboptimal performance. Instead, we propose Meta Linear Attention (MetaLA) as a solution that satisfies these conditions. Our experiments on the Multi-Query Associative Recall (MQAR) task, language modeling, image classification, and the Long-Range Arena (LRA) benchmark demonstrate that MetaLA is more effective than existing linear models.    en_US
dcterms.accessRights    open access    en_US
dcterms.bibliographicCitation    Advances in neural information processing systems, 2024, v. 37, https://nips.cc/virtual/2024/poster/94714    en_US
dcterms.isPartOf    Advances in neural information processing systems    en_US
dcterms.issued    2024    -
dc.relation.conference    Conference on Neural Information Processing Systems [NeurIPS]    en_US
dc.description.validate    202506 bcch    en_US
dc.description.oa    Accepted Manuscript    en_US
dc.identifier.FolderNumber    a3717c    -
dc.identifier.SubFormID    50834    -
dc.description.fundingSource    Others    en_US
dc.description.fundingText    CAS Project for Young Scientists in Basic Research; National Distinguished Young Scholars; National Natural Science Foundation of China; Beijing Science and Technology Plan; Beijing Natural Science Foundation for Distinguished Young Scholars; China Postdoctoral Science Foundation; CAAI-MindSpore Open Fund    en_US
dc.description.pubStatus    Published    en_US
dc.description.oaCategory    Copyright retained by author    en_US
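
The abstract above refers to a common "linear attention form" that unifies LinFormer-, SSM-, and LinRNN-style models. The sketch below is only an illustrative contrast between quadratic softmax attention and a generic gated linear-attention recurrence; it is not the paper's MetaLA formulation, and the shapes, the fixed scalar decay gate, and all function names are assumptions made for illustration.

```python
# Illustrative sketch only: softmax attention vs. a generic linear-attention
# recurrence. NOT the MetaLA method from the paper; the fixed scalar `decay`
# gate and the shapes are simplifying assumptions chosen to show the O(n)
# recurrent form shared by LinFormer/SSM/LinRNN-style models.
import numpy as np


def softmax_attention(Q, K, V):
    """Standard softmax attention: O(n^2) in sequence length n."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])                 # (n, n) attention map
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return weights @ V                                        # (n, d_v)


def linear_attention(Q, K, V, decay=0.9):
    """Generic gated linear attention: O(n) via a running state S.

    S_t = decay * S_{t-1} + k_t^T v_t   (d_k x d_v state update)
    o_t = q_t S_t                        (read-out)
    """
    n, d_k = Q.shape
    d_v = V.shape[-1]
    S = np.zeros((d_k, d_v))
    out = np.empty((n, d_v))
    for t in range(n):
        S = decay * S + np.outer(K[t], V[t])  # compress history into fixed-size state
        out[t] = Q[t] @ S                     # query the state instead of all past tokens
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 8, 16
    Q, K, V = rng.normal(size=(3, n, d))
    print(softmax_attention(Q, K, V).shape)  # (8, 16)
    print(linear_attention(Q, K, V).shape)   # (8, 16)
```

Because the recurrence keeps only a fixed-size state, the per-step cost is independent of sequence length, which is the property the abstract contrasts with the quadratic softmax attention map.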
Appears in Collections: Conference Paper
Files in This Item:
File    Description    Size    Format
575_MetaLA_Unified_Optimal_Lin.pdf    Pre-Published version    1.13 MB    Adobe PDF
Open Access Information
Status: open access
File Version: Final Accepted Manuscript
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.