Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/113671
DC Field | Value | Language |
---|---|---|
dc.contributor | Department of Data Science and Artificial Intelligence | en_US |
dc.creator | Chou, Y | en_US |
dc.creator | Yao, M | en_US |
dc.creator | Wang, K | en_US |
dc.creator | Pan, Y | en_US |
dc.creator | Zhu, RJ | en_US |
dc.creator | Wu, J | en_US |
dc.creator | Zhong, Y | en_US |
dc.creator | Qiao, Y | en_US |
dc.creator | Xu, B | en_US |
dc.creator | Li, G | en_US |
dc.date.accessioned | 2025-06-17T07:40:46Z | - |
dc.date.available | 2025-06-17T07:40:46Z | - |
dc.identifier.isbn | 979-8-3313-1438-5 | en_US |
dc.identifier.issn | 1049-5258 | en_US |
dc.identifier.uri | http://hdl.handle.net/10397/113671 | - |
dc.description | 38th Conference on Neural Information Processing Systems (NeurIPS 2024), 10-15 December 2024, Vancouver, Canada | en_US |
dc.language.iso | en | en_US |
dc.publisher | Neural Information Processing Systems Foundation, Inc. (NeurIPS) | en_US |
dc.rights | Posted with permission of the author. | en_US |
dc.rights | The following publication Chou, Y., Yao, M., Wang, K., Pan, Y., Zhu, R. J., Wu, J., ... & Li, G. (2024). MetaLA: Unified optimal linear approximation to softmax attention map. Advances in Neural Information Processing Systems, 37 is available at https://papers.nips.cc/paper_files/paper/2024/hash/8329a45669017898bb0cc09d27f8d2bb-Abstract-Conference.html. | en_US |
dc.title | MetaLA : unified optimal linear approximation to softmax attention map | en_US |
dc.type | Conference Paper | en_US |
dc.identifier.volume | 37 | en_US |
dcterms.abstract | Various linear complexity models, such as Linear Transformer (LinFormer), State Space Model (SSM), and Linear RNN (LinRNN), have been proposed to replace the conventional softmax attention in Transformer structures. However, the optimal design of these linear models is still an open question. In this work, we attempt to answer this question by finding the best linear approximation to softmax attention from a theoretical perspective. We start by unifying existing linear complexity models under the linear attention form and then identify three conditions for the optimal linear attention design: (1) dynamic memory ability; (2) static approximation ability; (3) least parameter approximation. We find that none of the current linear models meets all three conditions, resulting in suboptimal performance. Instead, we propose Meta Linear Attention (MetaLA) as a solution that satisfies these conditions. Our experiments on the Multi-Query Associative Recall (MQAR) task, language modeling, image classification, and the Long-Range Arena (LRA) benchmark demonstrate that MetaLA is more effective than the existing linear models. | en_US |
dcterms.accessRights | open access | en_US |
dcterms.bibliographicCitation | Advances in neural information processing systems, 2024, v. 37, https://nips.cc/virtual/2024/poster/94714 | en_US |
dcterms.isPartOf | Advances in neural information processing systems | en_US |
dcterms.issued | 2024 | - |
dc.relation.conference | Conference on Neural Information Processing Systems [NeurIPS] | en_US |
dc.description.validate | 202506 bcch | en_US |
dc.description.oa | Accepted Manuscript | en_US |
dc.identifier.FolderNumber | a3717c | - |
dc.identifier.SubFormID | 50834 | - |
dc.description.fundingSource | Others | en_US |
dc.description.fundingText | CAS Project for Young Scientists in Basic Research; National Distinguished Young Scholars; National Natural Science Foundation of China; Beijing Science and Technology Plan; Beijing Natural Science Foundation for Distinguished Young Scholars; China Postdoctoral Science Foundation; CAAI-MindSpore Open Fund | en_US |
dc.description.pubStatus | Published | en_US |
dc.description.oaCategory | Copyright retained by author | en_US |
Appears in Collections: | Conference Paper |
Files in This Item:
File | Description | Size | Format
---|---|---|---
575_MetaLA_Unified_Optimal_Lin.pdf | Pre-Published version | 1.13 MB | Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.