Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/113671
DC Field    Value    Language
dc.contributor    Department of Data Science and Artificial Intelligence    en_US
dc.creator    Chou, Y    en_US
dc.creator    Yao, M    en_US
dc.creator    Wang, K    en_US
dc.creator    Pan, Y    en_US
dc.creator    Zhu, RJ    en_US
dc.creator    Wu, J    en_US
dc.creator    Zhong, Y    en_US
dc.creator    Qiao, Y    en_US
dc.creator    Xu, B    en_US
dc.creator    Li, G    en_US
dc.date.accessioned    2025-06-17T07:40:46Z    -
dc.date.available    2025-06-17T07:40:46Z    -
dc.identifier.isbn    979-8-3313-1438-5    en_US
dc.identifier.issn    1049-5258    en_US
dc.identifier.uri    http://hdl.handle.net/10397/113671    -
dc.description    38th Conference on Neural Information Processing Systems (NeurIPS 2024), 10-15 December 2024, Vancouver, Canada    en_US
dc.language.iso    en    en_US
dc.publisher    Neural Information Processing Systems Foundation, Inc. (NeurIPS)    en_US
dc.rights    Posted with permission of the author.    en_US
dc.rights    The following publication Chou, Y., Yao, M., Wang, K., Pan, Y., Zhu, R. J., Wu, J., ... & Li, G. (2024). MetaLA: Unified optimal linear approximation to softmax attention map. Advances in Neural Information Processing Systems, 37 is available at https://papers.nips.cc/paper_files/paper/2024/hash/8329a45669017898bb0cc09d27f8d2bb-Abstract-Conference.html.    en_US
dc.title    MetaLA: unified optimal linear approximation to softmax attention map    en_US
dc.type    Conference Paper    en_US
dc.identifier.volume    37    en_US
dcterms.abstract    Various linear complexity models, such as Linear Transformer (LinFormer), State Space Model (SSM), and Linear RNN (LinRNN), have been proposed to replace the conventional softmax attention in Transformer structures. However, the optimal design of these linear models is still an open question. In this work, we attempt to answer this question by finding the best linear approximation to softmax attention from a theoretical perspective. We start by unifying existing linear complexity models under a common linear attention form and then identify three conditions for the optimal linear attention design: (1) Dynamic memory ability; (2) Static approximation ability; (3) Least parameter approximation. We find that none of the current linear models meet all three conditions, resulting in suboptimal performance. Instead, we propose Meta Linear Attention (MetaLA) as a solution that satisfies these conditions. Our experiments on the Multi-Query Associative Recall (MQAR) task, language modeling, image classification, and the Long-Range Arena (LRA) benchmark demonstrate that MetaLA is more effective than existing linear models.    en_US
dcterms.accessRights    open access    en_US
dcterms.bibliographicCitation    Advances in neural information processing systems, 2024, v. 37, https://nips.cc/virtual/2024/poster/94714    en_US
dcterms.isPartOf    Advances in neural information processing systems    en_US
dcterms.issued    2024    -
dc.relation.conference    Conference on Neural Information Processing Systems [NeurIPS]    en_US
dc.description.validate    202506 bcch    en_US
dc.description.oa    Accepted Manuscript    en_US
dc.identifier.FolderNumber    a3717c    -
dc.identifier.SubFormID    50834    -
dc.description.fundingSource    Others    en_US
dc.description.fundingText    CAS Project for Young Scientists in Basic Research; National Distinguished Young Scholars; National Natural Science Foundation of China; Beijing Science and Technology Plan; Beijing Natural Science Foundation for Distinguished Young Scholars; China Postdoctoral Science Foundation; CAAI-MindSpore Open Fund    en_US
dc.description.pubStatus    Published    en_US
dc.description.oaCategory    Copyright retained by author    en_US
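
The abstract above refers to a common "linear attention form" that unifies LinFormer-, SSM-, and LinRNN-style models. The sketch below is only an illustrative contrast between quadratic softmax attention and a generic gated linear-attention recurrence; it is not the paper's MetaLA formulation, and the shapes, the fixed scalar decay gate, and all function names are assumptions made for illustration.

```python
# Illustrative sketch only: softmax attention vs. a generic linear-attention
# recurrence. NOT the MetaLA method from the paper; the fixed scalar `decay`
# gate and the shapes are simplifying assumptions chosen to show the O(n)
# recurrent form shared by LinFormer/SSM/LinRNN-style models.
import numpy as np


def softmax_attention(Q, K, V):
    """Standard softmax attention: O(n^2) in sequence length n."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])                 # (n, n) attention map
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return weights @ V                                        # (n, d_v)


def linear_attention(Q, K, V, decay=0.9):
    """Generic gated linear attention: O(n) via a running state S.

    S_t = decay * S_{t-1} + k_t^T v_t   (d_k x d_v state update)
    o_t = q_t S_t                        (read-out)
    """
    n, d_k = Q.shape
    d_v = V.shape[-1]
    S = np.zeros((d_k, d_v))
    out = np.empty((n, d_v))
    for t in range(n):
        S = decay * S + np.outer(K[t], V[t])  # compress history into fixed-size state
        out[t] = Q[t] @ S                     # query the state instead of all past tokens
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 8, 16
    Q, K, V = rng.normal(size=(3, n, d))
    print(softmax_attention(Q, K, V).shape)  # (8, 16)
    print(linear_attention(Q, K, V).shape)   # (8, 16)
```

Because the recurrence keeps only a fixed-size state, the per-step cost is independent of sequence length, which is the property the abstract contrasts with the quadratic softmax attention map.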
Appears in Collections: Conference Paper
Files in This Item:
File    Description    Size    Format
575_MetaLA_Unified_Optimal_Lin.pdf    Pre-Published version    1.13 MB    Adobe PDF
Open Access Information
Status: open access
File Version: Final Accepted Manuscript
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.