IML-Spikeformer : input-aware multilevel spiking transformer for speech processing

Song, Z; Zhang, S; Chou, Y; Wu, J; Li, H

doi:10.1109/TNNLS.2025.3615971

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/117186

DC Field	Value	Language
dc.contributor	Department of Data Science and Artificial Intelligence	en_US
dc.contributor	Department of Computing	en_US
dc.creator	Song, Z	en_US
dc.creator	Zhang, S	en_US
dc.creator	Chou, Y	en_US
dc.creator	Wu, J	en_US
dc.creator	Li, H	en_US
dc.date.accessioned	2026-02-06T02:07:13Z	-
dc.date.available	2026-02-06T02:07:13Z	-
dc.identifier.issn	2162-237X	en_US
dc.identifier.uri	http://hdl.handle.net/10397/117186	-
dc.language.iso	en	en_US
dc.publisher	Institute of Electrical and Electronics Engineers	en_US
dc.rights	© 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	en_US
dc.rights	The following publication Z. Song, S. Zhang, Y. Chou, J. Wu and H. Li, "IML-Spikeformer: Input-Aware Multilevel Spiking Transformer for Speech Processing," in IEEE Transactions on Neural Networks and Learning Systems, vol. 37, no. 3, pp. 1377-1389, March 2026 is available at https://doi.org/10.1109/TNNLS.2025.3615971.	en_US
dc.subject	Neuromorphic auditory processing	en_US
dc.subject	Speech recognition	en_US
dc.subject	Spiking neural networks (SNNs)	en_US
dc.subject	Spiking trans- former	en_US
dc.title	IML-Spikeformer : input-aware multilevel spiking transformer for speech processing	en_US
dc.type	Journal/Magazine Article	en_US
dc.description.otherinformation	Title on author's file: IML-Spikeformer: Input-aware Multi-Level Spiking Transformer for Speech Processing	en_US
dc.identifier.spage	1377	en_US
dc.identifier.epage	1389	en_US
dc.identifier.volume	37	en_US
dc.identifier.issue	3	en_US
dc.identifier.doi	10.1109/TNNLS.2025.3615971	en_US
dcterms.abstract	Spiking neural networks (SNNs), inspired by biological neural mechanisms, represent a promising neuromorphic computing paradigm that offers energy-efficient alternatives to traditional artificial neural networks (ANNs). Despite proven effectiveness, SNN architectures have struggled to achieve competitive performance on large-scale speech processing tasks. Two key challenges hinder progress: 1) the high computational overhead during training caused by multitimestep spike firing and 2) the absence of large-scale SNN architectures tailored to speech processing tasks. To overcome the issues, we introduce the input-aware multilevel spikeformer (IML-Spikeformer), a spiking transformer architecture specifically designed for large-scale speech processing. Central to our design is the input-aware multilevel spike (IMLS) mechanism, which simulates multitimestep spike firing within a single timestep using an adaptive, input-aware thresholding scheme. IML-Spikeformer further integrates a reparameterized spiking self-attention (RepSSA) module with a hierarchical decay mask (HDM), forming the HD-RepSSA module. This module enhances the precision of attention maps and enables modeling of multiscale temporal dependencies in speech signals. Experiments demonstrate that IML-Spikeformer achieves word error rates (WERs) of 6.0% on AiShell-1 and 3.4% on Librispeech-960, comparable to conventional ANN transformers while reducing theoretical inference energy consumption by 4.64X and 4.32X, , respectively. IML-Spikeformer marks an advance of scalable SNN architectures for large-scale speech processing in both task performance and energy efficiency. Our source code and model checkpoints are publicly available at github.com/Pooookeman/IML-Spikeformer	en_US
dcterms.accessRights	open access	en_US
dcterms.bibliographicCitation	IEEE transactions on neural networks and learning systems, Mar. 2026, v. 37, no. 3, p. 1377-1389	en_US
dcterms.isPartOf	IEEE transactions on neural networks and learning systems	en_US
dcterms.issued	2026-03	-
dc.identifier.eissn	2162-2388	en_US
dc.description.validate	202602 bcch	en_US
dc.description.oa	Accepted Manuscript	en_US
dc.identifier.FolderNumber	a4305	-
dc.identifier.SubFormID	52560	-
dc.description.fundingSource	RGC	en_US
dc.description.fundingSource	Others	en_US
dc.description.fundingText	This work was supported in part by the National Natural Science Foundation of China under Grant 62271432 and Grant 62306259; in part by Shenzhen Science and Technology Program (Shenzhen Key Laboratory) under Grant ZDSYS20230626091302006; in part by the Program for Guangdong Introducing Innovative and Entrepreneurial Teams, under Grant 2023ZT10X044; and in part by the Research Grants Council of the 5 Hong Kong, SAR under Grant C5052-23G, Grant PolyU15217424, and Grant PolyU25216423.	en_US
dc.description.pubStatus	Published	en_US
dc.description.oaCategory	Green (AAM)	en_US
Appears in Collections:	Journal/Magazine Article

Files in This Item:

File	Description	Size	Format
Song_IML_Spikeformer_Input.pdf	Pre-Published version	882.91 kB	Adobe PDF	View/Open

Open Access Information

Status	open access
File Version	Final Accepted Manuscript

Access

View full-text via PolyU eLinks

Show simple item record

Google Scholar^TM

Check

Files in This Item:

Open Access Information

Access

Google ScholarTM

Altmetric

Google Scholar^TM