Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/113412
DC Field | Value | Language
dc.contributor | Department of Electrical and Electronic Engineering | -
dc.creator | Tu, Y | -
dc.creator | Mak, MW | -
dc.creator | Lee, KA | -
dc.creator | Lin, W | -
dc.date.accessioned | 2025-06-06T00:42:13Z | -
dc.date.available | 2025-06-06T00:42:13Z | -
dc.identifier.issn | 0925-2312 | -
dc.identifier.uri | http://hdl.handle.net/10397/113412 | -
dc.language.iso | en | en_US
dc.publisher | Elsevier BV | en_US
dc.subject | Conformer | en_US
dc.subject | Multi-resolution attention fusion | en_US
dc.subject | Speaker embedding | en_US
dc.subject | Speaker verification | en_US
dc.subject | Transformer | en_US
dc.title | ConFusionformer: locality-enhanced conformer through multi-resolution attention fusion for speaker verification | en_US
dc.type | Journal/Magazine Article | en_US
dc.identifier.volume | 644 | -
dc.identifier.doi | 10.1016/j.neucom.2025.130429 | -
dcterms.abstract | Conformers are capable of capturing both global and local dependencies in a sequence. Notably, the modeling of local information is critical to learning speaker characteristics. However, applying Conformers to speaker verification (SV) has not witnessed much success due to their inferior locality modeling capability and low computational efficiency. In this paper, we propose an improved Conformer, ConFusionformer, to address these two challenges. To increase model efficiency, the conventional Conformer block is modified by placing one feed-forward network between a self-attention module and a convolution module. The modified Conformer block has fewer model parameters, thus reducing the computation cost. The modification also enables a deeper network, boosting the SV performance. Moreover, multi-resolution attention fusion is introduced into the self-attention mechanism to improve locality modeling. Specifically, a low-resolution attention score map, produced from downsampled queries and keys, is restored to full resolution and fused with the original attention score map to exploit the local information within the restored local regions. The proposed ConFusionformer is shown to outperform the Conformer for SV on VoxCeleb, CNCeleb, SRE21, and SRE24, demonstrating the superiority of the ConFusionformer in speaker modeling. | -
dcterms.accessRights | embargoed access | en_US
dcterms.bibliographicCitation | Neurocomputing, 1 Sept 2025, v. 644, 130429 | -
dcterms.isPartOf | Neurocomputing | -
dcterms.issued | 2025-09 | -
dc.identifier.scopus | 2-s2.0-105005393269 | -
dc.identifier.eissn | 1872-8286 | -
dc.identifier.artn | 130429 | -
dc.description.validate | 202506 bcch | -
dc.identifier.FolderNumber | a3641 | en_US
dc.identifier.SubFormID | 50551 | en_US
dc.description.fundingSource | RGC | en_US
dc.description.pubStatus | Published | en_US
dc.date.embargo | 2027-09-01 | en_US
dc.description.oaCategory | Green (AAM) | en_US
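The multi-resolution attention fusion described in the abstract can be sketched as follows. This is not the authors' implementation: the pooling factor `pool`, the fusion weight `alpha`, average-pooling for downsampling, and nearest-neighbour restoration are all assumptions made for illustration; the paper's actual restoration and fusion scheme may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fused_attention_scores(q, k, pool=2, alpha=0.5):
    """Fuse a full-resolution attention score map with a low-resolution map
    computed from downsampled queries and keys, then restored to full size.

    q, k : (t, d) query and key matrices; t must be divisible by `pool`.
    pool : downsampling factor along time (an assumed hyperparameter).
    alpha: fusion weight between the two score maps (also assumed).
    """
    t, d = q.shape
    scale = 1.0 / np.sqrt(d)
    full = q @ k.T * scale                       # (t, t) full-resolution scores

    # Downsample queries and keys along time by average pooling.
    q_lo = q.reshape(t // pool, pool, d).mean(axis=1)
    k_lo = k.reshape(t // pool, pool, d).mean(axis=1)
    low = q_lo @ k_lo.T * scale                  # (t/pool, t/pool) low-res scores

    # Restore the low-resolution map to (t, t) by repeating each entry,
    # so every pool x pool block shares one coarse score (nearest-neighbour).
    restored = np.repeat(np.repeat(low, pool, axis=0), pool, axis=1)

    # Fuse the two score maps before the softmax.
    return softmax(alpha * full + (1.0 - alpha) * restored)
```

Because each restored entry covers a `pool` x `pool` block of positions, frames inside the same block receive a shared coarse score, which biases attention toward nearby regions; this is one plausible reading of how the fused map "exploits the local information within the restored local regions".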
Appears in Collections:Journal/Magazine Article
Open Access Information
Status embargoed access
Embargo End Date 2027-09-01
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.