Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/113412
Title: ConFusionformer: locality-enhanced Conformer through multi-resolution attention fusion for speaker verification
Authors: Tu, Y; Mak, MW; Lee, KA; Lin, W
Issue Date: Sep-2025
Source: Neurocomputing, 1 Sept 2025, v. 644, 130429
Abstract: Conformers are capable of capturing both global and local dependencies in a sequence. Notably, the modeling of local information is critical to learning speaker characteristics. However, applying Conformers to speaker verification (SV) has seen limited success due to their inferior locality modeling capability and low computational efficiency. In this paper, we propose an improved Conformer, ConFusionformer, to address these two challenges. To increase model efficiency, the conventional Conformer block is modified by placing one feed-forward network between a self-attention module and a convolution module. The modified Conformer block has fewer model parameters, thus reducing the computation cost. The modification also enables a deeper network, boosting the SV performance. Moreover, multi-resolution attention fusion is introduced into the self-attention mechanism to improve locality modeling. Specifically, a low-resolution attention score map, computed from downsampled queries and keys, is restored to the original resolution and fused with the original attention score map to exploit the local information within the restored local regions. The proposed ConFusionformer is shown to outperform the Conformer for SV on VoxCeleb, CNCeleb, SRE21, and SRE24, demonstrating the superiority of the ConFusionformer in speaker modeling.
Keywords: Conformer; Multi-resolution attention fusion; Speaker embedding; Speaker verification; Transformer
Publisher: Elsevier BV
Journal: Neurocomputing
ISSN: 0925-2312
EISSN: 1872-8286
DOI: 10.1016/j.neucom.2025.130429
Appears in Collections: Journal/Magazine Article
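Below is a minimal PyTorch sketch of the multi-resolution attention fusion and the modified block layout described in the abstract above. It is reconstructed from the abstract alone, not the authors' implementation: the average-pooling downsampling of queries and keys, the nearest-neighbour restoration of the low-resolution score map, the fixed fusion weight, and the simplified feed-forward and convolution modules (as well as the class names MultiResolutionAttentionFusion and ConFusionformerBlock) are all assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiResolutionAttentionFusion(nn.Module):
    """Self-attention whose score map is fused with a restored low-resolution score map.

    Assumed details (not specified in the abstract): average pooling over time for
    downsampling, nearest-neighbour upsampling for restoration, fixed fusion weight.
    """

    def __init__(self, d_model: int, n_heads: int, downsample: int = 2, fuse_weight: float = 0.5):
        super().__init__()
        assert d_model % n_heads == 0
        self.h, self.d_k = n_heads, d_model // n_heads
        self.downsample, self.fuse_weight = downsample, fuse_weight
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model)
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, self.h, self.d_k).transpose(1, 2)  # (B, h, T, d_k)
        k = self.k_proj(x).view(B, T, self.h, self.d_k).transpose(1, 2)
        v = self.v_proj(x).view(B, T, self.h, self.d_k).transpose(1, 2)

        # Original (full-resolution) attention score map: (B, h, T, T).
        scores = torch.matmul(q, k.transpose(-2, -1)) / self.d_k ** 0.5

        # Low-resolution score map from downsampled queries and keys.
        q_lr = F.avg_pool1d(q.reshape(B * self.h, T, self.d_k).transpose(1, 2),
                            self.downsample, self.downsample).transpose(1, 2)
        k_lr = F.avg_pool1d(k.reshape(B * self.h, T, self.d_k).transpose(1, 2),
                            self.downsample, self.downsample).transpose(1, 2)
        scores_lr = torch.matmul(q_lr, k_lr.transpose(-2, -1)) / self.d_k ** 0.5

        # Restore the low-resolution map to (T, T); each restored entry is shared by a
        # local block of frames, which is what injects the locality bias.
        restored = F.interpolate(scores_lr.unsqueeze(1), size=(T, T), mode="nearest")
        restored = restored.squeeze(1).view(B, self.h, T, T)

        # Fuse the restored map with the original score map, then attend as usual.
        fused = (1.0 - self.fuse_weight) * scores + self.fuse_weight * restored
        out = torch.matmul(torch.softmax(fused, dim=-1), v)
        return self.out_proj(out.transpose(1, 2).reshape(B, T, self.h * self.d_k))


class Transpose12(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x.transpose(1, 2)


class ConFusionformerBlock(nn.Module):
    """Assumed block layout per the abstract: a single feed-forward network placed
    between the self-attention and convolution modules (module internals simplified)."""

    def __init__(self, d_model: int, n_heads: int, ffn_dim: int = 1024, kernel_size: int = 15):
        super().__init__()
        self.attn = MultiResolutionAttentionFusion(d_model, n_heads)
        self.ffn = nn.Sequential(nn.LayerNorm(d_model), nn.Linear(d_model, ffn_dim),
                                 nn.SiLU(), nn.Linear(ffn_dim, d_model))
        self.conv = nn.Sequential(nn.LayerNorm(d_model), Transpose12(),
                                  nn.Conv1d(d_model, d_model, kernel_size,
                                            padding=kernel_size // 2, groups=d_model),
                                  Transpose12())
        self.norm_attn = nn.LayerNorm(d_model)
        self.norm_out = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.attn(self.norm_attn(x))  # locality-enhanced self-attention
        x = x + self.ffn(x)                   # single FFN between attention and convolution
        x = x + self.conv(x)                  # convolution module (reduced to a depthwise conv)
        return self.norm_out(x)

Because each restored entry of the low-resolution map is shared by a neighbourhood of frames, the fusion biases attention toward local regions, which is the locality enhancement the abstract refers to. Using a single feed-forward network in place of the two macaron-style FFNs of a standard Conformer block is also what reduces the parameter count per block, allowing a deeper stack at a similar cost.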