Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/114106
DC Field | Value | Language
dc.contributor | Department of Electrical and Electronic Engineering | en_US
dc.creator | Li, Z | en_US
dc.creator | Mak, MW | en_US
dc.creator | Pilanci, M | en_US
dc.creator | Meng, H | en_US
dc.date.accessioned | 2025-07-11T09:11:55Z | -
dc.date.available | 2025-07-11T09:11:55Z | -
dc.identifier.issn | 1558-7916 | en_US
dc.identifier.uri | http://hdl.handle.net/10397/114106 | -
dc.language.iso | en | en_US
dc.publisher | Institute of Electrical and Electronics Engineers | en_US
dc.rights | © 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. | en_US
dc.rights | The following publication Z. Li, M.-W. Mak, M. Pilanci and H. Meng, "Mutual Information-Enhanced Contrastive Learning With Margin for Maximal Speaker Separability," in IEEE Transactions on Audio, Speech and Language Processing, vol. 33, pp. 2961-2972, 2025 is available at https://doi.org/10.1109/TASLPRO.2025.3583485. | en_US
dc.subject | Additive angular margin | en_US
dc.subject | Contrastive learning | en_US
dc.subject | Mutual information | en_US
dc.subject | Speaker verification | en_US
dc.title | Mutual information-enhanced contrastive learning with margin for maximal speaker separability | en_US
dc.type | Journal/Magazine Article | en_US
dc.identifier.spage | 2961 | en_US
dc.identifier.epage | 2972 | en_US
dc.identifier.volume | 33 | en_US
dc.identifier.doi | 10.1109/TASLPRO.2025.3583485 | en_US
dcterms.abstract | Contrastive learning across various augmentations of the same utterance can enhance speaker representations' ability to distinguish new speakers. This paper introduces a supervised contrastive learning objective that optimizes a speaker embedding space using label information from the training data. Besides augmenting different segments of an utterance to form a positive pair, our approach generates multiple positive pairs by augmenting various utterances from the same speaker. However, employing contrastive learning for speaker verification (SV) presents two challenges: (1) the softmax loss is ineffective at reducing intra-class variation, and (2) previous research has shown that contrastive learning can share information across the augmented views of an object but may discard task-relevant nonshared information, suggesting that it is essential to keep nonshared speaker information across the augmented views when constructing a speaker representation space. To overcome the first challenge, we incorporate an additive angular margin into the contrastive loss. For the second challenge, we maximize the mutual information (MI) between the squeezed low-level features and the speaker representations to extract the nonshared information. Evaluations on VoxCeleb, CN-Celeb, and CU-MARVEL validate that our new learning objective enables ECAPA-TDNN to learn an embedding space with robust speaker discrimination. | en_US
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | IEEE transactions on audio, speech and language processing, 2025, v. 33, p. 2961-2972 | en_US
dcterms.isPartOf | IEEE transactions on audio, speech and language processing | en_US
dcterms.issued | 2025 | -
dc.identifier.eissn | 1558-7924 | en_US
dc.description.validate | 202507 bcch | en_US
dc.description.oa | Accepted Manuscript | en_US
dc.identifier.FolderNumber | a3850a | -
dc.identifier.SubFormID | 51337 | -
dc.description.fundingSource | RGC | en_US
dc.description.pubStatus | Published | en_US
dc.description.oaCategory | Green (AAM) | en_US
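The abstract above names two technical components: a supervised contrastive loss with an additive angular margin, and MI maximization between squeezed low-level features and speaker representations. Below is a minimal PyTorch sketch of the first component only. It is not the authors' implementation; the function name supcon_aam_loss, the margin and temperature values, and the batch layout are illustrative assumptions.

import torch
import torch.nn.functional as F

def supcon_aam_loss(embeddings, labels, margin=0.2, temperature=0.07):
    # embeddings: (N, D) speaker embeddings from augmented views.
    # labels: (N,) speaker identities; same-speaker pairs are positives.
    z = F.normalize(embeddings, dim=1)                    # work in cosine space
    cos = z @ z.t()                                       # pairwise cosine similarities
    theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))   # pairwise angles

    n = labels.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask

    # Additive angular margin: widen the angle of positive pairs before
    # scaling by the temperature, as in AAM-softmax, so that positives
    # must be separated by more than the margin to score well.
    logits = torch.where(pos_mask, torch.cos(theta + margin), cos) / temperature
    logits = logits.masked_fill(self_mask, float('-inf'))  # exclude self-pairs

    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_per_anchor = pos_mask.sum(1).clamp(min=1)
    loss = -log_prob.masked_fill(~pos_mask, 0.0).sum(1) / pos_per_anchor
    return loss[pos_mask.any(1)].mean()   # anchors with at least one positive

# Toy usage with 192-dimensional embeddings (a common ECAPA-TDNN size):
emb = torch.randn(8, 192)
spk = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
print(supcon_aam_loss(emb, spk))

The paper's second component would add an MI term (e.g., an InfoNCE-style lower bound between low-level features and the speaker embedding) on top of this loss; that term is not sketched here.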
Appears in Collections:Journal/Magazine Article
Files in This Item:
File | Description | Size | Format
Li_Mutual_Information_Enhanced.pdf | Pre-Published version | 2.05 MB | Adobe PDF
Open Access Information
Status: open access
File Version: Final Accepted Manuscript
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.