Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/114602
DC Field | Value | Language
dc.contributor | Department of Electrical and Electronic Engineering | -
dc.creator | Jin, Z | -
dc.creator | Tu, Y | -
dc.creator | Mak, MW | -
dc.date.accessioned | 2025-08-18T03:02:07Z | -
dc.date.available | 2025-08-18T03:02:07Z | -
dc.identifier.uri | http://hdl.handle.net/10397/114602 | -
dc.description | Interspeech 2024, 1-5 September 2024, Kos, Greece | en_US
dc.language.iso | en | en_US
dc.publisher | International Speech Communication Association | en_US
dc.rights | The following publication Jin, Z., Tu, Y., Mak, M.-W. (2024) W-GVKT: Within-Global-View Knowledge Transfer for Speaker Verification. Proc. Interspeech 2024, 3779-3783 is available at https://doi.org/10.21437/Interspeech.2024-354. | en_US
dc.subject | DINO | en_US
dc.subject | Knowledge transfer | en_US
dc.subject | Self-supervised learning | en_US
dc.subject | Speaker verification | en_US
dc.title | W-GVKT : within-global-view knowledge transfer for speaker verification | en_US
dc.type | Conference Paper | en_US
dc.identifier.spage | 3779 | -
dc.identifier.epage | 3783 | -
dc.identifier.doi | 10.21437/Interspeech.2024-354 | -
dcterms.abstract | Contrastive self-supervised learning has played an important role in speaker verification (SV). However, such approaches suffer from false-negative issues. To address this problem, we enhance the non-contrastive DINO framework by enabling knowledge transfer from the teacher network to the student network through diversified versions of global views, and we call the method Within-Global-View Knowledge Transfer (W-GVKT) DINO. We found that, given a global view of the entire utterance, creating discrepancies in the student's output by applying spectral augmentation and feature diversification to the global view facilitates the transfer of knowledge from the teacher to the student. With a negligible increase in computational resources, W-GVKT achieves an impressive EER of 4.11% on VoxCeleb1 without using speaker labels. When combined with the RDINO framework, W-GVKT achieves an EER of 2.89%. | -
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2024, p. 3779-3783 | -
dcterms.issued | 2024 | -
dc.identifier.scopus | 2-s2.0-85214800257 | -
dc.description.validate | 202508 bcch | -
dc.description.oa | Version of Record | en_US
dc.identifier.FolderNumber | OA_Others | en_US
dc.description.fundingSource | RGC | en_US
dc.description.pubStatus | Published | en_US
dc.description.oaCategory | VoR allowed | en_US
Appears in Collections: Conference Paper
Files in This Item:
File | Description | Size | Format
jin24b_interspeech.pdf | | 612.85 kB | Adobe PDF
Open Access Information
Status: open access
File Version: Version of Record

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.