Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/114602
DC Field | Value | Language |
---|---|---|
dc.contributor | Department of Electrical and Electronic Engineering | - |
dc.creator | Jin, Z | - |
dc.creator | Tu, Y | - |
dc.creator | Mak, MW | - |
dc.date.accessioned | 2025-08-18T03:02:07Z | - |
dc.date.available | 2025-08-18T03:02:07Z | - |
dc.identifier.uri | http://hdl.handle.net/10397/114602 | - |
dc.description | Interspeech 2024, 1-5 September 2024, Kos, Greece | en_US |
dc.language.iso | en | en_US |
dc.publisher | International Speech Communication Association | en_US |
dc.rights | The following publication Jin, Z., Tu, Y., Mak, M.-W. (2024) W-GVKT: Within-Global-View Knowledge Transfer for Speaker Verification. Proc. Interspeech 2024, 3779-3783 is available at https://doi.org/10.21437/Interspeech.2024-354. | en_US |
dc.subject | DINO | en_US |
dc.subject | Knowledge transfer | en_US |
dc.subject | Self-supervised learning | en_US |
dc.subject | Speaker verification | en_US |
dc.title | W-GVKT : within-global-view knowledge transfer for speaker verification | en_US |
dc.type | Conference Paper | en_US |
dc.identifier.spage | 3779 | - |
dc.identifier.epage | 3783 | - |
dc.identifier.doi | 10.21437/Interspeech.2024-354 | - |
dcterms.abstract | Contrastive self-supervised learning has played an important role in speaker verification (SV). However, such approaches suffer from false-negative issues. To address this problem, we enhance the non-contrastive DINO framework by enabling knowledge transfer from the teacher network to the student network through diversified versions of global views and call the method Within-Global-View Knowledge Transfer (W-GVKT) DINO. We discovered that given the global view of the entire utterance, creating discrepancies in the student’s output through applying spectral augmentation and feature diversification to the global view can facilitate the transfer of knowledge from the teacher to the student. With negligible computational resource increases, W-GVKT achieves an impressive EER of 4.11% without utilizing speaker labels on VoxCeleb1. When combined with the RDINO framework, W-GVKT achieved an EER of 2.89%. | - |
dcterms.accessRights | open access | en_US |
dcterms.bibliographicCitation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2024, p. 3779-3783 | - |
dcterms.issued | 2024 | - |
dc.identifier.scopus | 2-s2.0-85214800257 | - |
dc.description.validate | 202508 bcch | - |
dc.description.oa | Version of Record | en_US |
dc.identifier.FolderNumber | OA_Others | en_US |
dc.description.fundingSource | RGC | en_US |
dc.description.pubStatus | Published | en_US |
dc.description.oaCategory | VoR allowed | en_US |
Appears in Collections: | Conference Paper |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
jin24b_interspeech.pdf | | 612.85 kB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.