Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/111708
DC Field | Value | Language |
---|---|---|
dc.contributor | Department of Electrical and Electronic Engineering | en_US |
dc.creator | Gao, Z | en_US |
dc.creator | Mak, MW | en_US |
dc.creator | Lin, W | en_US |
dc.date.accessioned | 2025-03-13T02:22:09Z | - |
dc.date.available | 2025-03-13T02:22:09Z | - |
dc.identifier.uri | http://hdl.handle.net/10397/111708 | - |
dc.description | 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022, Incheon, Korea, September 18-22, 2022 | en_US |
dc.language.iso | en | en_US |
dc.publisher | International Speech Communication Association | en_US |
dc.rights | Copyright © 2022 ISCA | en_US |
dc.rights | The following publication Gao, Z., Mak, M., Lin, W. (2022) UNet-DenseNet for Robust Far-Field Speaker Verification. Proc. Interspeech 2022, 3714-3718 is available at https://doi.org/10.21437/Interspeech.2022-10350. | en_US |
dc.title | UNet-DenseNet for robust far-field speaker verification | en_US |
dc.type | Conference Paper | en_US |
dc.identifier.spage | 3714 | en_US |
dc.identifier.epage | 3718 | en_US |
dc.identifier.doi | 10.21437/Interspeech.2022-10350 | en_US |
dcterms.abstract | Far-field speaker verification (SV) has always been critical but challenging. Data augmentation is commonly used to overcome the problems arising from far-field microphones, such as high background noise levels and reverberation effects. On top of data augmentation, this paper tackles these problems by introducing a UNet-based speech enhancement (SE) module as a front-end processor for the speaker embedding module. To prevent the SE module from distorting speaker information, we propose two improvements to the speech enhancement–speaker embedding pipeline. (1) A UNet-DenseNet joint training scheme in which the UNet is optimized by both the MSE and speaker classification losses. (2) A semi-joint training scheme that stops the UNet training but continues the DenseNet training when overfitting of the UNet is detected. Extensive experiments on noise-contaminated VoxCeleb1 and the VOiCES Challenge 2019 demonstrate the effectiveness of the two training schemes. | en_US |
dcterms.accessRights | open access | en_US |
dcterms.bibliographicCitation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2022, p. 3714-3718 | en_US |
dcterms.issued | 2022 | - |
dc.identifier.scopus | 2-s2.0-85140084318 | - |
dc.relation.conference | Conference of the International Speech Communication Association [INTERSPEECH] | en_US |
dc.description.validate | 202503 bcch | en_US |
dc.description.oa | Version of Record | en_US |
dc.identifier.FolderNumber | OA_Others | - |
dc.description.fundingSource | Others | en_US |
dc.description.fundingText | National Natural Science Foundation of China (NSFC) | en_US |
dc.description.pubStatus | Published | en_US |
dc.description.oaCategory | VoR allowed | en_US |
Appears in Collections: | Conference Paper |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
gao22c_interspeech.pdf | | 815.29 kB | Adobe PDF |
Page views: 5 (as of Apr 14, 2025)
Downloads: 3 (as of Apr 14, 2025)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.