Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/111708
DC Field | Value | Language
dc.contributor | Department of Electrical and Electronic Engineering | en_US
dc.creator | Gao, Z | en_US
dc.creator | Mak, MW | en_US
dc.creator | Lin, W | en_US
dc.date.accessioned | 2025-03-13T02:22:09Z | -
dc.date.available | 2025-03-13T02:22:09Z | -
dc.identifier.uri | http://hdl.handle.net/10397/111708 | -
dc.description | 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022, Incheon, Korea, September 18-22, 2022 | en_US
dc.language.iso | en | en_US
dc.publisher | International Speech Communication Association | en_US
dc.rights | Copyright © 2022 ISCA | en_US
dc.rights | The following publication Gao, Z., Mak, M., Lin, W. (2022) UNet-DenseNet for Robust Far-Field Speaker Verification. Proc. Interspeech 2022, 3714-3718 is available at https://doi.org/10.21437/Interspeech.2022-10350. | en_US
dc.title | UNet-DenseNet for robust far-field speaker verification | en_US
dc.type | Conference Paper | en_US
dc.identifier.spage | 3714 | en_US
dc.identifier.epage | 3718 | en_US
dc.identifier.doi | 10.21437/Interspeech.2022-10350 | en_US
dcterms.abstract | Far-field speaker verification (SV) has always been critical but challenging. Data augmentation is commonly used to overcome the problems arising from far-field microphones, such as high background noise levels and reverberation effects. On top of data augmentation, this paper tackles these problems by introducing a UNet-based speech enhancement (SE) module as a front-end processor for the speaker embedding module. To prevent the SE module from distorting speaker information, we propose two improvements to the speech enhancement–speaker embedding pipeline. (1) A UNet-DenseNet joint training scheme in which the UNet is optimized by both the MSE and speaker classification losses. (2) A semi-joint training scheme that stops the UNet training but continues the DenseNet training when overfitting of the UNet is detected. Extensive experiments on noise-contaminated Voxceleb1 and the VOiCES Challenge 2019 demonstrate the effectiveness of the two training schemes. | en_US
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2022, p. 3714-3718 | en_US
dcterms.issued | 2022 | -
dc.identifier.scopus | 2-s2.0-85140084318 | -
dc.relation.conference | Conference of the International Speech Communication Association [INTERSPEECH] | en_US
dc.description.validate | 202503 bcch | en_US
dc.description.oa | Version of Record | en_US
dc.identifier.FolderNumber | OA_Others | -
dc.description.fundingSource | Others | en_US
dc.description.fundingText | National Natural Science Foundation of China (NSFC) | en_US
dc.description.pubStatus | Published | en_US
dc.description.oaCategory | VoR allowed | en_US
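The two training schemes named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the loss weight `alpha`, the patience-based overfitting test, and all function and class names below are hypothetical, since the record does not say how the losses are combined or how overfitting of the UNet is detected.

```python
from dataclasses import dataclass


def joint_loss(mse_loss: float, spk_ce_loss: float, alpha: float = 1.0) -> float:
    """Joint-training objective for the UNet: enhancement MSE plus the
    speaker classification loss. `alpha` is an assumed weighting factor."""
    return mse_loss + alpha * spk_ce_loss


@dataclass
class SemiJointController:
    """Decides when to stop updating the UNet while the DenseNet keeps training.

    Assumption: the UNet counts as overfitting once its validation MSE has
    not improved for `patience` consecutive epochs.
    """
    patience: int = 2
    best_val_mse: float = float("inf")
    bad_epochs: int = 0
    unet_trainable: bool = True

    def update(self, val_mse: float) -> bool:
        """Record one epoch's validation MSE; return True if the UNet still trains."""
        if not self.unet_trainable:
            return False  # once frozen, the UNet stays frozen
        if val_mse < self.best_val_mse:
            self.best_val_mse = val_mse
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.unet_trainable = False
        return self.unet_trainable


def training_schedule(val_mse_per_epoch, patience=2):
    """For each epoch's validation MSE, list the modules updated afterwards."""
    ctrl = SemiJointController(patience=patience)
    plan = []
    for val_mse in val_mse_per_epoch:
        if ctrl.update(val_mse):
            plan.append(("unet", "densenet"))
        else:
            plan.append(("densenet",))
    return plan
```

With `training_schedule([0.5, 0.4, 0.45, 0.47, 0.3])` and the default patience of 2, the UNet is frozen after the fourth epoch and only the DenseNet is updated from then on. In a real framework the freeze would typically be done by excluding the UNet parameters from the optimizer step.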
Appears in Collections: Conference Paper
Files in This Item:
File | Description | Size | Format
gao22c_interspeech.pdf | | 815.29 kB | Adobe PDF
Open Access Information
Status: open access
File Version: Version of Record
