Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/111708
DC Field | Value | Language
dc.contributor | Department of Electrical and Electronic Engineering | en_US
dc.creator | Gao, Z | en_US
dc.creator | Mak, MW | en_US
dc.creator | Lin, W | en_US
dc.date.accessioned | 2025-03-13T02:22:09Z | -
dc.date.available | 2025-03-13T02:22:09Z | -
dc.identifier.uri | http://hdl.handle.net/10397/111708 | -
dc.description | 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022, Incheon, Korea, September 18-22, 2022 | en_US
dc.language.iso | en | en_US
dc.publisher | International Speech Communication Association | en_US
dc.rights | Copyright © 2022 ISCA | en_US
dc.rights | The following publication Gao, Z., Mak, M., Lin, W. (2022) UNet-DenseNet for Robust Far-Field Speaker Verification. Proc. Interspeech 2022, 3714-3718 is available at https://doi.org/10.21437/Interspeech.2022-10350. | en_US
dc.title | UNet-DenseNet for robust far-field speaker verification | en_US
dc.type | Conference Paper | en_US
dc.identifier.spage | 3714 | en_US
dc.identifier.epage | 3718 | en_US
dc.identifier.doi | 10.21437/Interspeech.2022-10350 | en_US
dcterms.abstract | Far-field speaker verification (SV) has always been critical but challenging. Data augmentation is commonly used to overcome the problems arising from far-field microphones, such as high background noise levels and reverberation effects. On top of data augmentation, this paper tackles these problems by introducing a UNet-based speech enhancement (SE) module as a front-end processor for the speaker embedding module. To prevent the SE module from distorting speaker information, we propose two improvements to the speech enhancement–speaker embedding pipeline. (1) A UNet-DenseNet joint training scheme in which the UNet is optimized by both the MSE and speaker classification losses. (2) A semi-joint training scheme that stops the UNet training but continues the DenseNet training when overfitting of the UNet is detected. Extensive experiments on noise-contaminated Voxceleb1 and the VOiCES Challenge 2019 demonstrate the effectiveness of the two training schemes. | en_US
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2022, p. 3714-3718 | en_US
dcterms.issued | 2022 | -
dc.identifier.scopus | 2-s2.0-85140084318 | -
dc.relation.conference | Conference of the International Speech Communication Association [INTERSPEECH] | en_US
dc.description.validate | 202503 bcch | en_US
dc.description.oa | Version of Record | en_US
dc.identifier.FolderNumber | OA_Others | -
dc.description.fundingSource | Others | en_US
dc.description.fundingText | National Natural Science Foundation of China (NSFC) | en_US
dc.description.pubStatus | Published | en_US
dc.description.oaCategory | VoR allowed | en_US
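
The two training schemes summarized in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the record does not say how the MSE and speaker classification losses are weighted or how UNet overfitting is detected, so the `mse_weight` parameter, the patience-based check on validation MSE, and all names below (`joint_loss`, `SemiJointController`, `trainable_modules`) are assumptions made for illustration only.

```python
from dataclasses import dataclass


def joint_loss(mse_loss: float, spk_loss: float, mse_weight: float = 1.0) -> float:
    """Joint objective for the UNet: enhancement MSE plus speaker classification loss.

    The weighted-sum form and mse_weight are assumptions; the abstract only
    says that both losses are used to optimize the UNet.
    """
    return mse_weight * mse_loss + spk_loss


@dataclass
class SemiJointController:
    """Hypothetical helper that decides when to stop updating the UNet.

    Overfitting is assumed to mean "validation MSE has not improved for
    `patience` consecutive epochs"; the abstract does not state the criterion.
    """
    patience: int = 2
    best_val_mse: float = float("inf")
    bad_epochs: int = 0
    unet_frozen: bool = False

    def update(self, val_mse: float) -> bool:
        """Record one epoch's validation MSE; return True while the UNet still trains."""
        if self.unet_frozen:
            return False  # once stopped, UNet training is not resumed
        if val_mse < self.best_val_mse:
            self.best_val_mse = val_mse
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.unet_frozen = True
        return not self.unet_frozen


def trainable_modules(controller: SemiJointController) -> list:
    """The DenseNet always keeps training; the UNet only until it is frozen."""
    return ["densenet"] if controller.unet_frozen else ["unet", "densenet"]
```

In a real training loop, the list returned by `trainable_modules` would determine which parameter groups receive gradient updates in the next epoch. For instance, feeding the controller a validation MSE that improves and then worsens for two epochs in a row would freeze the UNet while the DenseNet continues to train.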
Appears in Collections: Conference Paper
Files in This Item:
File | Description | Size | Format
gao22c_interspeech.pdf | | 815.29 kB | Adobe PDF
Open Access Information
Status: open access
File Version: Version of Record

Page views: 5 (as of Apr 14, 2025)
Downloads: 3 (as of Apr 14, 2025)
Scopus citations: 11 (as of Sep 19, 2025)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.