Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/107121
DC Field | Value | Language
dc.contributor | Department of Electrical and Electronic Engineering | -
dc.creator | Lin, W | -
dc.creator | Mak, MW | -
dc.creator | Li, N | -
dc.creator | Su, D | -
dc.creator | Yu, D | -
dc.date.accessioned | 2024-06-13T01:04:02Z | -
dc.date.available | 2024-06-13T01:04:02Z | -
dc.identifier.issn | 2329-9290 | -
dc.identifier.uri | http://hdl.handle.net/10397/107121 | -
dc.language.iso | en | en_US
dc.publisher | Institute of Electrical and Electronics Engineers | en_US
dc.rights | © 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. | en_US
dc.rights | The following publication W. Lin, M.-W. Mak, N. Li, D. Su and D. Yu, "A Framework for Adapting DNN Speaker Embedding Across Languages," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 2810-2822, 2020, is available at https://doi.org/10.1109/TASLP.2020.3030499. | en_US
dc.subject | Data augmentation | en_US
dc.subject | Domain adaptation | en_US
dc.subject | Maximum mean discrepancy | en_US
dc.subject | Speaker verification (SV) | en_US
dc.subject | Transfer learning | en_US
dc.title | A framework for adapting DNN speaker embedding across languages | en_US
dc.type | Journal/Magazine Article | en_US
dc.identifier.spage | 2810 | -
dc.identifier.epage | 2822 | -
dc.identifier.volume | 28 | -
dc.identifier.doi | 10.1109/TASLP.2020.3030499 | -
dcterms.abstract | Language mismatch remains a major hindrance to the widespread deployment of speaker verification (SV) systems. Current language adaptation methods in SV mainly rely on linear projection in the embedding space; i.e., adaptation is carried out after the speaker embeddings have been created, which underutilizes the powerful representations of deep neural networks. This article proposes a maximum mean discrepancy (MMD) based framework for adapting DNN speaker embeddings across languages, featuring a multi-level domain loss, separate batch normalization, and consistency regularization. We refer to this framework as MSC. We show that (1) minimizing domain discrepancy at both the frame and utterance levels performs significantly better than at the utterance level alone; (2) separating source-domain data from target-domain data in batch normalization improves adaptation performance; and (3) data augmentation can be utilized in the unlabeled target domain through consistency regularization. Combining these findings, we achieve EERs of 8.69% and 7.95% on NIST SRE 2016 and 2018, respectively, which are significantly better than those of previously proposed DNN adaptation methods. Our framework also works well with backend adaptation: combining the two achieves an 11.8% improvement over backend adaptation alone on SRE18. Applying our framework to a 121-layer DenseNet, we achieve EERs of 7.81% and 7.02% on NIST SRE 2016 and 2018, respectively. | -
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | IEEE/ACM transactions on audio, speech, and language processing, 2020, v. 28, p. 2810-2822 | -
dcterms.isPartOf | IEEE/ACM transactions on audio, speech, and language processing | -
dcterms.issued | 2020 | -
dc.identifier.scopus | 2-s2.0-85095709215 | -
dc.identifier.eissn | 2329-9304 | -
dc.description.validate | 202403 bckw | -
dc.description.oa | Accepted Manuscript | en_US
dc.identifier.FolderNumber | EIE-0144 | en_US
dc.description.fundingSource | RGC | en_US
dc.description.fundingSource | Others | en_US
dc.description.fundingText | Tencent AI Lab Rhino-Bird Gift Fund | en_US
dc.description.pubStatus | Published | en_US
dc.identifier.OPUS | 43305655 | en_US
dc.description.oaCategory | Green (AAM) | en_US
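The framework described in the abstract centers on minimizing the maximum mean discrepancy (MMD) between source- and target-language embedding distributions. As an illustrative sketch only (not the authors' implementation), a biased RBF-kernel estimate of squared MMD can be computed as below; the function names `rbf_kernel` and `mmd2` and the bandwidth `sigma` are assumptions introduced for this example:

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    # Gaussian (RBF) kernel matrix between rows of a and rows of b.
    sq_dists = (np.sum(a**2, axis=1)[:, None]
                + np.sum(b**2, axis=1)[None, :]
                - 2.0 * a @ b.T)
    # Clamp tiny negative values caused by floating-point error.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))

def mmd2(x, y, sigma=1.0):
    # Biased estimate of squared MMD between sample sets x and y:
    #   MMD^2 = E[k(x, x')] - 2 E[k(x, y)] + E[k(y, y')]
    return (rbf_kernel(x, x, sigma).mean()
            - 2.0 * rbf_kernel(x, y, sigma).mean()
            + rbf_kernel(y, y, sigma).mean())
```

In the paper's multi-level variant, a discrepancy loss of this kind is applied at both frame- and utterance-level representations alongside the speaker-classification loss; the sketch above shows only the core distribution-matching term.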
Appears in Collections:Journal/Magazine Article
Files in This Item:
File | Description | Size | Format
Lin_Framework_Adapting_Dnn.pdf | Pre-Published version | 796.25 kB | Adobe PDF
Open Access Information
Status: open access
File Version: Final Accepted Manuscript

Page views: 1 (as of Jun 30, 2024)
Downloads: 3 (as of Jun 30, 2024)
SCOPUS™ citations: 14 (as of Jun 21, 2024)
Web of Science™ citations: 11 (as of Jun 27, 2024)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.