Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/70611
DC Field | Value | Language
dc.contributor | Department of Electronic and Information Engineering | en_US
dc.creator | Li, N | en_US
dc.creator | Mak, MW | en_US
dc.creator | Chien, JT | en_US
dc.date.accessioned | 2017-12-28T06:17:31Z | -
dc.date.available | 2017-12-28T06:17:31Z | -
dc.identifier.issn | 2329-9290 | en_US
dc.identifier.uri | http://hdl.handle.net/10397/70611 | -
dc.language.iso | en | en_US
dc.publisher | Institute of Electrical and Electronics Engineers | en_US
dc.rights | © 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. | en_US
dc.rights | The following publication N. Li, M. Mak and J. Chien, "DNN-Driven Mixture of PLDA for Robust Speaker Verification," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 6, pp. 1371-1383, June 2017 is available at https://doi.org/10.1109/TASLP.2017.2692304 | en_US
dc.subject | Deep neural networks | en_US
dc.subject | I-vectors | en_US
dc.subject | Mixture of PLDA | en_US
dc.subject | Speaker verification | en_US
dc.title | DNN-driven mixture of PLDA for robust speaker verification | en_US
dc.type | Journal/Magazine Article | en_US
dc.identifier.spage | 1371 | en_US
dc.identifier.epage | 1383 | en_US
dc.identifier.volume | 25 | en_US
dc.identifier.issue | 6 | en_US
dc.identifier.doi | 10.1109/TASLP.2017.2692304 | en_US
dcterms.abstract | The mismatch between enrollment and test utterances due to different types of variabilities is a great challenge in speaker verification. Based on the observation that the SNR-level variability or channel-type variability causes heterogeneous clusters in i-vector space, this paper proposes to apply supervised learning to drive or guide the learning of probabilistic linear discriminant analysis (PLDA) mixture models. Specifically, a deep neural network (DNN) is trained to produce the posterior probabilities of different SNR levels or channel types given i-vectors as input. These posteriors then replace the posterior probabilities of indicator variables in the mixture of PLDA. The discriminative training causes the mixture model to perform more reasonable soft divisions of the i-vector space as compared to the conventional mixture of PLDA. During verification, given a test i-vector and a target-speaker's i-vector, the marginal likelihood for the same-speaker hypothesis is obtained by summing the component likelihoods weighted by the component posteriors produced by the DNN, and likewise for the different-speaker hypothesis. Results based on NIST 2012 SRE demonstrate that the proposed scheme leads to better performance under more realistic situations where both training and test utterances cover a wide range of SNRs and different channel types. Unlike the previous SNR-dependent mixture of PLDA which only focuses on SNR mismatch, the proposed model is more general and is potentially applicable to addressing different types of variability in speech. | en_US
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | IEEE/ACM transactions on audio, speech, and language processing, June 2017, v. 25, no. 6, special issue, p. 1371-1383 | en_US
dcterms.isPartOf | IEEE/ACM transactions on audio, speech, and language processing | en_US
dcterms.issued | 2017-06 | -
dc.identifier.isi | WOS:000403300400019 | -
dc.identifier.ros | 2016005962 | -
dc.identifier.rosgroupid | 2016005709 | -
dc.description.ros | 2016-2017 > Academic research: refereed > Publication in refereed journal | en_US
dc.description.validate | bcrc | en_US
dc.description.oa | Accepted Manuscript | en_US
dc.identifier.FolderNumber | EIE-0703 | -
dc.description.fundingSource | RGC | en_US
dc.description.pubStatus | Published | en_US
dc.identifier.OPUS | 6775538 | -
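
The verification step summarized in the abstract (summing component likelihoods weighted by DNN-produced posteriors, once per hypothesis) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `weighted_log_marginal` and `mixture_llr` are hypothetical, the component log-likelihoods are assumed to be precomputed by some PLDA model, and forming the pair weights as the product of the two utterances' posteriors is an assumption made here for illustration only.

```python
import math


def weighted_log_marginal(log_liks, weights):
    """Return log(sum_k w_k * exp(log_lik_k)), computed stably.

    Components with zero weight are skipped so that log(0) is never taken.
    """
    terms = [ll + math.log(w) for ll, w in zip(log_liks, weights) if w > 0.0]
    if not terms:
        raise ValueError("at least one component weight must be positive")
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def mixture_llr(log_same, log_diff, post_target, post_test):
    """Log-likelihood ratio of same-speaker vs. different-speaker hypotheses.

    log_same[i][j] and log_diff[i][j] are component log-likelihoods of the
    i-vector pair when the target utterance is assigned to component i and
    the test utterance to component j (assumed precomputed elsewhere).
    post_target and post_test are the DNN posteriors of the two i-vectors.
    ASSUMPTION: pair weights are the product of the two posteriors.
    """
    pair_weights = [pt * ps for pt in post_target for ps in post_test]
    flat_same = [v for row in log_same for v in row]
    flat_diff = [v for row in log_diff for v in row]
    return (weighted_log_marginal(flat_same, pair_weights)
            - weighted_log_marginal(flat_diff, pair_weights))


# Toy example with two components and made-up likelihood values.
same = [[math.log(0.4)] * 2 for _ in range(2)]
diff = [[math.log(0.1)] * 2 for _ in range(2)]
llr = mixture_llr(same, diff, [0.7, 0.3], [0.2, 0.8])
print(round(llr, 3))  # 1.386 (= log 4, since 0.4 / 0.1 = 4)
```

Working in the log domain avoids underflow when component likelihoods are very small, which is why the sketch uses a log-sum-exp rather than summing raw likelihoods.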
Appears in Collections: Journal/Magazine Article
Files in This Item:
File | Description | Size | Format
Mak_Dnn-Driven_Mixture_Plda.pdf | Pre-Published version | 2.34 MB | Adobe PDF
Open Access Information
Status: open access
File Version: Final Accepted Manuscript

Page views: 52 (last week: 0), as of Mar 24, 2024
Downloads: 37, as of Mar 24, 2024
Scopus citations: 17, as of Mar 22, 2024
Web of Science citations: 16 (last week: 0), as of Mar 28, 2024

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.