DNN-driven mixture of PLDA for robust speaker verification

Li, N; Mak, MW; Chien, JT

doi:10.1109/TASLP.2017.2692304

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/70611

Title:	DNN-driven mixture of PLDA for robust speaker verification
Authors:	Li, N; Mak, MW; Chien, JT
Issue Date:	Jun-2017
Source:	IEEE/ACM transactions on audio, speech, and language processing, June 2017, v. 25, no. 6, special issue, p. 1371-1383
Abstract:	The mismatch between enrollment and test utterances due to different types of variabilities is a great challenge in speaker verification. Based on the observation that the SNR-level variability or channel-type variability causes heterogeneous clusters in i-vector space, this paper proposes to apply supervised learning to drive or guide the learning of probabilistic linear discriminant analysis (PLDA) mixture models. Specifically, a deep neural network (DNN) is trained to produce the posterior probabilities of different SNR levels or channel types given i-vectors as input. These posteriors then replace the posterior probabilities of indicator variables in the mixture of PLDA. The discriminative training causes the mixture model to perform more reasonable soft divisions of the i-vector space as compared to the conventional mixture of PLDA. During verification, given a test i-vector and a target-speaker's i-vector, the marginal likelihood for the same-speaker hypothesis is obtained by summing the component likelihoods weighted by the component posteriors produced by the DNN, and likewise for the different-speaker hypothesis. Results based on NIST 2012 SRE demonstrate that the proposed scheme leads to better performance under more realistic situations where both training and test utterances cover a wide range of SNRs and different channel types. Unlike the previous SNR-dependent mixture of PLDA which only focuses on SNR mismatch, the proposed model is more general and is potentially applicable to addressing different types of variability in speech.
Keywords:	Deep neural networks; I-vectors; Mixture of PLDA; Speaker verification
Publisher:	Institute of Electrical and Electronics Engineers
Journal:	IEEE/ACM transactions on audio, speech, and language processing
ISSN:	2329-9290
DOI:	10.1109/TASLP.2017.2692304
Rights:	© 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The following publication N. Li, M. Mak and J. Chien, "DNN-Driven Mixture of PLDA for Robust Speaker Verification," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 6, pp. 1371-1383, June 2017 is available at https://doi.org/10.1109/TASLP.2017.2692304
Appears in Collections:	Journal/Magazine Article
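
The scoring step described in the abstract (component likelihoods summed with DNN-produced posterior weights, once for each hypothesis) can be illustrated with a minimal sketch. It assumes the per-component log-likelihoods have already been computed by some PLDA back end and that a single vector of component posteriors is available per trial; the paper's exact formulation may combine posteriors from both the test and target i-vectors. The function names `weighted_log_marginal` and `mixture_llr` are hypothetical and do not come from the publication.

```python
import numpy as np

def weighted_log_marginal(log_liks, posteriors):
    """Return log(sum_k posteriors[k] * exp(log_liks[k])), computed stably.

    Assumes `posteriors` are non-negative and at least one is positive.
    """
    log_liks = np.asarray(log_liks, dtype=float)
    posteriors = np.asarray(posteriors, dtype=float)
    # Components with zero weight contribute nothing to the sum.
    mask = posteriors > 0
    terms = log_liks[mask] + np.log(posteriors[mask])
    # Log-sum-exp trick: subtract the maximum before exponentiating.
    m = terms.max()
    return m + np.log(np.exp(terms - m).sum())

def mixture_llr(log_lik_same, log_lik_diff, posteriors):
    """Log-likelihood ratio: same-speaker vs. different-speaker hypothesis.

    Each hypothesis is a posterior-weighted sum over mixture components
    (illustrative simplification, not the authors' exact scoring formula).
    """
    return (weighted_log_marginal(log_lik_same, posteriors)
            - weighted_log_marginal(log_lik_diff, posteriors))

if __name__ == "__main__":
    # Hypothetical numbers for a three-component mixture.
    post = [0.7, 0.2, 0.1]            # stand-in for DNN output posteriors
    same = [-10.0, -12.0, -15.0]      # per-component log-likelihoods
    diff = [-13.0, -13.5, -14.0]
    print(mixture_llr(same, diff, post))
```

A positive score favors the same-speaker hypothesis. When the posterior puts all of its mass on one component, the score reduces to that component's ordinary PLDA log-likelihood ratio.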

Files in This Item:

File	Description	Size	Format
Mak_Dnn-Driven_Mixture_Plda.pdf	Pre-Published version	2.34 MB	Adobe PDF

Open Access Information

Status	open access
File Version	Final Accepted Manuscript

Metric	Count	As of
Page views	101	Apr 14, 2025
Downloads	80	Apr 14, 2025
Scopus citations	19	Jul 11, 2024
Web of Science citations	16	Nov 13, 2025
