Deep neural network driven mixture of PLDA for robust i-vector speaker verification

Li, N; Mak, MW; Chien, JT

doi:10.1109/SLT.2016.7846263

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/107247

Title:	Deep neural network driven mixture of PLDA for robust i-vector speaker verification
Authors:	Li, N; Mak, MW; Chien, JT
Issue Date:	2016
Source:	In Proceedings of 2016 IEEE Spoken Language Technology Workshop (SLT), 13-16 December 2016, San Diego, CA, USA
Abstract:	In speaker recognition, the mismatch between the enrollment and test utterances due to noise with different signal-to-noise ratios (SNRs) is a great challenge. Based on the observation that noise-level variability causes the i-vectors to form heterogeneous clusters, this paper proposes using an SNR-aware deep neural network (DNN) to guide the training of PLDA mixture models. Specifically, given an i-vector, the SNR posterior probabilities produced by the DNN are used as the posteriors of indicator variables of the mixture model. As a result, the proposed model provides a more reasonable soft division of the i-vector space compared to the conventional mixture of PLDA. During verification, given a test trial, the marginal likelihoods from individual PLDA models are linearly combined by the posterior probabilities of SNR levels computed by the DNN. Experimental results for SNR mismatch tasks based on NIST 2012 SRE suggest that the proposed model is more effective than PLDA and conventional mixture of PLDA for handling heterogeneous corpora.
Keywords:	Deep neural networks; I-vector; Mixture of PLDA; SNR mismatch; Speaker verification
Publisher:	Institute of Electrical and Electronics Engineers
ISBN:	978-1-5090-4903-5 (Electronic); 978-1-5090-4904-2 (Print on Demand (PoD))
DOI:	10.1109/SLT.2016.7846263
Description:	2016 IEEE Spoken Language Technology Workshop (SLT), 13-16 December 2016, San Diego, CA, USA
Rights:	©2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The following publication N. Li, M. -W. Mak and J. -T. Chien, "Deep neural network driven mixture of PLDA for robust i-vector speaker verification," 2016 IEEE Spoken Language Technology Workshop (SLT), San Diego, CA, USA, 2016, pp. 186-191 is available at https://doi.org/10.1109/SLT.2016.7846263.
Appears in Collections:	Conference Paper
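
The verification step described in the abstract, where the marginal likelihoods of the individual PLDA models are linearly combined using the DNN's SNR posteriors, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `combine_log_likelihoods`, its arguments, and the choice to work in the log domain are assumptions made here for illustration, and the sketch does not show how the combined likelihoods enter the final verification score, which the abstract does not specify.

```python
import numpy as np


def combine_log_likelihoods(log_liks, snr_posteriors):
    """Return log(sum_k posterior_k * exp(log_lik_k)), computed stably.

    log_liks       : per-component PLDA log marginal likelihoods (hypothetical input)
    snr_posteriors : DNN posterior probability of each SNR level (hypothetical input)
    """
    log_liks = np.asarray(log_liks, dtype=float)
    post = np.asarray(snr_posteriors, dtype=float)
    if log_liks.shape != post.shape:
        raise ValueError("need one posterior per PLDA component")
    if np.any(post < 0) or not np.isclose(post.sum(), 1.0):
        raise ValueError("posteriors must be non-negative and sum to 1")

    # Components with zero posterior contribute nothing; dropping them avoids log(0).
    keep = post > 0
    terms = log_liks[keep] + np.log(post[keep])

    # Log-sum-exp: subtract the largest term before exponentiating to avoid underflow.
    peak = terms.max()
    return float(peak + np.log(np.exp(terms - peak).sum()))


# Illustrative values only: three SNR-dependent PLDA components.
combined = combine_log_likelihoods([-120.4, -118.9, -125.0], [0.2, 0.7, 0.1])
```

Working in the log domain is a design choice of this sketch: PLDA likelihoods of high-dimensional i-vectors are typically far too small to exponentiate directly, so the weighted sum is taken with the log-sum-exp trick.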

Files in This Item:

File	Description	Size	Format
Mak_Deep_Neural_Network.pdf	Pre-Published version	375.59 kB	Adobe PDF

Open Access Information

Status	open access
File Version	Final Accepted Manuscript
