Discriminative subspace modeling of SNR and duration variabilities for robust speaker verification

Li, N; Mak, MW; Lin, WW; Chien, JT

doi:10.1016/j.csl.2017.04.001

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/95586

Title:	Discriminative subspace modeling of SNR and duration variabilities for robust speaker verification
Authors:	Li, N; Mak, MW; Lin, WW; Chien, JT
Issue Date:	Sep-2017
Source:	Computer speech and language, Sept. 2017, v. 45, p. 83-103
Abstract:	Although i-vectors together with probabilistic LDA (PLDA) have achieved great success in speaker verification, suppressing the undesirable effects caused by variability in utterance length and background noise level remains a challenge. This paper aims to improve the robustness of i-vector based speaker verification systems by compensating for utterance-length variability and noise-level variability. Inspired by the recent findings that noise-level variability can be modeled by a signal-to-noise ratio (SNR) subspace and that duration variability can be modeled as additive noise in the i-vector space, we propose to add an SNR factor and a duration factor to the PLDA model. In this framework, we assume that i-vectors derived from utterances with comparable durations share similar duration-specific information and that i-vectors extracted from utterances within a narrow SNR range have similar SNR-specific information. Based on these assumptions, an i-vector can be represented as a linear combination of four components: speaker, SNR, duration, and channel. A variational Bayes algorithm is developed to infer this latent variable model via a discriminative subspace training procedure. In the testing stage, the different variabilities are compensated for when computing the likelihood ratio. Experiments on Common Conditions 1 and 4 in NIST 2012 SRE show that the proposed model outperforms the conventional PLDA and SNR-invariant PLDA. Results also show that the proposed model performs better than the uncertainty-propagation PLDA (UP-PLDA) for long test utterances.
Keywords:	Duration variation; I-vector; PLDA; SNR mismatch; Speaker verification; Variational Bayes
Publisher:	Academic Press
Journal:	Computer speech and language
ISSN:	0885-2308
DOI:	10.1016/j.csl.2017.04.001
Rights:	© 2017 Elsevier Ltd. All rights reserved. © 2017. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/. The following publication Li, N., Mak, M. W., Lin, W. W., & Chien, J. T. (2017). Discriminative subspace modeling of SNR and duration variabilities for robust speaker verification. Computer Speech & Language, 45, 83-103 is available at https://doi.org/10.1016/j.csl.2017.04.001.
Appears in Collections:	Journal/Magazine Article
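
The four-component decomposition described in the abstract can be illustrated with a small generative sketch. This is only an illustration under stated assumptions, not the authors' implementation: the symbol names (`V`, `U`, `R`, `h`, `w`, `y`), the dimensions, the random loading matrices, and the treatment of the channel component as a Gaussian residual are all hypothetical choices made here, and no training or variational Bayes inference is shown.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for a trained loading matrix."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def compose_ivector(mean, V, U, R, h, w, y, channel_std, rng):
    """Build an i-vector as mean + speaker + SNR + duration + channel.

    Assumption: the channel component is modeled as an isotropic
    Gaussian residual with standard deviation channel_std.
    """
    speaker = matvec(V, h)    # shared by all utterances of one speaker
    snr = matvec(U, w)        # shared by utterances in one SNR range
    duration = matvec(R, y)   # shared by utterances of similar duration
    return [
        m + s + n + d + rng.gauss(0.0, channel_std)
        for m, s, n, d in zip(mean, speaker, snr, duration)
    ]


rng = random.Random(0)
dim = 6                                  # toy i-vector dimension
mean = [0.0] * dim
V = random_matrix(dim, 3, rng)           # speaker subspace (assumed rank 3)
U = random_matrix(dim, 2, rng)           # SNR subspace (assumed rank 2)
R = random_matrix(dim, 2, rng)           # duration subspace (assumed rank 2)

h = [rng.gauss(0.0, 1.0) for _ in range(3)]   # one speaker factor
w = [rng.gauss(0.0, 1.0) for _ in range(2)]   # one SNR-group factor
y = [rng.gauss(0.0, 1.0) for _ in range(2)]   # one duration-group factor

# Two utterances from the same speaker, SNR group, and duration group
# differ only in their channel residual.
x_a = compose_ivector(mean, V, U, R, h, w, y, 0.1, rng)
x_b = compose_ivector(mean, V, U, R, h, w, y, 0.1, rng)
```

In this sketch, utterances that fall in the same SNR range reuse the same `w` and utterances of comparable duration reuse the same `y`, which mirrors the grouping assumptions stated in the abstract.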

Files in This Item:

File	Description	Size	Format
Mak_Discriminative_Subspace_Modeling.pdf	Pre-Published version	2.34 MB	Adobe PDF

Open Access Information

Status	open access
File Version	Final Accepted Manuscript

Usage and Citation Statistics

Page views:	34 (last week: 0), as of Oct 13, 2024
Downloads:	56, as of Oct 13, 2024
Scopus citations:	7, as of Oct 17, 2024
Web of Science citations:	5, as of Oct 10, 2024
