Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/77467
Title: Denoised senone I-Vectors for robust speaker verification
Authors: Tan, Z 
Mak, MW 
Mak, BKW
Zhu, Y
Keywords: Deep learning
Denoising autoencoders
I-vectors
Noise robustness
Phonetically discriminative features
Senone posteriors
Speaker verification
Issue Date: 2018
Publisher: Institute of Electrical and Electronics Engineers
Source: IEEE/ACM transactions on audio, speech, and language processing, 2018, v. 26, no. 4, 8269399, p. 820-830
Journal: IEEE/ACM transactions on audio, speech, and language processing 
Abstract: Recently, it has been shown that senone i-vectors, whose posteriors are produced by senone deep neural networks (DNNs), outperform the conventional Gaussian mixture model (GMM) i-vectors in both speaker and language recognition tasks. The success of senone i-vectors relies on the capability of the DNN to incorporate phonetic information into the i-vector extraction process. In this paper, we argue that to apply senone i-vectors in noisy environments, it is important to robustify the phonetically discriminative acoustic features and senone posteriors estimated by the DNN. To this end, we propose a deep architecture formed by stacking a deep belief network on top of a denoising autoencoder (DAE). After backpropagation fine-tuning, the network, referred to as denoising autoencoder-deep neural network (DAE-DNN), facilitates the extraction of robust phonetically discriminative bottleneck (BN) features and senone posteriors for i-vector extraction. We refer to the resulting i-vectors as denoised BN-based senone i-vectors. Results on NIST 2012 SRE show that senone i-vectors outperform the conventional GMM i-vectors. More interestingly, the BN features are not only phonetically discriminative; results suggest that they also contain sufficient speaker information to produce BN-based senone i-vectors that outperform the conventional senone i-vectors. This work also shows that DAE training is more beneficial to BN feature extraction than to senone posterior estimation.
URI: http://hdl.handle.net/10397/77467
ISSN: 2329-9290
EISSN: 2329-9304
DOI: 10.1109/TASLP.2018.2796843
Appears in Collections: Journal/Magazine Article

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.