Denoised senone I-Vectors for robust speaker verification

Tan, Z; Mak, MW; Mak, BKW; Zhu, Y

doi:10.1109/TASLP.2018.2796843

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/77467

Title:	Denoised senone I-Vectors for robust speaker verification
Authors:	Tan, Z; Mak, MW; Mak, BKW; Zhu, Y
Issue Date:	Apr-2018
Source:	IEEE/ACM transactions on audio, speech, and language processing, Apr. 2018, v. 26, no. 4, 8269399, p. 820-830
Abstract:	Recently, it has been shown that senone i-vectors, whose posteriors are produced by senone deep neural networks (DNNs), outperform the conventional Gaussian mixture model (GMM) i-vectors in both speaker and language recognition tasks. The success of senone i-vectors relies on the capability of the DNN to incorporate phonetic information into the i-vector extraction process. In this paper, we argue that to apply senone i-vectors in noisy environments, it is important to robustify both the phonetically discriminative acoustic features and the senone posteriors estimated by the DNN. To this end, we propose a deep architecture formed by stacking a deep belief network on top of a denoising autoencoder (DAE). After backpropagation fine-tuning, the network, referred to as a denoising autoencoder-deep neural network (DAE-DNN), facilitates the extraction of robust, phonetically discriminative bottleneck (BN) features and senone posteriors for i-vector extraction. We refer to the resulting i-vectors as denoised BN-based senone i-vectors. Results on the NIST 2012 SRE show that senone i-vectors outperform the conventional GMM i-vectors. More interestingly, the results suggest that the BN features are not only phonetically discriminative but also contain sufficient speaker information to produce BN-based senone i-vectors that outperform the conventional senone i-vectors. This work also shows that DAE training is more beneficial to BN feature extraction than to senone posterior estimation.
Keywords:	Deep learning; Denoising autoencoders; I-vectors; Noise robustness; Phonetically discriminative features; Senone posteriors; Speaker verification
Publisher:	Institute of Electrical and Electronics Engineers
Journal:	IEEE/ACM transactions on audio, speech, and language processing
ISSN:	2329-9290
EISSN:	2329-9304
DOI:	10.1109/TASLP.2018.2796843
Rights:	© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The following publication Z. Tan, M. Mak, B. K. Mak and Y. Zhu, "Denoised Senone I-Vectors for Robust Speaker Verification," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 4, pp. 820-830, April 2018 is available at https://doi.org/10.1109/TASLP.2018.2796843.
Appears in Collections:	Journal/Magazine Article
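The abstract's central mechanism is that the frame-level posteriors used to accumulate i-vector statistics come from a DNN's senone outputs rather than from a GMM. The following is a minimal pure-Python sketch of that accumulation step, not the authors' code: it assumes per-frame features (e.g., BN features) and senone posteriors are already available, and that per-senone means have been precomputed. The function name `baum_welch_stats` and its argument layout are hypothetical. It computes the zeroth-order statistics N_c = Σ_t γ_t(c) and the centered first-order statistics F̃_c = Σ_t γ_t(c)(x_t − μ_c).

```python
# Hypothetical sketch: Baum-Welch statistics for senone i-vector extraction,
# where per-frame posteriors come from a DNN's senone softmax instead of a GMM-UBM.
# Assumption: per-senone means are precomputed (e.g., posterior-weighted means
# over background data); covariance normalization is omitted.
from typing import List, Tuple

Vector = List[float]


def baum_welch_stats(
    features: List[Vector],    # T frames x D dims (e.g., BN features)
    posteriors: List[Vector],  # T frames x C senone posteriors (each row sums to 1)
    means: List[Vector],       # C x D per-senone means
) -> Tuple[Vector, List[Vector]]:
    """Return zeroth-order stats N_c and centered first-order stats F~_c."""
    if len(features) != len(posteriors):
        raise ValueError("features and posteriors must have the same number of frames")
    num_classes = len(means)
    dim = len(means[0])

    n = [0.0] * num_classes                      # N_c = sum_t gamma_t(c)
    f = [[0.0] * dim for _ in range(num_classes)]  # F_c = sum_t gamma_t(c) * x_t
    for x_t, gamma_t in zip(features, posteriors):
        for c in range(num_classes):
            g = gamma_t[c]
            n[c] += g
            for d in range(dim):
                f[c][d] += g * x_t[d]

    # Center around the senone means: F~_c = F_c - N_c * mu_c
    f_centered = [
        [f[c][d] - n[c] * means[c][d] for d in range(dim)]
        for c in range(num_classes)
    ]
    return n, f_centered


if __name__ == "__main__":
    feats = [[1.0, 2.0], [3.0, 4.0]]
    post = [[0.75, 0.25], [0.25, 0.75]]
    mu = [[0.0, 0.0], [1.0, 1.0]]
    print(baum_welch_stats(feats, post, mu))  # ([1.0, 1.0], [[1.5, 2.5], [1.5, 2.5]])
```

In a full system these statistics would then feed a standard total-variability model to produce the i-vector; swapping GMM posteriors for DNN senone posteriors changes only where the γ_t(c) values come from, not how the statistics are accumulated.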

Files in This Item:

File	Description	Size	Format
Tan_Denoised_Senone_I-Vectors.pdf	Pre-Published version	1.36 MB	Adobe PDF

Open Access Information

Status	open access
File Version	Final Accepted Manuscript

Page views:	119 (as of Apr 14, 2025)
Downloads:	68 (as of Apr 14, 2025)
Scopus™ citations:	10 (as of Jun 26, 2025)
Web of Science™ citations:	7 (as of Jun 5, 2025)