Variational domain adversarial learning with mutual information maximization for speaker verification

Tu, Y; Mak, MW; Chien, JT

doi:10.1109/TASLP.2020.3004760

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/107135

Title:	Variational domain adversarial learning with mutual information maximization for speaker verification
Authors:	Tu, Y; Mak, MW; Chien, JT
Issue Date:	2020
Source:	IEEE/ACM transactions on audio, speech, and language processing, 2020, v. 28, p. 2013-2024
Abstract:	Domain mismatch is a common problem in speaker verification (SV) and often causes performance degradation. For the system relying on the Gaussian PLDA backend to suppress the channel variability, the performance would be further limited if there is no Gaussianity constraint on the learned embeddings. This paper proposes an information-maximized variational domain adversarial neural network (InfoVDANN) that incorporates an InfoVAE into domain adversarial training (DAT) to reduce domain mismatch and simultaneously meet the Gaussianity requirement of the PLDA backend. Specifically, DAT is applied to produce speaker discriminative and domain-invariant features, while the InfoVAE performs variational regularization on the embedded features so that they follow a Gaussian distribution. Another benefit of the InfoVAE is that it avoids posterior collapse in VAEs by preserving the mutual information between the embedded features and the training set so that extra speaker information can be retained in the features. Experiments on both SRE16 and SRE18-CMN2 show that the InfoVDANN outperforms the recent VDANN, which suggests that increasing the mutual information between the embedded features and input features enables the InfoVDANN to extract extra speaker information that is otherwise not possible.
Keywords:	Domain adaptation; Domain adversarial training; Mutual information; Speaker verification (SV); Variational autoencoder
Publisher:	Institute of Electrical and Electronics Engineers
Journal:	IEEE/ACM transactions on audio, speech, and language processing
ISSN:	2329-9290
EISSN:	2329-9304
DOI:	10.1109/TASLP.2020.3004760
Rights:	© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The following publication Y. Tu, M.-W. Mak and J.-T. Chien, "Variational Domain Adversarial Learning With Mutual Information Maximization for Speaker Verification," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 28, pp. 2013-2024, 2020 is available at https://doi.org/10.1109/TASLP.2020.3004760.
Appears in Collections:	Journal/Magazine Article

Files in This Item:

File	Description	Size	Format
Lin_Variational_Domain_Adversarial.pdf	Pre-Published version	1.73 MB	Adobe PDF

Open Access Information

Status	open access
File Version	Final Accepted Manuscript

Metric	Count	As of
Page views	5	Jun 30, 2024
Downloads	5	Jun 30, 2024
Scopus citations	31	Jun 21, 2024
Web of Science citations	25	Jun 27, 2024
