Multisource i-vectors domain adaptation using maximum mean discrepancy based autoencoders

Lin, WW; Mak, MW; Chien, JT

doi:10.1109/TASLP.2018.2866707

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/95585

Title:	Multisource i-vectors domain adaptation using maximum mean discrepancy based autoencoders
Authors:	Lin, WW; Mak, MW; Chien, JT
Issue Date:	Dec-2018
Source:	IEEE/ACM transactions on audio, speech, and language processing, Dec. 2018, v. 26, no. 12, 8445620, p. 2412-2422
Abstract:	Like many machine learning tasks, the performance of speaker verification (SV) systems degrades when training and test data come from very different distributions. What's more, both training and test data themselves could be composed of heterogeneous subsets. These multisource mismatches are detrimental to SV performance. This paper proposes incorporating maximum mean discrepancy (MMD) into the loss function of autoencoders to reduce these mismatches. MMD is a nonparametric method for measuring the distance between two probability distributions. With a properly chosen kernel, MMD can match up to infinite moments of data distributions. We generalize MMD to measure the discrepancies of multiple distributions. We call the generalized MMD domainwise MMD. Using domainwise MMD as an objective function, we propose two autoencoders, namely nuisance-attribute autoencoder (NAE) and domain-invariant autoencoder (DAE), for multisource i-vector adaptation. NAE encodes the features that cause most of the multisource mismatch measured by domainwise MMD. DAE directly encodes the features that minimize the multisource mismatch. Using these MMD-based autoencoders as a preprocessing step for PLDA training, we achieve a relative improvement of 19.2% EER on the NIST 2016 SRE compared to PLDA without adaptation. We also found that MMD-based autoencoders are more robust to unseen domains. In the domain robustness experiments, MMD-based autoencoders show 6.8% and 5.2% improvements over IDVC on female and male Cantonese speakers, respectively.
Keywords:	Domain adaptation; I-vectors; Maximum mean discrepancy; Speaker verification
Publisher:	Institute of Electrical and Electronics Engineers
Journal:	IEEE/ACM transactions on audio, speech, and language processing
ISSN:	2329-9290
EISSN:	2329-9304
DOI:	10.1109/TASLP.2018.2866707
Rights:	© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The following publication W. Lin, M. Mak and J. Chien, "Multisource I-Vectors Domain Adaptation Using Maximum Mean Discrepancy Based Autoencoders," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 12, pp. 2412-2422, Dec. 2018 is available at https://doi.org/10.1109/TASLP.2018.2866707.
Appears in Collections:	Journal/Magazine Article
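
To make the abstract's description of MMD concrete, the following is a minimal NumPy sketch of a kernel-based MMD estimate between two sample sets, plus one possible multi-domain generalization. It is not the authors' implementation: the Gaussian (RBF) kernel, the bandwidth `sigma`, the biased estimator, and the choice to sum over all pairs of domains in `domainwise_mmd2` are assumptions made for illustration. The abstract does not give the paper's exact definition of domainwise MMD, and all function names here are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix between the rows of a (n, d) and b (m, d)."""
    # Pairwise squared Euclidean distances via broadcasting.
    sq = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def mmd2(x, y, sigma=1.0):
    """Biased estimate of the squared MMD between samples x and y."""
    return (
        rbf_kernel(x, x, sigma).mean()
        + rbf_kernel(y, y, sigma).mean()
        - 2.0 * rbf_kernel(x, y, sigma).mean()
    )


def domainwise_mmd2(domains, sigma=1.0):
    """Assumed generalization: sum of squared MMDs over all domain pairs."""
    total = 0.0
    for i in range(len(domains)):
        for j in range(i + 1, len(domains)):
            total += mmd2(domains[i], domains[j], sigma)
    return total
```

In an autoencoder setting, a quantity of this kind would be computed on the encoded i-vectors of each source and added to the training loss; the sketch only covers the discrepancy measure itself.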

Files in This Item:

File	Description	Size	Format
Lin_Multisource_I-Vectors_Domain.pdf	Pre-Published version	1.28 MB	Adobe PDF

Open Access Information

Status	open access
File Version	Final Accepted Manuscript

Page views	78 (as of Apr 14, 2025)
Downloads	143 (as of Apr 14, 2025)
Scopus citations	54 (as of Jan 9, 2025)
Web of Science citations	42 (as of Oct 10, 2024)