Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/107142
Title: | Multi-level deep neural network adaptation for speaker verification using MMD and consistency regularization |
Authors: | Lin, W; Mak, MM; Li, N; Su, D; Yu, D |
Issue Date: | 2020 |
Source: | In Proceedings of ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 04-08 May 2020, Barcelona, Spain, p. 6839-6843 |
Abstract: | Adapting speaker verification (SV) systems to a new environment is a very challenging task. Current adaptation methods in SV mainly focus on the backend, i.e., adaptation is carried out after the speaker embeddings have been created. In this paper, we present a DNN-based adaptation method using maximum mean discrepancy (MMD). Our method exploits two important aspects neglected by previous research. First, instead of minimizing domain discrepancy at the utterance level alone, our method minimizes domain discrepancy at both the frame level and the utterance level, which we believe will make the adaptation more robust to the duration discrepancy between training data and test data. Second, we introduce a consistency regularization for unlabelled target-domain data. The consistency regularization encourages the target speaker embeddings to be robust to adverse perturbations. Experiments on NIST SRE 2016 and 2018 show that our DNN adaptation works significantly better than the previously proposed DNN adaptation methods. Moreover, our method works well with backend adaptation. By combining the proposed method with backend adaptation, we achieve a 9% improvement over backend adaptation in SRE18. |
Keywords: | Data augmentation; Domain adaptation; Maximum mean discrepancy; Speaker verification; Transfer learning |
Publisher: | Institute of Electrical and Electronics Engineers |
ISBN: | 978-1-5090-6631-5 (Electronic); 978-1-5090-6632-2 (Print on Demand (PoD)) |
DOI: | 10.1109/ICASSP40776.2020.9054134 |
Description: | ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 04-08 May 2020, Barcelona, Spain |
Rights: | ©2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The following publication W. Lin, M.-M. Mak, N. Li, D. Su and D. Yu, "Multi-Level Deep Neural Network Adaptation for Speaker Verification Using MMD and Consistency Regularization," ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 6839-6843 is available at https://doi.org/10.1109/ICASSP40776.2020.9054134. |
Appears in Collections: | Conference Paper |
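Illustrative note: the abstract describes minimizing maximum mean discrepancy (MMD) between source-domain and target-domain features at both the frame level and the utterance level. The sketch below shows one common way such a two-level discrepancy could be computed. It is not taken from the paper: the Gaussian kernel, the bandwidth `sigma`, the equal weighting of the two levels, and the function names `rbf_kernel`, `mmd2`, and `multi_level_mmd` are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    # Pairwise squared Euclidean distances, clipped at zero for numerical safety.
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * (a @ b.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def mmd2(source, target, sigma=1.0):
    """Biased estimate of the squared MMD between two sets of feature vectors."""
    k_ss = rbf_kernel(source, source, sigma).mean()
    k_tt = rbf_kernel(target, target, sigma).mean()
    k_st = rbf_kernel(source, target, sigma).mean()
    return k_ss + k_tt - 2.0 * k_st


def multi_level_mmd(src_frames, tgt_frames, src_utts, tgt_utts, sigma=1.0):
    """Sum of frame-level and utterance-level MMD terms (equal weights assumed)."""
    return mmd2(src_frames, tgt_frames, sigma) + mmd2(src_utts, tgt_utts, sigma)


# Synthetic stand-ins for frame-level features and utterance-level embeddings.
rng = np.random.default_rng(0)
src_frames = rng.normal(size=(50, 4))
tgt_frames = src_frames + 5.0  # shifted copy simulates a domain mismatch
src_utts = rng.normal(size=(20, 4))
tgt_utts = src_utts + 5.0
loss = multi_level_mmd(src_frames, tgt_frames, src_utts, tgt_utts)
```

In a training setup of the kind the abstract describes, a term like `loss` would be added to the speaker classification objective so that the network is pushed to produce similar feature distributions for the two domains; how the authors weight or combine the terms is not stated in this record.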
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Lin_Multi-Level_Deep_Neural.pdf | Pre-Published version | 272.15 kB | Adobe PDF | View/Open |
Page views: 4 (as of Jun 30, 2024)
Downloads: 2 (as of Jun 30, 2024)
Scopus™ citations: 27 (as of Jun 21, 2024)
Web of Science™ citations: 22 (as of Jun 27, 2024)
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.