Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/107142
Title: Multi-level deep neural network adaptation for speaker verification using MMD and consistency regularization
Authors: Lin, W 
Mak, MM 
Li, N
Su, D
Yu, D
Issue Date: 2020
Source: In Proceedings of ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 04-08 May 2020, Barcelona, Spain, p. 6839-6843
Abstract: Adapting speaker verification (SV) systems to a new environment is a very challenging task. Current adaptation methods in SV mainly focus on the backend, i.e., adaptation is carried out after the speaker embeddings have been created. In this paper, we present a DNN-based adaptation method using maximum mean discrepancy (MMD). Our method exploits two important aspects neglected by previous research. First, instead of minimizing domain discrepancy at the utterance level alone, our method minimizes domain discrepancy at both the frame level and the utterance level, which we believe will make the adaptation more robust to the duration discrepancy between training data and test data. Second, we introduce a consistency regularization for unlabelled target-domain data. The consistency regularization encourages the target speaker embeddings to be robust to adverse perturbations. Experiments on NIST SRE 2016 and 2018 show that our DNN adaptation works significantly better than the previously proposed DNN adaptation methods. Moreover, our method works well with backend adaptation. By combining the proposed method with backend adaptation, we achieve a 9% improvement over backend adaptation in SRE18.
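Illustrative note: the abstract names maximum mean discrepancy (MMD) as the measure of domain discrepancy. The following is a minimal sketch of a biased squared-MMD estimate between two sets of feature vectors, not the authors' implementation. The Gaussian (RBF) kernel, the bandwidth `sigma`, the function names `rbf_kernel` and `mmd2`, and the synthetic data are all assumptions made for illustration; the record does not state which kernel or settings the paper uses.

```python
import numpy as np

def rbf_kernel(a, b, sigma):
    # Pairwise squared Euclidean distances between rows of a and rows of b.
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * (a @ b.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def mmd2(source, target, sigma=4.0):
    # Biased estimate of squared MMD:
    # mean k(s, s') + mean k(t, t') - 2 * mean k(s, t).
    k_ss = rbf_kernel(source, source, sigma).mean()
    k_tt = rbf_kernel(target, target, sigma).mean()
    k_st = rbf_kernel(source, target, sigma).mean()
    return k_ss + k_tt - 2.0 * k_st

# Synthetic stand-ins for source-domain and target-domain feature vectors
# (hypothetical data, 200 vectors of dimension 16 each).
rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(200, 16))
tgt_same = rng.normal(0.0, 1.0, size=(200, 16))   # same distribution as src
tgt_shift = rng.normal(1.0, 1.0, size=(200, 16))  # mean-shifted distribution

print("MMD^2, matched domains:", mmd2(src, tgt_same))
print("MMD^2, shifted domains:", mmd2(src, tgt_shift))
```

In this sketch the shifted set yields a larger value than the matched set. Under the abstract's multi-level idea, a function of this kind could in principle be applied to both frame-level and utterance-level features and the resulting terms combined in the training loss; how the paper weights or combines them is not stated in this record.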
Keywords: Data augmentation
Domain adaptation
Maximum mean discrepancy
Speaker verification
Transfer learning
Publisher: Institute of Electrical and Electronics Engineers
ISBN: 978-1-5090-6631-5 (Electronic)
978-1-5090-6632-2 (Print on Demand(PoD))
DOI: 10.1109/ICASSP40776.2020.9054134
Description: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 04-08 May 2020, Barcelona, Spain
Rights: ©2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
The following publication W. Lin, M.-M. Mak, N. Li, D. Su and D. Yu, "Multi-Level Deep Neural Network Adaptation for Speaker Verification Using MMD and Consistency Regularization," ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 6839-6843 is available at https://doi.org/10.1109/ICASSP40776.2020.9054134.
Appears in Collections: Conference Paper

Files in This Item:
File: Lin_Multi-Level_Deep_Neural.pdf
Description: Pre-Published version
Size: 272.15 kB
Format: Adobe PDF
Open Access Information:
Status: open access
File Version: Final Accepted Manuscript

