Multi-level deep neural network adaptation for speaker verification using MMD and consistency regularization

Lin, W; Mak, MM; Li, N; Su, D; Yu, D

doi:10.1109/ICASSP40776.2020.9054134

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/107142

Title:	Multi-level deep neural network adaptation for speaker verification using MMD and consistency regularization
Authors:	Lin, W; Mak, MM; Li, N; Su, D; Yu, D
Issue Date:	2020
Source:	In Proceedings of ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 04-08 May 2020, Barcelona, Spain, p. 6839-6843
Abstract:	Adapting speaker verification (SV) systems to a new environment is a very challenging task. Current adaptation methods in SV mainly focus on the backend, i.e., adaptation is carried out after the speaker embeddings have been created. In this paper, we present a DNN-based adaptation method using maximum mean discrepancy (MMD). Our method exploits two important aspects neglected by previous research. First, instead of minimizing domain discrepancy at the utterance level alone, our method minimizes domain discrepancy at both the frame level and the utterance level, which we believe will make the adaptation more robust to the duration discrepancy between training data and test data. Second, we introduce a consistency regularization for unlabelled target-domain data. The consistency regularization encourages the target speaker embeddings to be robust to adverse perturbations. Experiments on NIST SRE 2016 and 2018 show that our DNN adaptation works significantly better than the previously proposed DNN adaptation methods. Moreover, our method works well with backend adaptation. By combining the proposed method with backend adaptation, we achieve a 9% improvement over backend adaptation alone in SRE18.
Keywords:	Data augmentation; Domain adaptation; Maximum mean discrepancy; Speaker verification; Transfer learning
Publisher:	Institute of Electrical and Electronics Engineers
ISBN:	978-1-5090-6631-5 (Electronic); 978-1-5090-6632-2 (Print on Demand (PoD))
DOI:	10.1109/ICASSP40776.2020.9054134
Description:	ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 04-08 May 2020, Barcelona, Spain
Rights:	©2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The following publication W. Lin, M. -M. Mak, N. Li, D. Su and D. Yu, "Multi-Level Deep Neural Network Adaptation for Speaker Verification Using MMD and Consistency Regularization," ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 6839-6843 is available at https://doi.org/10.1109/ICASSP40776.2020.9054134.
Appears in Collections:	Conference Paper

Files in This Item:

File	Description	Size	Format
Lin_Multi-Level_Deep_Neural.pdf	Pre-Published version	272.15 kB	Adobe PDF

Open Access Information

Status	open access
File Version	Final Accepted Manuscript

Usage and Citations

Metric	Count	As of
Page views	147 (3 in the last week)	Apr 12, 2026
Downloads	133	Apr 12, 2026
Scopus citations	36	May 8, 2026
Web of Science citations	33	Apr 23, 2026
