Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/106892
| DC Field | Value | Language |
|---|---|---|
| dc.contributor | Department of Electrical and Electronic Engineering | en_US |
| dc.creator | Yi, L | en_US |
| dc.creator | Mak, MW | en_US |
| dc.date.accessioned | 2024-06-07T00:58:41Z | - |
| dc.date.available | 2024-06-07T00:58:41Z | - |
| dc.identifier.isbn | 978-1-7138-2069-7 | en_US |
| dc.identifier.uri | http://hdl.handle.net/10397/106892 | - |
| dc.description | 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020, 25-29 October 2020, Shanghai, China | en_US |
| dc.language.iso | en | en_US |
| dc.publisher | International Speech Communication Association (ISCA) | en_US |
| dc.rights | Copyright © 2020 ISCA | en_US |
| dc.rights | The following publication Yi, L., Mak, M.-W. (2020) Adversarial Separation and Adaptation Network for Far-Field Speaker Verification. Proc. Interspeech 2020, 4298-4302 is available at https://doi.org/10.21437/Interspeech.2020-2372. | en_US |
| dc.title | Adversarial separation and adaptation network for far-field speaker verification | en_US |
| dc.type | Conference Paper | en_US |
| dc.identifier.spage | 4298 | en_US |
| dc.identifier.epage | 4302 | en_US |
| dc.identifier.doi | 10.21437/Interspeech.2020-2372 | en_US |
| dcterms.abstract | Typically, speaker verification systems are highly optimized on speech collected by close-talking microphones. However, these systems perform poorly when users speak through far-field microphones during verification. In this paper, we propose an adversarial separation and adaptation network (ADSAN) to extract speaker-discriminative and domain-invariant features through adversarial learning. The idea is based on the notion that a speaker embedding comprises domain-specific components and domain-shared components, and that the two components can be disentangled by the interplay of the separation network and the adaptation network in the ADSAN. We also propose to incorporate a mutual information neural estimator into the domain adaptation network to retain speaker-discriminative information. Experiments on the VOiCES Challenge 2019 data demonstrate that the proposed approaches can produce more domain-invariant and speaker-discriminative representations, which could help to reduce the domain shift caused by different types of microphones and reverberant environments. | en_US |
| dcterms.accessRights | open access | en_US |
| dcterms.bibliographicCitation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2020, 25-29 October 2020, Shanghai, China, p. 4298-4302 | en_US |
| dcterms.issued | 2020 | - |
| dc.identifier.scopus | 2-s2.0-85098122245 | - |
| dc.relation.conference | International Speech Communication Association [Interspeech] | en_US |
| dc.description.validate | 202405 bcch | en_US |
| dc.description.oa | Version of Record | en_US |
| dc.identifier.FolderNumber | EIE-0143 | - |
| dc.description.fundingSource | Others | en_US |
| dc.description.fundingText | NSFC | en_US |
| dc.description.pubStatus | Published | en_US |
| dc.identifier.OPUS | 55968715 | - |
| dc.description.oaCategory | VoR allowed | en_US |
Appears in Collections: Conference Paper
Files in This Item:
| File | Description | Size | Format | |
|---|---|---|---|---|
| yi20b_interspeech.pdf | | 381.14 kB | Adobe PDF | View/Open |
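The abstract mentions a mutual information neural estimator (MINE). As a hedged illustration only (not the paper's implementation), the sketch below evaluates the Donsker-Varadhan lower bound that MINE-style estimators maximize, on synthetic correlated Gaussians and with a fixed bilinear critic T(x, z) = a·x·z in place of MINE's trained neural critic. All names and parameters here are illustrative assumptions, not from the paper.

```python
import numpy as np

# Donsker-Varadhan (DV) lower bound on mutual information:
#   I(X; Z) >= E_joint[T(x, z)] - log E_marginals[exp(T(x, z'))]
# MINE maximizes this bound over a neural critic T; here T is a fixed
# bilinear critic T(x, z) = a * x * z for illustration.

def dv_lower_bound(x, z, a=0.3, seed=None):
    """Estimate the DV bound; z is shuffled to sample the product of marginals."""
    rng = np.random.default_rng(seed)
    joint_term = np.mean(a * x * z)              # E_joint[T(x, z)]
    z_shuffled = rng.permutation(z)              # break the (x, z) pairing
    marginal_term = np.log(np.mean(np.exp(a * x * z_shuffled)))
    return joint_term - marginal_term

rng = np.random.default_rng(0)
n = 5000
x = rng.standard_normal(n)
z = 0.9 * x + np.sqrt(1.0 - 0.81) * rng.standard_normal(n)  # corr(X, Z) = 0.9

print(dv_lower_bound(x, z, seed=1))                    # clearly positive: X and Z share information
print(dv_lower_bound(x, rng.permutation(z), seed=1))   # near zero for independent pairs
```

In MINE the critic is a small MLP trained by gradient ascent on this bound; in the ADSAN setting such an estimator is used, per the abstract, to retain speaker-discriminative information during domain adaptation.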
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.



