Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/106862
View/Download Full Text
DC Field | Value | Language
dc.contributor | Department of Electrical and Electronic Engineering | -
dc.creator | Tu, Y | -
dc.creator | Mak, MW | -
dc.creator | Chien, JT | -
dc.date.accessioned | 2024-06-06T06:06:02Z | -
dc.date.available | 2024-06-06T06:06:02Z | -
dc.identifier.issn | 2329-9290 | -
dc.identifier.uri | http://hdl.handle.net/10397/106862 | -
dc.language.iso | en | en_US
dc.publisher | Institute of Electrical and Electronics Engineers | en_US
dc.rights | © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. | en_US
dc.rights | The following publication Y. Tu, M.-W. Mak and J.-T. Chien, "Contrastive Self-Supervised Speaker Embedding With Sequential Disentanglement," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 2704-2715, 2024 is available at https://doi.org/10.1109/TASLP.2024.3402077. | en_US
dc.subject | Contrastive learning | en_US
dc.subject | Disentangled representation learning | en_US
dc.subject | Speaker embedding | en_US
dc.subject | Speaker verification | en_US
dc.subject | Variational autoencoder | en_US
dc.title | Contrastive self-supervised speaker embedding with sequential disentanglement | en_US
dc.type | Journal/Magazine Article | en_US
dc.identifier.spage | 2704 | -
dc.identifier.epage | 2715 | -
dc.identifier.volume | 32 | -
dc.identifier.doi | 10.1109/TASLP.2024.3402077 | -
dcterms.abstract | Contrastive self-supervised learning has been widely used in speaker embedding to address the labeling challenge. Contrastive speaker embedding assumes that the contrast between the positive and negative pairs of speech segments is attributed to speaker identity only. However, this assumption is incorrect because speech signals contain not only speaker identity but also linguistic content. In this paper, we propose a contrastive learning framework with sequential disentanglement to remove linguistic content by incorporating a disentangled sequential variational autoencoder (DSVAE) into the conventional contrastive learning framework. The DSVAE aims to disentangle speaker factors from content factors in an embedding space so that the speaker factors become the main contributor to the contrastive loss. Because content factors have been removed from contrastive learning, the resulting speaker embeddings will be content-invariant. The learned embeddings are also robust to language mismatch. It is shown that the proposed method consistently outperforms the conventional contrastive speaker embedding on the VoxCeleb1 and CN-Celeb datasets. This finding suggests that applying sequential disentanglement is beneficial to learning speaker-discriminative embeddings. | -
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | IEEE/ACM transactions on audio, speech, and language processing, 2024, v. 32, p. 2704-2715 | -
dcterms.isPartOf | IEEE/ACM transactions on audio, speech, and language processing | -
dcterms.issued | 2024 | -
dc.identifier.scopus | 2-s2.0-85193518794 | -
dc.identifier.eissn | 2329-9304 | -
dc.description.validate | 202406 bcch | -
dc.description.oa | Accepted Manuscript | en_US
dc.identifier.FolderNumber | a2778 | en_US
dc.identifier.SubFormID | 48311 | en_US
dc.description.fundingSource | RGC | en_US
dc.description.pubStatus | Published | en_US
dc.description.oaCategory | Green (AAM) | en_US
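
The abstract above describes a contrastive objective computed only on the speaker factors that a disentangled sequential VAE (DSVAE) separates from content factors. Below is a minimal illustrative sketch of that idea in PyTorch; the names nt_xent_loss and dsvae_encoder, the temperature value, and the loss composition are assumptions for illustration, not the paper's actual implementation.

# Minimal sketch (PyTorch) of contrastive learning on disentangled speaker
# factors: the contrastive (NT-Xent) loss sees only the speaker factor of a
# factorized embedding, so content factors cannot contribute to the contrast.
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """NT-Xent loss between two (N, D) batches of paired speaker factors."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                      # (2N, D)
    sim = z @ z.t() / tau                               # scaled cosine similarities
    n = z1.size(0)
    self_mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))     # exclude self-similarity
    # The positive for row i is row i + n (and vice versa).
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# Hypothetical usage: `dsvae_encoder` is assumed to split each segment into a
# speaker factor and content factors; only the speaker factor enters the
# contrastive loss, while an ELBO-style term (not shown) trains the DSVAE.
# spk1, _content1 = dsvae_encoder(segment_view1)
# spk2, _content2 = dsvae_encoder(segment_view2)
# loss = nt_xent_loss(spk1, spk2)
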
Appears in Collections: Journal/Magazine Article
Files in This Item:
File | Description | Size | Format
Tu_Contrastive_Self-Supervised_Speaker.pdf | Pre-Published version | 2.58 MB | Adobe PDF
Open Access Information
Status: open access
File Version: Final Accepted Manuscript

Page views: 5 (as of Jun 30, 2024)
Downloads: 15 (as of Jun 30, 2024)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.