Asynchronous voice anonymization using adversarial perturbation on speaker embedding

Wang, R; Chen, L; Lee, KA; Ling, ZH

doi:10.21437/Interspeech.2024-1888

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/114606

DC Field	Value	Language
dc.contributor	Department of Electrical and Electronic Engineering	en_US
dc.creator	Wang, R	en_US
dc.creator	Chen, L	en_US
dc.creator	Lee, KA	en_US
dc.creator	Ling, ZH	en_US
dc.date.accessioned	2025-08-18T03:02:09Z	-
dc.date.available	2025-08-18T03:02:09Z	-
dc.identifier.uri	http://hdl.handle.net/10397/114606	-
dc.description	Interspeech 2024, 1-5 September 2024, Kos, Greece	en_US
dc.language.iso	en	en_US
dc.publisher	International Speech Communication Association	en_US
dc.rights	The following publication Wang, R., Chen, L., Lee, K.A., Ling, Z.-H. (2024) Asynchronous Voice Anonymization Using Adversarial Perturbation On Speaker Embedding. Proc. Interspeech 2024, 4443-4447 is available at https://doi.org/10.21437/Interspeech.2024-1888.	en_US
dc.subject	Adversarial perturbation on speaker embedding	en_US
dc.subject	Asynchronous anonymization	en_US
dc.subject	Human perception preservation	en_US
dc.subject	Voice privacy	en_US
dc.title	Asynchronous voice anonymization using adversarial perturbation on speaker embedding	en_US
dc.type	Conference Paper	en_US
dc.identifier.spage	4443	en_US
dc.identifier.epage	4447	en_US
dc.identifier.doi	10.21437/Interspeech.2024-1888	en_US
dcterms.abstract	Voice anonymization has been developed as a technique for preserving privacy by replacing the speaker's voice in a speech signal with that of a pseudo-speaker, thereby obscuring the original voice attributes from machine recognition and human perception. In this paper, we focus on altering the voice attributes against machine recognition while retaining human perception. We refer to this as asynchronous voice anonymization. To this end, a speech generation framework incorporating a speaker disentanglement mechanism is employed to generate the anonymized speech. The speaker attributes are altered through adversarial perturbation applied on the speaker embedding, while human perception is preserved by controlling the intensity of perturbation. Experiments conducted on the LibriSpeech dataset showed that the speaker attributes were obscured with their human perception preserved for 60.71% of the processed utterances. Audio samples can be found in .	en_US
dcterms.accessRights	open access	en_US
dcterms.bibliographicCitation	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2024, p. 4443-4447	en_US
dcterms.issued	2024	-
dc.identifier.scopus	2-s2.0-85214832957	-
dc.description.validate	202508 bcch	en_US
dc.description.oa	Version of Record	en_US
dc.identifier.FolderNumber	OA_Others	-
dc.description.fundingSource	Others	en_US
dc.description.fundingText	The National Natural Science Foundation of China under Grant U23B2053	en_US
dc.description.pubStatus	Published	en_US
dc.description.oaCategory	VoR allowed	en_US
Appears in Collections:	Conference Paper

Files in This Item:

File	Description	Size	Format
wang24ha_interspeech.pdf		477.95 kB	Adobe PDF

Open Access Information

Status	open access
File Version	Version of Record
