Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/114606
DC Field | Value | Language
dc.contributor | Department of Electrical and Electronic Engineering | en_US
dc.creator | Wang, R | en_US
dc.creator | Chen, L | en_US
dc.creator | Lee, KA | en_US
dc.creator | Ling, ZH | en_US
dc.date.accessioned | 2025-08-18T03:02:09Z | -
dc.date.available | 2025-08-18T03:02:09Z | -
dc.identifier.uri | http://hdl.handle.net/10397/114606 | -
dc.description | Interspeech 2024, 1-5 September 2024, Kos, Greece | en_US
dc.language.iso | en | en_US
dc.publisher | International Speech Communication Association | en_US
dc.rights | The following publication Wang, R., Chen, L., Lee, K.A., Ling, Z.-H. (2024) Asynchronous Voice Anonymization Using Adversarial Perturbation On Speaker Embedding. Proc. Interspeech 2024, 4443-4447 is available at https://doi.org/10.21437/Interspeech.2024-1888. | en_US
dc.subject | Adversarial perturbation on speaker embedding | en_US
dc.subject | Asynchronous anonymization | en_US
dc.subject | Human perception preservation | en_US
dc.subject | Voice privacy | en_US
dc.title | Asynchronous voice anonymization using adversarial perturbation on speaker embedding | en_US
dc.type | Conference Paper | en_US
dc.identifier.spage | 4443 | en_US
dc.identifier.epage | 4447 | en_US
dc.identifier.doi | 10.21437/Interspeech.2024-1888 | en_US
dcterms.abstract | Voice anonymization has been developed as a technique for preserving privacy by replacing the speaker's voice in a speech signal with that of a pseudo-speaker, thereby obscuring the original voice attributes from machine recognition and human perception. In this paper, we focus on altering the voice attributes against machine recognition while retaining human perception. We refer to this as asynchronous voice anonymization. To this end, a speech generation framework incorporating a speaker disentanglement mechanism is employed to generate the anonymized speech. The speaker attributes are altered through adversarial perturbation applied on the speaker embedding, while human perception is preserved by controlling the intensity of perturbation. Experiments conducted on the LibriSpeech dataset showed that the speaker attributes were obscured with their human perception preserved for 60.71% of the processed utterances. Audio samples can be found in . | en_US
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2024, p. 4443-4447 | en_US
dcterms.issued | 2024 | -
dc.identifier.scopus | 2-s2.0-85214832957 | -
dc.description.validate | 202508 bcch | en_US
dc.description.oa | Version of Record | en_US
dc.identifier.FolderNumber | OA_Others | -
dc.description.fundingSource | Others | en_US
dc.description.fundingText | The National Natural Science Foundation of China under Grant U23B2053 | en_US
dc.description.pubStatus | Published | en_US
dc.description.oaCategory | VoR allowed | en_US
Appears in Collections: Conference Paper
Files in This Item:
File | Description | Size | Format
wang24ha_interspeech.pdf | | 477.95 kB | Adobe PDF
Open Access Information
Status: open access
File Version: Version of Record
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.