Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/114606
DC Field | Value | Language |
---|---|---|
dc.contributor | Department of Electrical and Electronic Engineering | en_US |
dc.creator | Wang, R | en_US |
dc.creator | Chen, L | en_US |
dc.creator | Lee, KA | en_US |
dc.creator | Ling, ZH | en_US |
dc.date.accessioned | 2025-08-18T03:02:09Z | - |
dc.date.available | 2025-08-18T03:02:09Z | - |
dc.identifier.uri | http://hdl.handle.net/10397/114606 | - |
dc.description | Interspeech 2024, 1-5 September 2024, Kos, Greece | en_US |
dc.language.iso | en | en_US |
dc.publisher | International Speech Communication Association | en_US |
dc.rights | The following publication Wang, R., Chen, L., Lee, K.A., Ling, Z.-H. (2024) Asynchronous Voice Anonymization Using Adversarial Perturbation On Speaker Embedding. Proc. Interspeech 2024, 4443-4447 is available at https://doi.org/10.21437/Interspeech.2024-1888. | en_US |
dc.subject | Adversarial perturbation on speaker embedding | en_US |
dc.subject | Asynchronous anonymization | en_US |
dc.subject | Human perception preservation | en_US |
dc.subject | Voice privacy | en_US |
dc.title | Asynchronous voice anonymization using adversarial perturbation on speaker embedding | en_US |
dc.type | Conference Paper | en_US |
dc.identifier.spage | 4443 | en_US |
dc.identifier.epage | 4447 | en_US |
dc.identifier.doi | 10.21437/Interspeech.2024-1888 | en_US |
dcterms.abstract | Voice anonymization has been developed as a technique for preserving privacy by replacing the speaker's voice in a speech signal with that of a pseudo-speaker, thereby obscuring the original voice attributes from machine recognition and human perception. In this paper, we focus on altering the voice attributes against machine recognition while retaining human perception; we refer to this as asynchronous voice anonymization. To this end, a speech generation framework incorporating a speaker disentanglement mechanism is employed to generate the anonymized speech. The speaker attributes are altered through adversarial perturbation applied to the speaker embedding, while human perception is preserved by controlling the intensity of the perturbation. Experiments conducted on the LibriSpeech dataset showed that the speaker attributes were obscured while human perception was preserved for 60.71% of the processed utterances. Audio samples can be found in . | en_US |
dcterms.accessRights | open access | en_US |
dcterms.bibliographicCitation | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2024, p. 4443-4447 | en_US |
dcterms.issued | 2024 | - |
dc.identifier.scopus | 2-s2.0-85214832957 | - |
dc.description.validate | 202508 bcch | en_US |
dc.description.oa | Version of Record | en_US |
dc.identifier.FolderNumber | OA_Others | - |
dc.description.fundingSource | Others | en_US |
dc.description.fundingText | The National Natural Science Foundation of China under Grant U23B2053 | en_US |
dc.description.pubStatus | Published | en_US |
dc.description.oaCategory | VoR allowed | en_US |
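The abstract above describes altering speaker attributes by adversarially perturbing the speaker embedding that conditions a speech generator, with the perturbation intensity bounded so that human perception of the voice is preserved. The sketch below is a minimal illustration of that idea under assumed components: the PyTorch setup, the stand-in `SurrogateASV` similarity scorer, the helper `perturb_embedding`, and the `epsilon` value are all hypothetical and are not taken from the authors' implementation.

```python
# Illustrative FGSM-style perturbation of a speaker embedding, with an
# epsilon "intensity" knob. All names here are hypothetical placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SurrogateASV(nn.Module):
    """Stand-in speaker-verification head that scores embedding similarity."""

    def __init__(self, dim: int = 192):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, emb_a: torch.Tensor, emb_b: torch.Tensor) -> torch.Tensor:
        # Cosine similarity in a learned projection space.
        return F.cosine_similarity(self.proj(emb_a), self.proj(emb_b), dim=-1)


def perturb_embedding(asv: nn.Module, spk_emb: torch.Tensor, epsilon: float) -> torch.Tensor:
    """One signed-gradient step that lowers the surrogate ASV similarity to the
    original speaker embedding; epsilon bounds the perturbation magnitude."""
    delta = torch.zeros_like(spk_emb, requires_grad=True)
    # Loss: similarity between perturbed and original embedding (to be reduced).
    loss = asv(spk_emb + delta, spk_emb.detach()).mean()
    loss.backward()
    with torch.no_grad():
        # Step against the gradient of the similarity, scaled by the intensity.
        adv_emb = spk_emb - epsilon * delta.grad.sign()
    return adv_emb.detach()


if __name__ == "__main__":
    torch.manual_seed(0)
    asv = SurrogateASV(dim=192)
    spk_emb = torch.randn(1, 192)                      # original speaker embedding
    adv_emb = perturb_embedding(asv, spk_emb, epsilon=0.05)
    print("similarity after perturbation:",
          F.cosine_similarity(adv_emb, spk_emb, dim=-1).item())
```

In the setting described by the abstract, the perturbed embedding would replace the original one as the conditioning input of the speech generation framework, with a small epsilon chosen so the synthesized voice still sounds like the original speaker to human listeners.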
Appears in Collections: | Conference Paper |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
wang24ha_interspeech.pdf | | 477.95 kB | Adobe PDF |