Parameter-efficient fine-tuning of speaker-aware dynamic prompts for speaker verification

Li, Z; Mak, MW; Lee, HY; Meng, H

doi:10.21437/Interspeech.2024-295

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/114607

DC Field	Value	Language
dc.contributor	Department of Electrical and Electronic Engineering	-
dc.creator	Li, Z	-
dc.creator	Mak, MW	-
dc.creator	Lee, HY	-
dc.creator	Meng, H	-
dc.date.accessioned	2025-08-18T03:02:10Z	-
dc.date.available	2025-08-18T03:02:10Z	-
dc.identifier.uri	http://hdl.handle.net/10397/114607	-
dc.description	Interspeech 2024, 1-5 September 2024, Kos, Greece	en_US
dc.language.iso	en	en_US
dc.publisher	International Speech Communication Association	en_US
dc.rights	The following publication Li, Z., Mak, M.-w., Lee, H.-y., Meng, H. (2024) Parameter-efficient Fine-tuning of Speaker-Aware Dynamic Prompts for Speaker Verification. Proc. Interspeech 2024, 2675-2679 is available at https://doi.org/10.21437/Interspeech.2024-295.	en_US
dc.subject	Parameter-efficient tuning	en_US
dc.subject	Pre-trained Transformer	en_US
dc.subject	Prompt pool	en_US
dc.subject	Prompt tuning	en_US
dc.subject	Speaker verification	en_US
dc.title	Parameter-efficient fine-tuning of speaker-aware dynamic prompts for speaker verification	en_US
dc.type	Conference Paper	en_US
dc.identifier.spage	2675	-
dc.identifier.epage	2679	-
dc.identifier.doi	10.21437/Interspeech.2024-295	-
dcterms.abstract	Prompt tuning can effectively reduce tunable parameters in pre-trained Transformers. However, it is weak at capturing speaker traits because the prompts can easily overfit the adaptation utterances, resulting in poor generalization to unseen speakers. This paper introduces a prompt pool comprising learnable prompts to tackle this issue. Unlike the traditional method that learns a fixed set of prompts for each training utterance, our method uses a dynamic selection strategy to select the best matching prompts in a pool for tuning, resulting in each prompt being tuned by its closely matched speaker. The objective is to make the prompts in the pool form speaker clusters, enhancing speaker prediction in the downstream classifier while maintaining the plasticity of the pre-trained Transformers. Our experiments on language mismatch in speaker verification demonstrate that the dynamic prompt pool provides a memory- and computation-efficient solution to fine-tune pre-trained Transformers.	-
dcterms.accessRights	open access	en_US
dcterms.bibliographicCitation	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2024, p. 2675-2679	-
dcterms.issued	2024	-
dc.identifier.scopus	2-s2.0-85214797325	-
dc.description.validate	202508 bcch	-
dc.description.oa	Version of Record	en_US
dc.identifier.FolderNumber	OA_Others	en_US
dc.description.fundingSource	RGC	en_US
dc.description.pubStatus	Published	en_US
dc.description.oaCategory	VoR allowed	en_US
Appears in Collections:	Conference Paper

Files in This Item:

File	Description	Size	Format
li24e_interspeech.pdf		610.23 kB	Adobe PDF

Open Access Information

Status	open access
File Version	Version of Record

