Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/114015
DC Field | Value | Language |
---|---|---|
dc.contributor | Department of Data Science and Artificial Intelligence | en_US |
dc.creator | Tian, H | en_US |
dc.creator | Liu, F | en_US |
dc.creator | Zhou, Z | en_US |
dc.creator | Liu, T | en_US |
dc.creator | Zhang, C | en_US |
dc.creator | Han, B | en_US |
dc.date.accessioned | 2025-07-10T01:31:38Z | - |
dc.date.available | 2025-07-10T01:31:38Z | - |
dc.identifier.uri | http://hdl.handle.net/10397/114015 | - |
dc.description | NeurIPS 2024: The Thirty-Eighth Annual Conference on Neural Information Processing Systems, Vancouver, 10-15 Dec 2024 | en_US |
dc.language.iso | en | en_US |
dc.publisher | NeurIPS | en_US |
dc.rights | Posted with permission of the author. | en_US |
dc.title | Mind the gap between prototypes and images in cross-domain finetuning | en_US |
dc.type | Conference Paper | en_US |
dc.identifier.volume | 37 | en_US |
dcterms.abstract | In cross-domain few-shot classification (CFC), recent works mainly focus on adapting a simple transformation head on top of a frozen pre-trained backbone with few labeled data to project embeddings into a task-specific metric space, where classification is performed by measuring similarities between image instance and prototype representations. Technically, an assumption implicitly adopted in such a framework is that the prototype and image instance embeddings share the same representation transformation. However, in this paper, we find that there naturally exists a gap, which resembles the modality gap, between the prototype and image instance embeddings extracted from the frozen pre-trained backbone, and simply applying the same transformation during the adaptation phase constrains the exploration of optimal representation distributions and shrinks the gap between prototype and image representations. To solve this problem, we propose a simple yet effective method, contrastive prototype-image adaptation (CoPA), which adapts different transformations for prototypes and images, similarly to CLIP, by treating prototypes as text prompts. Extensive experiments on Meta-Dataset demonstrate that CoPA achieves state-of-the-art performance more efficiently. Meanwhile, further analyses also indicate that CoPA learns better representation clusters, enlarges the gap, and achieves the minimum validation loss at the enlarged gap. | en_US |
dcterms.accessRights | open access | en_US |
dcterms.bibliographicCitation | Advances in neural information processing systems, 2024, v. 37 | en_US |
dcterms.isPartOf | Advances in neural information processing systems | en_US |
dcterms.issued | 2024 | - |
dc.relation.conference | Conference on Neural Information Processing Systems [NeurIPS] | en_US |
dc.description.validate | 202507 bcwh | en_US |
dc.description.oa | Version of Record | en_US |
dc.identifier.FolderNumber | a3866 | - |
dc.identifier.SubFormID | 51472 | - |
dc.description.fundingSource | Self-funded | en_US |
dc.description.pubStatus | Published | en_US |
dc.description.oaCategory | Copyright retained by author | en_US |
Appears in Collections: | Conference Paper |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Tian_Mind_Gap_Prototypes.pdf | | 2.02 MB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
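The abstract describes CoPA as adapting separate transformation heads for prototype and image-instance embeddings in a CLIP-like contrastive fashion, treating prototypes as text prompts, rather than sharing one head. Below is a minimal, hypothetical sketch of that idea; the names (`LinearHead`, `copa_style_loss`) and the simple linear heads are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch (not the paper's code): adapt *separate* transformation
# heads for prototype and image-instance embeddings, trained with a
# CLIP-style contrastive objective on top of a frozen backbone.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LinearHead(nn.Module):
    """A lightweight transformation head on top of frozen backbone features."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Project and L2-normalize, as in CLIP-style similarity matching.
        return F.normalize(self.proj(x), dim=-1)


def copa_style_loss(img_emb, proto_emb, labels, img_head, proto_head, tau=0.07):
    """CLIP-like loss: image instances are matched against prototypes
    (the prototypes play the role of CLIP's text prompts).

    img_emb:   (N, D) frozen-backbone embeddings of the labeled images
    proto_emb: (C, D) class prototypes (e.g., per-class mean embeddings)
    labels:    (N,)   class index of each image
    """
    z_img = img_head(img_emb)            # transformed image embeddings
    z_proto = proto_head(proto_emb)      # transformed prototype embeddings
    logits = z_img @ z_proto.t() / tau   # (N, C) cosine-similarity logits
    return F.cross_entropy(logits, labels)


# Usage (shapes only): with 512-d frozen features, only the two heads are
# adapted on the few labeled examples of a task.
if __name__ == "__main__":
    dim, num_classes, num_images = 512, 5, 25
    img_head, proto_head = LinearHead(dim), LinearHead(dim)
    img_emb = torch.randn(num_images, dim)
    proto_emb = torch.randn(num_classes, dim)
    labels = torch.randint(0, num_classes, (num_images,))
    loss = copa_style_loss(img_emb, proto_emb, labels, img_head, proto_head)
    loss.backward()  # gradients flow only into the two transformation heads
```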