Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/116348
| DC Field | Value | Language |
|---|---|---|
| dc.contributor | Department of Industrial and Systems Engineering | en_US |
| dc.creator | Wu, D | en_US |
| dc.creator | Zhao, Q | en_US |
| dc.creator | Fan, J | en_US |
| dc.creator | Qi, J | en_US |
| dc.creator | Zheng, P | en_US |
| dc.creator | Hu, J | en_US |
| dc.date.accessioned | 2025-12-18T06:39:42Z | - |
| dc.date.available | 2025-12-18T06:39:42Z | - |
| dc.identifier.issn | 0278-6125 | en_US |
| dc.identifier.uri | http://hdl.handle.net/10397/116348 | - |
| dc.language.iso | en | en_US |
| dc.publisher | Elsevier | en_US |
| dc.subject | Few-shot learning | en_US |
| dc.subject | Human–robot collaboration | en_US |
| dc.subject | Intent recognition | en_US |
| dc.subject | Vision-language models | en_US |
| dc.title | H2R bridge : transferring vision-language models to few-shot intention meta-perception in human–robot collaboration | en_US |
| dc.type | Journal/Magazine Article | en_US |
| dc.identifier.spage | 524 | en_US |
| dc.identifier.epage | 535 | en_US |
| dc.identifier.volume | 80 | en_US |
| dc.identifier.doi | 10.1016/j.jmsy.2025.03.016 | en_US |
| dcterms.abstract | Human–robot collaboration (HRC) enhances efficiency by enabling robots to work alongside human operators on shared tasks. Accurately understanding human intentions is critical to achieving a high level of collaboration. Existing methods rely heavily on case-specific data and struggle with new tasks and unseen categories, yet only limited data is typically available under real-world conditions. To bolster the proactive cognitive abilities of collaborative robots, this work introduces a Visual-Language-Temporal approach that conceptualizes intent recognition as a multimodal learning problem with HRC-oriented prompts. A large model with prior knowledge is fine-tuned to acquire industrial domain expertise, then transferred efficiently to data-scarce scenarios through few-shot learning. Comparisons with state-of-the-art methods across various datasets demonstrate that the proposed approach sets new benchmarks. Ablation studies confirm the efficacy of the multimodal framework, and few-shot experiments further underscore its meta-perceptual potential. By addressing the challenges of perceptual data scarcity and training cost, this work builds a human–robot bridge (H2R Bridge) for semantic communication and is expected to facilitate proactive HRC and the further integration of large models in industrial applications. | en_US |
| dcterms.accessRights | embargoed access | en_US |
| dcterms.bibliographicCitation | Journal of manufacturing systems, June 2025, v. 80, p. 524-535 | en_US |
| dcterms.isPartOf | Journal of manufacturing systems | en_US |
| dcterms.issued | 2025-06 | - |
| dc.identifier.scopus | 2-s2.0-105001851845 | - |
| dc.description.validate | 202512 bchy | en_US |
| dc.description.oa | Not applicable | en_US |
| dc.identifier.SubFormID | G000494/2025-12 | - |
| dc.description.fundingSource | Others | en_US |
| dc.description.fundingText | This work is supported by the National Natural Science Foundation of China (Grant Nos. U23B20102, 52475270, 52375254) and the Xie Youbai Design Scientific Research Foundation (XYB-DS-202401). | en_US |
| dc.description.pubStatus | Published | en_US |
| dc.date.embargo | 2027-06-30 | en_US |
| dc.description.oaCategory | Green (AAM) | en_US |
Appears in Collections: Journal/Magazine Article
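
For readers of the abstract, a minimal sketch can make the core idea concrete: intent recognition is cast as matching a visual observation against HRC-oriented text prompts with a vision-language model. The snippet below is an illustrative stand-in, not the authors' H2R Bridge implementation: it assumes the open `openai/clip-vit-base-patch32` checkpoint via Hugging Face `transformers` in place of the paper's fine-tuned industrial model, and the prompt strings and input frame `frame.png` are invented placeholders.

```python
# Minimal sketch: prompt-based intent recognition with a vision-language model.
# A single workcell frame is scored against HRC-oriented text prompts; the
# highest-scoring prompt is taken as the predicted operator intent.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical HRC-oriented prompts; the paper's actual prompt design is not
# reproduced in this record.
intent_prompts = [
    "a worker reaching for a screwdriver to fasten a part",
    "a worker holding out a component for the robot to take",
    "a worker inspecting a finished assembly",
]

image = Image.open("frame.png")  # placeholder: one frame from the workcell camera

inputs = processor(text=intent_prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds the frame-to-prompt similarities, shape (1, num_prompts).
probs = outputs.logits_per_image.softmax(dim=-1)
predicted = intent_prompts[probs.argmax().item()]
print(f"predicted intent: {predicted} (p={probs.max().item():.2f})")
```

Per the abstract, the paper layers a temporal component and few-shot fine-tuning on industrial data on top of this kind of image–text matching; neither step is reproduced in this sketch.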