Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/116348
| DC Field | Value | Language |
|---|---|---|
| dc.contributor | Department of Industrial and Systems Engineering | en_US |
| dc.creator | Wu, D | en_US |
| dc.creator | Zhao, Q | en_US |
| dc.creator | Fan, J | en_US |
| dc.creator | Qi, J | en_US |
| dc.creator | Zheng, P | en_US |
| dc.creator | Hu, J | en_US |
| dc.date.accessioned | 2025-12-18T06:39:42Z | - |
| dc.date.available | 2025-12-18T06:39:42Z | - |
| dc.identifier.issn | 0278-6125 | en_US |
| dc.identifier.uri | http://hdl.handle.net/10397/116348 | - |
| dc.language.iso | en | en_US |
| dc.publisher | Elsevier | en_US |
| dc.subject | Few-shot learning | en_US |
| dc.subject | Human–robot collaboration | en_US |
| dc.subject | Intent recognition | en_US |
| dc.subject | Vision-language models | en_US |
| dc.title | H2R bridge : transferring vision-language models to few-shot intention meta-perception in human–robot collaboration | en_US |
| dc.type | Journal/Magazine Article | en_US |
| dc.identifier.spage | 524 | en_US |
| dc.identifier.epage | 535 | en_US |
| dc.identifier.volume | 80 | en_US |
| dc.identifier.doi | 10.1016/j.jmsy.2025.03.016 | en_US |
| dcterms.abstract | Human–robot collaboration (HRC) enhances efficiency by enabling robots to work alongside human operators on shared tasks. Accurately understanding human intentions is critical to achieving a high level of collaboration. Existing methods rely heavily on case-specific data and struggle with new tasks and unseen categories, yet only limited data is typically available under real-world conditions. To bolster the proactive cognitive abilities of collaborative robots, this work introduces a Visual-Language-Temporal approach that conceptualizes intent recognition as a multimodal learning problem with HRC-oriented prompts. A large model with prior knowledge is fine-tuned to acquire industrial domain expertise, then transferred efficiently to data-scarce scenarios through few-shot learning. Comparisons with state-of-the-art methods across various datasets demonstrate that the proposed approach sets new benchmarks. Ablation studies confirm the efficacy of the multimodal framework, and few-shot experiments further underscore its meta-perceptual potential. By addressing the challenges of perceptual data scarcity and training cost, this work builds a human–robot bridge (H2R Bridge) for semantic communication and is expected to facilitate proactive HRC and the further integration of large models in industrial applications. | en_US |
| dcterms.accessRights | embargoed access | en_US |
| dcterms.bibliographicCitation | Journal of manufacturing systems, June 2025, v. 80, p. 524-535 | en_US |
| dcterms.isPartOf | Journal of manufacturing systems | en_US |
| dcterms.issued | 2025-06 | - |
| dc.identifier.scopus | 2-s2.0-105001851845 | - |
| dc.description.validate | 202512 bchy | en_US |
| dc.description.oa | Not applicable | en_US |
| dc.identifier.SubFormID | G000494/2025-12 | - |
| dc.description.fundingSource | Others | en_US |
| dc.description.fundingText | This work is supported by the National Natural Science Foundation of China (Grant Nos. U23B20102, 52475270, 52375254) and the Xie Youbai Design Scientific Research Foundation (XYB-DS-202401). | en_US |
| dc.description.pubStatus | Published | en_US |
| dc.date.embargo | 2027-06-30 | en_US |
| dc.description.oaCategory | Green (AAM) | en_US |
Appears in Collections: Journal/Magazine Article
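
For readers of the abstract, a minimal sketch can make the core idea concrete: intent recognition is cast as matching a visual observation against HRC-oriented text prompts with a vision-language model. The snippet below is an illustrative stand-in, not the authors' H2R Bridge implementation: it assumes the open `openai/clip-vit-base-patch32` checkpoint via Hugging Face `transformers` in place of the paper's fine-tuned industrial model, and the prompt strings and input frame `frame.png` are invented placeholders.

```python
# Minimal sketch: prompt-based intent recognition with a vision-language model.
# A single workcell frame is scored against HRC-oriented text prompts; the
# highest-scoring prompt is taken as the predicted operator intent.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical HRC-oriented prompts; the paper's actual prompt design is not
# reproduced in this record.
intent_prompts = [
    "a worker reaching for a screwdriver to fasten a part",
    "a worker holding out a component for the robot to take",
    "a worker inspecting a finished assembly",
]

image = Image.open("frame.png")  # placeholder: one frame from the workcell camera

inputs = processor(text=intent_prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds the frame-to-prompt similarities, shape (1, num_prompts).
probs = outputs.logits_per_image.softmax(dim=-1)
predicted = intent_prompts[probs.argmax().item()]
print(f"predicted intent: {predicted} (p={probs.max().item():.2f})")
```

Per the abstract, the paper layers a temporal component and few-shot fine-tuning on industrial data on top of this kind of image–text matching; neither step is reproduced in this sketch.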