Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/112574
Title: Vision-language model-based human-robot collaboration for smart manufacturing: a state-of-the-art survey
Authors: Fan, J 
Yin, Y 
Wang, T 
Dong, W 
Zheng, P 
Wang, L
Issue Date: Mar-2025
Source: Frontiers of engineering management, Mar. 2025, v. 12, no. 1, p. 177-200
Abstract: Human-robot collaboration (HRC) is set to transform the manufacturing paradigm by leveraging the strengths of human flexibility and robot precision. The recent breakthrough of Large Language Models (LLMs) and Vision-Language Models (VLMs) has motivated preliminary explorations and adoptions of these models in the smart manufacturing field. However, despite considerable effort, existing research has mainly focused on individual components and lacks a comprehensive perspective on the full potential of VLMs, especially for HRC in smart manufacturing scenarios. To fill this gap, this work offers a systematic review of the latest advancements and applications of VLMs in HRC for smart manufacturing, covering the fundamental architectures and pretraining methodologies of LLMs and VLMs, their applications in robotic task planning, navigation, and manipulation, and their role in enhancing human-robot skill transfer through multimodal data integration. Lastly, the paper discusses current limitations and future research directions in VLM-based HRC, highlighting the trend toward fully realizing the potential of these technologies for smart manufacturing.
Keywords: Human–robot collaboration
Large language models
Smart manufacturing
Vision-language models
Publisher: Higher Education Press
Journal: Frontiers of engineering management 
ISSN: 2095-7513
EISSN: 2096-0255
DOI: 10.1007/s42524-025-4136-9
Rights: © The Author(s) 2024. This article is published with open access at link.springer.com and journal.hep.com.cn
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The following publication Fan, J., Yin, Y., Wang, T. et al. Vision-language model-based human-robot collaboration for smart manufacturing: A state-of-the-art survey. Front. Eng. Manag. 12, 177–200 (2025) is available at https://doi.org/10.1007/s42524-025-4136-9.
Appears in Collections: Journal/Magazine Article

Files in This Item:
File: s42524-025-4136-9.pdf
Size: 3.08 MB
Format: Adobe PDF
Open Access Information
Status: open access
File Version: Version of Record
