Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/116209
DC Field | Value | Language
dc.contributor | Department of Civil and Environmental Engineering | en_US
dc.creator | Yu, J | en_US
dc.creator | Wang, Y | en_US
dc.creator | Ma, W | en_US
dc.date.accessioned | 2025-12-02T03:29:26Z | -
dc.date.available | 2025-12-02T03:29:26Z | -
dc.identifier.issn | 1366-5545 | en_US
dc.identifier.uri | http://hdl.handle.net/10397/116209 | -
dc.language.iso | en | en_US
dc.publisher | Pergamon Press | en_US
dc.subject | Bus bunching | en_US
dc.subject | Control strategy | en_US
dc.subject | Deep reinforcement learning | en_US
dc.subject | Dynamic holding | en_US
dc.subject | Large language model | en_US
dc.title | Large language model-enhanced reinforcement learning for generic bus holding control strategies | en_US
dc.type | Journal/Magazine Article | en_US
dc.identifier.volume | 200 | en_US
dc.identifier.doi | 10.1016/j.tre.2025.104142 | en_US
dcterms.abstract | Bus holding control is a widely adopted strategy for maintaining stability and improving the operational efficiency of bus systems. Traditional model-based methods often face challenges with the low accuracy of bus state prediction and passenger demand estimation. In contrast, Reinforcement Learning (RL), as a data-driven approach, has demonstrated great potential in formulating bus holding strategies. RL determines the optimal control strategies that maximize the cumulative reward, which reflects the overall control goals. However, translating the sparse and delayed control goals of real-world tasks into dense and real-time rewards for RL is challenging, normally requiring extensive manual trial-and-error. To address this, this study introduces an automatic reward generation paradigm that leverages the in-context learning and reasoning capabilities of Large Language Models (LLMs). This new paradigm, termed LLM-enhanced RL, comprises several LLM-based modules: a reward initializer, a reward modifier, a performance analyzer, and a reward refiner. These modules cooperate to initialize and iteratively improve the reward function according to feedback from training and test results for the specified RL-based task. Ineffective reward functions generated by the LLM are filtered out to ensure the stable evolution of the RL agents' performance over iterations. To evaluate the feasibility of the proposed LLM-enhanced RL paradigm, it is applied to extensive bus holding control scenarios that vary in the number of bus lines, stops, and passenger demand. The results demonstrate the superiority, generalization capability, and robustness of the proposed paradigm compared with vanilla RL strategies, an LLM-based controller, physics-based feedback controllers, and optimization-based controllers. This study sheds light on the great potential of utilizing LLMs in various smart mobility applications. | en_US
dcterms.accessRights | embargoed access | en_US
dcterms.bibliographicCitation | Transportation research. Part E, Logistics and transportation review, Aug. 2025, v. 200, 104142 | en_US
dcterms.isPartOf | Transportation research. Part E, Logistics and transportation review | en_US
dcterms.issued | 2025-08 | -
dc.identifier.scopus | 2-s2.0-105005938308 | -
dc.identifier.eissn | 1878-5794 | en_US
dc.identifier.artn | 104142 | en_US
dc.description.validate | 202512 bcjz | en_US
dc.description.oa | Not applicable | en_US
dc.identifier.SubFormID | G000407/2025-11 | -
dc.description.fundingSource | RGC | en_US
dc.description.fundingSource | Others | en_US
dc.description.fundingText | The work described in this paper is supported by the Innovation and Technology Fund - Mainland-Hong Kong Joint Funding Scheme (ITF-MHKJFS) (Project No. MHP/150/22), grants from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. PolyU/15206322 and PolyU/15227424), and a grant from the Otto Poon Charitable Foundation Smart Cities Research Institute, The Hong Kong Polytechnic University (CD06). The contents of this article reflect the views of the authors, who are responsible for the facts and accuracy of the information presented herein. | en_US
dc.description.pubStatus | Published | en_US
dc.date.embargo | 2028-08-31 | en_US
dc.description.oaCategory | Green (AAM) | en_US
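The abstract describes an iterative loop in which LLM-based modules propose a reward function, an RL agent is trained and evaluated with it, and ineffective candidates are filtered out. The following is a minimal, purely illustrative sketch of that loop; all names (`propose_reward`, `evaluate_policy`, the reward weights) are hypothetical stand-ins, the LLM calls are stubbed with random perturbations, and nothing here reflects the paper's actual implementation.

```python
# Hypothetical sketch of an LLM-driven reward-generation loop.
# The LLM-based initializer/refiner and the RL training step are stubbed:
# propose_reward() perturbs the best-known weights instead of querying an LLM,
# and evaluate_policy() returns a toy score instead of training an agent.
import random

random.seed(0)

def propose_reward(history):
    """Stub for the LLM-based reward initializer/refiner."""
    if not history:
        # Initializer: a first guess at dense-reward weights
        # (e.g. headway deviation vs. holding time penalties).
        return {"w_headway": 1.0, "w_hold": 0.5}
    # Refiner: perturb the best candidate found so far.
    best = max(history, key=lambda h: h["score"])
    return {k: v * random.uniform(0.8, 1.2) for k, v in best["weights"].items()}

def evaluate_policy(weights):
    """Stub for training an RL agent with the candidate reward and
    measuring the sparse, delayed control goal (toy objective here)."""
    return -abs(weights["w_headway"] - 10 * weights["w_hold"])

history = []
for iteration in range(20):
    weights = propose_reward(history)
    score = evaluate_policy(weights)
    # Filter: discard candidates that degrade the best score so far,
    # mirroring how ineffective LLM-generated rewards are filtered out.
    if not history or score >= max(h["score"] for h in history):
        history.append({"weights": weights, "score": score})

best = max(history, key=lambda h: h["score"])
print(best["score"])
```

Under these assumptions the kept candidates form a non-degrading sequence of scores, which is the "stable evolution over iterations" property the abstract attributes to the filtering step.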
Appears in Collections: Journal/Magazine Article

Open Access Information
Status: embargoed access
Embargo End Date: 2028-08-31

SCOPUS™ citations: 4 (as of Apr 3, 2026)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.