MARL-based cooperative transit signal priority for the arterial road to reduce schedule delay

Long, M; Wang, R; Chen, J; Chung, E; Oguchi, T

doi:10.1080/21680566.2025.2564703

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/116640

DC Field	Value	Language
dc.contributor	Department of Electrical and Electronic Engineering	en_US
dc.creator	Long, M	en_US
dc.creator	Wang, R	en_US
dc.creator	Chen, J	en_US
dc.creator	Chung, E	en_US
dc.creator	Oguchi, T	en_US
dc.date.accessioned	2026-01-08T08:24:51Z	-
dc.date.available	2026-01-08T08:24:51Z	-
dc.identifier.issn	2168-0566	en_US
dc.identifier.uri	http://hdl.handle.net/10397/116640	-
dc.language.iso	en	en_US
dc.publisher	Taylor & Francis	en_US
dc.subject	Arterial road	en_US
dc.subject	Multi-agent reinforcement learning	en_US
dc.subject	Traffic signal control	en_US
dc.subject	Transit signal priority	en_US
dc.title	MARL-based cooperative transit signal priority for the arterial road to reduce schedule delay	en_US
dc.type	Journal/Magazine Article	en_US
dc.identifier.volume	13	en_US
dc.identifier.issue	1	en_US
dc.identifier.doi	10.1080/21680566.2025.2564703	en_US
dcterms.abstract	Transit signal priority (TSP) is an effective strategy to reduce transit delays and improve intersection efficiency. This paper introduces a Cooperative TSP strategy of Variable phase (CTSPV) using multi-agent reinforcement learning (MARL) to minimize transit schedule delays on arterial roads. The agents adjust phase sequences and durations based on real-time traffic, balancing transit and non-transit vehicle needs, resolving conflicting bus requests, and ensuring agent cooperation. Invalid action masking ensures compliance with green time and phase-skipping rules. Simulation results show CTSPV reduces person delay, queue lengths, and lateness by 8.7%, 31.6%, and 17.0%, respectively, compared to fixed-time signals. Testing different green time constraints highlights the importance of proper restrictions for efficient learning. Analysis of CTSPV's signal timing reveals agents prioritize phases with high traffic demand and bus priority, skipping phases with lower demand. Evaluation results of generalized rule-based strategies based on those RL-derived patterns demonstrate the good performance of RL-learned knowledge.	en_US
dcterms.accessRights	embargoed access	en_US
dcterms.bibliographicCitation	Transportmetrica. B, Transport dynamics, 2025, v. 13, no. 1, 2564703	en_US
dcterms.isPartOf	Transportmetrica. B, Transport dynamics	en_US
dcterms.issued	2025	-
dc.identifier.scopus	2-s2.0-105018775843	-
dc.identifier.eissn	2168-0582	en_US
dc.identifier.artn	2564703	en_US
dc.description.validate	202601 bcjz	en_US
dc.description.oa	Not applicable	en_US
dc.identifier.SubFormID	G000664/2025-11	-
dc.description.fundingSource	Others	en_US
dc.description.fundingText	This work was supported by the Science and Technology Research Program of Chongqing Municipal Education Commission under [grant number KJQN202500504], the Foundation of Chongqing Normal University under [grant numbers 24XWB040 and 25XLB001], the General Program of Chongqing Natural Science Foundation under [grant CSTB2025NSCQ-GPX1008], and the Mainland-Hong Kong Joint Funding Scheme under [grant MHP/038/23].	en_US
dc.description.pubStatus	Published	en_US
dc.date.embargo	2026-10-08	en_US
dc.description.oaCategory	Green (AAM)	en_US
Appears in Collections:	Journal/Magazine Article

Open Access Information

Status	embargoed access
Embargo End Date	2026-10-08

SCOPUS™ Citations: 1 (as of Apr 3, 2026)
