Homotopic reinforcement learning for distributed consensus control of stochastic Markov jump multi-agent systems

Yao, Z; Zhu, Q; Qin, P; Luo, M

doi:10.1007/s11227-025-07885-5

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/118214

DC Field	Value	Language
dc.contributor	Department of Civil and Environmental Engineering	-
dc.creator	Yao, Z	-
dc.creator	Zhu, Q	-
dc.creator	Qin, P	-
dc.creator	Luo, M	-
dc.date.accessioned	2026-03-23T07:41:31Z	-
dc.date.available	2026-03-23T07:41:31Z	-
dc.identifier.issn	0920-8542	-
dc.identifier.uri	http://hdl.handle.net/10397/118214	-
dc.language.iso	en	en_US
dc.publisher	Springer	en_US
dc.subject	Distributed consensus	en_US
dc.subject	Multi-agent systems	en_US
dc.subject	Reinforcement learning	en_US
dc.subject	Stochastic Markov jump systems	en_US
dc.title	Homotopic reinforcement learning for distributed consensus control of stochastic Markov jump multi-agent systems	en_US
dc.type	Journal/Magazine Article	en_US
dc.identifier.volume	81	-
dc.identifier.issue	15	-
dc.identifier.doi	10.1007/s11227-025-07885-5	-
dcterms.abstract	This paper investigates the optimized consensus problem for a class of stochastic Markov jump multi-agent systems, with particular attention to the high computational demands that arise in large-scale implementations. First, an error system is constructed based on the consensus objective, and a min-max strategy is introduced to transform the consensus problem into an optimized control problem of the error system. Subsequently, a set of parallel coupled game Lyapunov equations is developed to design the consensus controller, whose solution naturally requires significant parallel computation resources. Furthermore, to address the challenges posed by unknown system dynamics and the difficulty of obtaining an initial stable controller, a novel model-free consensus control approach based on homotopic reinforcement learning is proposed. By collecting state and input data, the proposed method enables the online computation of closed-loop stable controllers and optimized consensus controllers in a scalable and distributed manner. Finally, a numerical example is presented to demonstrate the effectiveness of the proposed approach, highlighting its suitability for real-time implementation on high-performance computing platforms.	-
dcterms.accessRights	embargoed access	en_US
dcterms.bibliographicCitation	Journal of supercomputing, Oct. 2025, v. 81, no. 15, 1446	-
dcterms.isPartOf	Journal of supercomputing	-
dcterms.issued	2025-10	-
dc.identifier.scopus	2-s2.0-105018669109	-
dc.identifier.artn	1446	-
dc.description.validate	202603 bcjz	-
dc.description.oa	Not applicable	en_US
dc.identifier.SubFormID	G001297/2026-02	en_US
dc.description.fundingSource	Others	en_US
dc.description.fundingText	This work was supported by Youth Talent Project of Scientific Research Program of Hubei Provincial Department of Education under Grant Q20241809 and Doctoral Scientific Research Foundation of Hubei University of Automotive Technology under Grant BK202404.	en_US
dc.description.pubStatus	Published	en_US
dc.date.embargo	2026-10-30	en_US
dc.description.oaCategory	Green (AAM)	en_US
Appears in Collections:	Journal/Magazine Article

Open Access Information

Status	embargoed access
Embargo End Date	2026-10-30

