Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/114432
DC Field: Value [Language]
dc.contributor: Department of Applied Mathematics [en_US]
dc.creator: Wei, X [en_US]
dc.creator: Yu, X [en_US]
dc.date.accessioned: 2025-08-06T09:12:14Z
dc.date.available: 2025-08-06T09:12:14Z
dc.identifier.issn: 0095-4616 [en_US]
dc.identifier.uri: http://hdl.handle.net/10397/114432
dc.language.iso: en [en_US]
dc.publisher: Springer New York LLC [en_US]
dc.rights: © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024 [en_US]
dc.rights: This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use (https://www.springernature.com/gp/open-research/policies/accepted-manuscript-terms), but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: http://dx.doi.org/10.1007/s00245-024-10205-7. [en_US]
dc.subject: Continuous time reinforcement learning [en_US]
dc.subject: Integrated q-function [en_US]
dc.subject: Mean-field control [en_US]
dc.subject: Test policies [en_US]
dc.subject: Weak martingale characterization [en_US]
dc.title: Continuous time q-learning for mean-field control problems [en_US]
dc.type: Journal/Magazine Article [en_US]
dc.identifier.volume: 91 [en_US]
dc.identifier.issue: 1 [en_US]
dc.identifier.doi: 10.1007/s00245-024-10205-7 [en_US]
dcterms.abstract: This paper studies q-learning, recently coined by Jia and Zhou (J Mach Learn Res 24:1–61, 2023) as the continuous-time counterpart of Q-learning, for continuous-time mean-field control problems in the setting of entropy-regularized reinforcement learning. In contrast to the single-agent control problem in Jia and Zhou (J Mach Learn Res 24:1–61, 2023), we reveal that two different q-functions naturally arise in mean-field control problems: (i) the integrated q-function (denoted by q), the first-order approximation of the integrated Q-function introduced in Gu et al. (Oper Res 71(4):1040–1054, 2023), which can be learned via a weak martingale condition using all test policies; and (ii) the essential q-function (denoted by q_e), which is employed in the policy improvement iterations. We show that the two q-functions are related via an integral representation. Based on the weak martingale condition and our proposed search method for test policies, we devise model-free learning algorithms. In two examples, one within the LQ control framework and one beyond it, we obtain the exact parameterization of the optimal value function and the q-functions, and we illustrate our algorithms with simulation experiments. [en_US]
dcterms.accessRights: open access [en_US]
dcterms.bibliographicCitation: Applied mathematics and optimization, Feb. 2025, v. 91, no. 1, 10 [en_US]
dcterms.isPartOf: Applied mathematics and optimization [en_US]
dcterms.issued: 2025-02
dc.identifier.scopus: 2-s2.0-85212399218
dc.identifier.eissn: 1432-0606 [en_US]
dc.identifier.artn: 10 [en_US]
dc.description.validate: 202508 bcch [en_US]
dc.description.oa: Accepted Manuscript [en_US]
dc.identifier.FolderNumber: a3961
dc.identifier.SubFormID: 51836
dc.description.fundingSource: Others [en_US]
dc.description.fundingText: The Hong Kong Polytechnic University [en_US]
dc.description.pubStatus: Published [en_US]
dc.description.oaCategory: Green (AAM) [en_US]
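
The abstract above appeals to two technical ingredients: a weak martingale characterization that identifies the integrated q-function through test policies, and an integral representation linking the integrated q-function to the essential q-function. The LaTeX sketch below is a schematic of both, patterned on the single-agent q-learning of Jia and Zhou (2023); the notation (value function J, law \mu_t^h of the state under policy h, aggregated entropy-regularized reward \hat{r}) is assumed for illustration and is not quoted from the paper.

% Schematic weak martingale condition (assumed form, after Jia and Zhou 2023):
% a candidate \hat{q} is the integrated q-function iff, for every test policy
% h, the following process is a martingale in t:
\[
  M_t \;=\; J\bigl(t,\mu_t^{h}\bigr)
        + \int_0^t \Bigl[ \hat{r}\bigl(s,\mu_s^{h},h\bigr)
        - \hat{q}\bigl(s,\mu_s^{h},h\bigr) \Bigr] \,\mathrm{d}s .
\]
% Schematic integral representation (assumed form): the integrated q-function
% averages the essential q-function q_e over states drawn from \mu and
% actions drawn from the policy h:
\[
  q(t,\mu,h) \;=\; \int_{\mathbb{R}^d} \int_{A}
      q_e(t,\mu,x,a)\, h(a \mid x)\, \mathrm{d}a \, \mu(\mathrm{d}x) .
\]

In the entropy-regularized setting, \hat{r} would typically also carry the exploration term -\gamma \log h(a \mid x) inside the same average; the Version of Record at the DOI above gives the precise statements.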
Appears in Collections: Journal/Magazine Article
Files in This Item:
File: Wei_Continuous_Time_Q-Learning.pdf
Description: Pre-Published version
Size: 1.95 MB
Format: Adobe PDF
Open Access Information
Status: open access
File Version: Final Accepted Manuscript

SCOPUS™ Citations: 4 (as of Apr 3, 2026)
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.