Continuous time q-learning for mean-field control problems

Wei, X; Yu, X

doi:10.1007/s00245-024-10205-7

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/114432

Title:	Continuous time q-learning for mean-field control problems
Authors:	Wei, X; Yu, X
Issue Date:	Feb-2025
Source:	Applied mathematics and optimization, Feb. 2025, v. 91, no. 1, 10
Abstract:	This paper studies the q-learning, recently coined as the continuous time counterpart of Q-learning by Jia and Zhou (J Mach Learn Res 24:1–61, 2023), for continuous time mean-field control problems in the setting of entropy-regularized reinforcement learning. In contrast to the single agent’s control problem in Jia and Zhou (J Mach Learn Res 24:1–61, 2023), we reveal that two different q-functions naturally arise in mean-field control problems: (i) the integrated q-function (denoted by q) as the first-order approximation of the integrated Q-function introduced in Gu et al. (Oper Res 71(4):1040–1054, 2023), which can be learnt by a weak martingale condition using all test policies; and (ii) the essential q-function (denoted by q_e) that is employed in the policy improvement iterations. We show that two q-functions are related via an integral representation. Based on the weak martingale condition and our proposed searching method of test policies, some model-free learning algorithms are devised. In two examples, one in LQ control framework and one beyond LQ control framework, we can obtain the exact parameterization of the optimal value function and q-functions and illustrate our algorithms with simulation experiments.
Keywords:	Continuous time reinforcement learning; Integrated q-function; Mean-field control; Test policies; Weak martingale characterization
Publisher:	Springer New York LLC
Journal:	Applied mathematics and optimization
ISSN:	0095-4616
EISSN:	1432-0606
DOI:	10.1007/s00245-024-10205-7
Rights:	© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024. This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use (https://www.springernature.com/gp/open-research/policies/accepted-manuscript-terms), but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: http://dx.doi.org/10.1007/s00245-024-10205-7.
Appears in Collections:	Journal/Magazine Article

Files in This Item:

File	Description	Size	Format
Wei_Continuous_Time_Q-Learning.pdf	Pre-Published version	1.95 MB	Adobe PDF	View/Open

Open Access Information

Status	open access
File Version	Final Accepted Manuscript

Scopus citations:	4 (as of Apr 3, 2026)
