Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/114432
DC Field: Value [Language]
dc.contributor: Department of Applied Mathematics [en_US]
dc.creator: Wei, X [en_US]
dc.creator: Yu, X [en_US]
dc.date.accessioned: 2025-08-06T09:12:14Z
dc.date.available: 2025-08-06T09:12:14Z
dc.identifier.issn: 0095-4616 [en_US]
dc.identifier.uri: http://hdl.handle.net/10397/114432
dc.language.iso: en [en_US]
dc.publisher: Springer New York LLC [en_US]
dc.rights: © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024 [en_US]
dc.rights: This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use (https://www.springernature.com/gp/open-research/policies/accepted-manuscript-terms), but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: http://dx.doi.org/10.1007/s00245-024-10205-7. [en_US]
dc.subject: Continuous time reinforcement learning [en_US]
dc.subject: Integrated q-function [en_US]
dc.subject: Mean-field control [en_US]
dc.subject: Test policies [en_US]
dc.subject: Weak martingale characterization [en_US]
dc.title: Continuous time q-learning for mean-field control problems [en_US]
dc.type: Journal/Magazine Article [en_US]
dc.identifier.volume: 91 [en_US]
dc.identifier.issue: 1 [en_US]
dc.identifier.doi: 10.1007/s00245-024-10205-7 [en_US]
dcterms.abstract: This paper studies q-learning, recently coined by Jia and Zhou (J Mach Learn Res 24:1–61, 2023) as the continuous-time counterpart of Q-learning, for continuous-time mean-field control problems in the setting of entropy-regularized reinforcement learning. In contrast to the single-agent control problem in Jia and Zhou (J Mach Learn Res 24:1–61, 2023), we reveal that two different q-functions naturally arise in mean-field control problems: (i) the integrated q-function (denoted by q), the first-order approximation of the integrated Q-function introduced in Gu et al. (Oper Res 71(4):1040–1054, 2023), which can be learned via a weak martingale condition using all test policies; and (ii) the essential q-function (denoted by q_e), which is employed in the policy improvement iterations. We show that the two q-functions are related via an integral representation. Based on the weak martingale condition and our proposed search method for test policies, we devise model-free learning algorithms. In two examples, one within the LQ control framework and one beyond it, we obtain the exact parameterization of the optimal value function and the q-functions, and we illustrate our algorithms with simulation experiments. [en_US]
dcterms.accessRights: open access [en_US]
dcterms.bibliographicCitation: Applied mathematics and optimization, Feb. 2025, v. 91, no. 1, 10 [en_US]
dcterms.isPartOf: Applied mathematics and optimization [en_US]
dcterms.issued: 2025-02
dc.identifier.scopus: 2-s2.0-85212399218
dc.identifier.eissn: 1432-0606 [en_US]
dc.identifier.artn: 10 [en_US]
dc.description.validate: 202508 bcch [en_US]
dc.description.oa: Accepted Manuscript [en_US]
dc.identifier.FolderNumber: a3961
dc.identifier.SubFormID: 51836
dc.description.fundingSource: Others [en_US]
dc.description.fundingText: The Hong Kong Polytechnic University [en_US]
dc.description.pubStatus: Published [en_US]
dc.description.oaCategory: Green (AAM) [en_US]
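
The abstract above appeals to two technical ingredients: a weak martingale characterization that identifies the integrated q-function through test policies, and an integral representation linking the integrated q-function to the essential q-function. The LaTeX sketch below is a schematic of both, patterned on the single-agent q-learning of Jia and Zhou (2023); the notation (value function J, law \mu_t^h of the state under policy h, aggregated entropy-regularized reward \hat{r}) is assumed for illustration and is not quoted from the paper.

% Schematic weak martingale condition (assumed form, after Jia and Zhou 2023):
% a candidate \hat{q} is the integrated q-function iff, for every test policy
% h, the following process is a martingale in t:
\[
  M_t \;=\; J\bigl(t,\mu_t^{h}\bigr)
        + \int_0^t \Bigl[ \hat{r}\bigl(s,\mu_s^{h},h\bigr)
        - \hat{q}\bigl(s,\mu_s^{h},h\bigr) \Bigr] \,\mathrm{d}s .
\]
% Schematic integral representation (assumed form): the integrated q-function
% averages the essential q-function q_e over states drawn from \mu and
% actions drawn from the policy h:
\[
  q(t,\mu,h) \;=\; \int_{\mathbb{R}^d} \int_{A}
      q_e(t,\mu,x,a)\, h(a \mid x)\, \mathrm{d}a \, \mu(\mathrm{d}x) .
\]

In the entropy-regularized setting, \hat{r} would typically also carry the exploration term -\gamma \log h(a \mid x) inside the same average; the Version of Record at the DOI above gives the precise statements.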
Appears in Collections: Journal/Magazine Article
Files in This Item:
File: Wei_Continuous_Time_Q-Learning.pdf
Description: Pre-Published version
Size: 1.95 MB
Format: Adobe PDF
Open Access Information
Status: open access
File Version: Final Accepted Manuscript

SCOPUS™ Citations: 4 (as of Apr 3, 2026)
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.