Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/105484
PIRA download icon_1.1View/Download Full Text
DC FieldValueLanguage
dc.contributorDepartment of Computingen_US
dc.creatorJin, Ten_US
dc.creatorXu, Pen_US
dc.creatorShi, Jen_US
dc.creatorXiao, Xen_US
dc.creatorGu, Qen_US
dc.date.accessioned2024-04-15T07:34:38Z-
dc.date.available2024-04-15T07:34:38Z-
dc.identifier.issn2640-3498en_US
dc.identifier.urihttp://hdl.handle.net/10397/105484-
dc.language.isoenen_US
dc.publisherPMLR web siteen_US
dc.rightsCopyright 2021 by the author(s).en_US
dc.rightsPosted with permission of the author.en_US
dc.rightsThe following publication Tianyuan Jin, Pan Xu, Jieming Shi, Xiaokui Xiao, Quanquan Gu Proceedings of the 38th International Conference on Machine Learning, PMLR 139:5074-5083, 2021 is available at https://proceedings.mlr.press/v139/jin21d.html.en_US
dc.titleMOTS : minimax optimal Thompson samplingen_US
dc.typeConference Paperen_US
dc.identifier.spage5074en_US
dc.identifier.epage5083en_US
dc.identifier.volume139en_US
dcterms.abstractThompson sampling is one of the most widely used algorithms in many online decision problems due to its simplicity for implementation and superior empirical performance over other state-of-the-art methods. Despite its popularity and empirical success, it has remained an open problem whether Thompson sampling can achieve the minimax optimal regret O(\sqrt{TK}) for K-armed bandit problems, where T is the total time horizon. In this paper we fill this long open gap by proposing a new Thompson sampling algorithm called MOTS that adaptively truncates the sampling result of the chosen arm at each time step. We prove that this simple variant of Thompson sampling achieves the minimax optimal regret bound O(\sqrt{TK}) for finite time horizon T and also the asymptotic optimal regret bound when T grows to infinity as well. This is the first time that the minimax optimality of multi-armed bandit problems has been attained by Thompson sampling type of algorithms.en_US
dcterms.accessRightsopen accessen_US
dcterms.bibliographicCitationProceedings of Machine Learning Research, 2021, v. 139, p. 5074-5083en_US
dcterms.isPartOfProceedings of Machine Learning Researchen_US
dcterms.issued2021-
dc.relation.conferenceInternational Conference on Machine Learning [ICML]en_US
dc.description.validate202402 bcchen_US
dc.description.oaVersion of Recorden_US
dc.identifier.FolderNumberCOMP-0137-
dc.description.fundingSourceOthersen_US
dc.description.fundingTextHong Kong Polytechnic Universityen_US
dc.description.pubStatusPublisheden_US
dc.identifier.OPUS50641174-
dc.description.oaCategoryCopyright retained by authoren_US
Appears in Collections:Conference Paper
Files in This Item:
File Description SizeFormat 
jin21d.pdf3.06 MBAdobe PDFView/Open
Open Access Information
Status open access
File Version Version of Record
Access
View full-text via PolyU eLinks SFX Query
Show simple item record

Page views

41
Citations as of May 11, 2025

Downloads

10
Citations as of May 11, 2025

Google ScholarTM

Check


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.