MOTS: minimax optimal Thompson sampling

Jin, T; Xu, P; Shi, J; Xiao, X; Gu, Q

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/105484

DC Field	Value	Language
dc.contributor	Department of Computing	en_US
dc.creator	Jin, T	en_US
dc.creator	Xu, P	en_US
dc.creator	Shi, J	en_US
dc.creator	Xiao, X	en_US
dc.creator	Gu, Q	en_US
dc.date.accessioned	2024-04-15T07:34:38Z	-
dc.date.available	2024-04-15T07:34:38Z	-
dc.identifier.issn	2640-3498	en_US
dc.identifier.uri	http://hdl.handle.net/10397/105484	-
dc.language.iso	en	en_US
dc.publisher	PMLR web site	en_US
dc.rights	Copyright 2021 by the author(s).	en_US
dc.rights	Posted with permission of the author.	en_US
dc.rights	The following publication, Tianyuan Jin, Pan Xu, Jieming Shi, Xiaokui Xiao, and Quanquan Gu, Proceedings of the 38th International Conference on Machine Learning, PMLR 139:5074-5083, 2021, is available at https://proceedings.mlr.press/v139/jin21d.html.	en_US
dc.title	MOTS: minimax optimal Thompson sampling	en_US
dc.type	Conference Paper	en_US
dc.identifier.spage	5074	en_US
dc.identifier.epage	5083	en_US
dc.identifier.volume	139	en_US
dcterms.abstract	Thompson sampling is one of the most widely used algorithms for online decision problems, owing to its ease of implementation and its superior empirical performance over other state-of-the-art methods. Despite its popularity and empirical success, it has remained an open problem whether Thompson sampling can achieve the minimax optimal regret O(\sqrt{TK}) for K-armed bandit problems, where T is the total time horizon. In this paper, we close this long-standing gap by proposing a new Thompson sampling algorithm, called MOTS, that adaptively truncates the sampling result of the chosen arm at each time step. We prove that this simple variant of Thompson sampling achieves the minimax optimal regret bound O(\sqrt{TK}) for a finite time horizon T, as well as the asymptotically optimal regret bound as T grows to infinity. This is the first time that minimax optimality for multi-armed bandit problems has been attained by a Thompson sampling-type algorithm.	en_US
dcterms.accessRights	open access	en_US
dcterms.bibliographicCitation	Proceedings of Machine Learning Research, 2021, v. 139, p. 5074-5083	en_US
dcterms.isPartOf	Proceedings of Machine Learning Research	en_US
dcterms.issued	2021	-
dc.relation.conference	International Conference on Machine Learning [ICML]	en_US
dc.description.validate	202402 bcch	en_US
dc.description.oa	Version of Record	en_US
dc.identifier.FolderNumber	COMP-0137	-
dc.description.fundingSource	Others	en_US
dc.description.fundingText	Hong Kong Polytechnic University	en_US
dc.description.pubStatus	Published	en_US
dc.identifier.OPUS	50641174	-
dc.description.oaCategory	Copyright retained by author	en_US
Appears in Collections:	Conference Paper

Files in This Item:

File	Description	Size	Format
jin21d.pdf		3.06 MB	Adobe PDF

Open Access Information

Status	open access
File Version	Version of Record
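The abstract describes MOTS only at a high level: Thompson sampling whose sampled values are adaptively truncated. The following is a minimal Python sketch of that idea, not the authors' implementation. It assumes rewards in [0, 1], a Gaussian sampling distribution N(mu_hat, 1/(rho * n)) for an arm pulled n times, and a truncation threshold of the form mu_hat + sqrt(alpha / n * log^+(T / (K n))); in this sketch every arm's sample is capped before the argmax is taken. The names `mots` and `log_plus`, and the parameters `rho` and `alpha` with their default values, are illustrative assumptions; check the paper for the exact algorithm and constants.

```python
import math
import random
from typing import Callable, List, Optional


def log_plus(x: float) -> float:
    """Return log^+(x) = max(0, log x); defined as 0 for x <= 1."""
    return math.log(x) if x > 1.0 else 0.0


def mots(
    pull: Callable[[int], float],
    n_arms: int,
    horizon: int,
    rho: float = 0.9,    # hypothetical default; assumed to lie in (1/2, 1)
    alpha: float = 4.0,  # hypothetical default for the truncation width
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Clipped-Gaussian Thompson sampling sketch; returns pull counts per arm.

    `pull(arm)` is assumed to return a reward in [0, 1].
    """
    if rng is None:
        rng = random.Random()
    counts = [0] * n_arms
    sums = [0.0] * n_arms

    # Initialization: play every arm once so each empirical mean is defined.
    for arm in range(n_arms):
        sums[arm] += pull(arm)
        counts[arm] += 1

    # Remaining rounds, so the total number of pulls equals `horizon`.
    for _ in range(n_arms, horizon):
        best_arm, best_theta = 0, -math.inf
        for arm in range(n_arms):
            n = counts[arm]
            mean = sums[arm] / n
            # Draw from N(mean, 1 / (rho * n)); rng.gauss takes a std deviation.
            sample = rng.gauss(mean, math.sqrt(1.0 / (rho * n)))
            # Truncation threshold: shrinks toward the empirical mean as n grows
            # and equals it once n >= horizon / n_arms.
            tau = mean + math.sqrt(alpha / n * log_plus(horizon / (n_arms * n)))
            theta = min(sample, tau)
            if theta > best_theta:
                best_arm, best_theta = arm, theta
        sums[best_arm] += pull(best_arm)
        counts[best_arm] += 1
    return counts


if __name__ == "__main__":
    reward_rng = random.Random(0)
    means = [0.3, 0.5, 0.7]  # illustrative Bernoulli arm means
    counts = mots(
        lambda a: 1.0 if reward_rng.random() < means[a] else 0.0,
        n_arms=len(means),
        horizon=5000,
        rng=random.Random(1),
    )
    print(counts)  # the 0.7 arm typically receives most of the pulls
```

The cap is what distinguishes this sketch from plain Gaussian Thompson sampling: an under-explored arm can still draw an optimistic sample, but never one above its threshold, and once an arm has been pulled about T/K times its sample can no longer exceed its empirical mean.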