Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/117295
DC Field | Value | Language
dc.contributor | Department of Aeronautical and Aviation Engineering | en_US
dc.creator | Liu, M | en_US
dc.creator | Tao, W | en_US
dc.creator | Huang, H | en_US
dc.date.accessioned | 2026-02-10T03:43:45Z | -
dc.date.available | 2026-02-10T03:43:45Z | -
dc.identifier.issn | 0952-1976 | en_US
dc.identifier.uri | http://hdl.handle.net/10397/117295 | -
dc.language.iso | en | en_US
dc.publisher | Elsevier Ltd | en_US
dc.subject | Badminton | en_US
dc.subject | Offline policy evaluation | en_US
dc.subject | Offline reinforcement learning | en_US
dc.subject | Tactical decision-making | en_US
dc.title | Offline reinforcement learning for badminton tactical decision-making | en_US
dc.type | Journal/Magazine Article | en_US
dc.identifier.volume | 164 | en_US
dc.identifier.doi | 10.1016/j.engappai.2025.113395 | en_US
dcterms.abstract | Sports data mining is becoming increasingly vital in modern competitive sports, driven by the need for athletes to continuously enhance their performance. Traditional methods of analyzing sports data rely heavily on expert experience and manual effort, which can be inefficient and unreliable. With advancements in artificial intelligence (AI), sports data is now being processed autonomously, providing more quantitative insights and more comprehensive analysis. This paper focuses on the role of tactics in sports, particularly in badminton, and explores the potential of using AI to enhance badminton tactical decision-making. We investigate the application of offline reinforcement learning (Offline RL) to develop tactical policies from pre-collected datasets, addressing challenges including algorithm design and offline policy evaluation. Specifically, we propose a new variant of conservative Q-learning (CQL), tailored for the hybrid action space, to train tactical policies using the integrated offline dataset Shuttle. To evaluate these policies, we develop a preference-based reward model that aligns with tactical preferences, offering an alternative to traditional offline policy evaluation methods. Our computer-based experimental results and analysis demonstrate that the proposed method achieves higher average rewards than all baseline methods and the behavior policy used for data collection. This underscores the potential of the proposed method to enhance badminton tactical decision-making and offer athletes more effective tactical recommendations. Code and data are available at https://github.com/Wenminggong/Offline_RL_for_Badminton. | en_US
dcterms.abstract | Graphical abstract: [Figure not available: see fulltext.] | en_US
dcterms.accessRights | embargoed access | en_US
dcterms.bibliographicCitation | Engineering applications of artificial intelligence, 15 Jan. 2026, v. 164, pt. B, 113395 | en_US
dcterms.isPartOf | Engineering applications of artificial intelligence | en_US
dcterms.issued | 2026-01-15 | -
dc.identifier.scopus | 2-s2.0-105023710217 | -
dc.identifier.eissn | 1873-6769 | en_US
dc.identifier.artn | 113395 | en_US
dc.description.validate | 202602 bcch | en_US
dc.description.oa | Not applicable | en_US
dc.identifier.SubFormID | G000920/2026-01 | -
dc.description.fundingSource | Others | en_US
dc.description.fundingText | This work was supported by the Research Institute for Sports Science and Technology (RISports), The Hong Kong Polytechnic University. | en_US
dc.description.pubStatus | Published | en_US
dc.date.embargo | 2028-01-15 | en_US
dc.description.oaCategory | Green (AAM) | en_US
dc.relation.rdata | https://github.com/Wenminggong/Offline_RL_for_Badminton# | -
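The abstract above names conservative Q-learning (CQL) as the basis of the proposed method. As a rough illustration only, not the paper's hybrid-action variant or its Shuttle dataset, a minimal sketch of the standard discrete-action CQL objective combines a TD error with a conservative penalty that pushes down Q-values on actions unseen in the logged data (the names `q_values`, `dataset_action`, `td_target`, and `alpha` are hypothetical):

```python
import numpy as np

def cql_loss(q_values, dataset_action, td_target, alpha=1.0):
    """Illustrative conservative Q-learning objective for one transition.

    q_values:       Q(s, a) for every action a in a finite set (1-D array).
    dataset_action: index of the action actually taken in the logged data.
    td_target:      bootstrapped target, e.g. r + gamma * max_a' Q(s', a').
    alpha:          weight of the conservative penalty.
    """
    # Standard TD error on the logged transition.
    bellman_error = (q_values[dataset_action] - td_target) ** 2
    # Conservative penalty: log-sum-exp over all actions pushes down
    # out-of-distribution Q-values, while subtracting the logged action's
    # Q-value pushes up actions supported by the data.
    logsumexp = np.log(np.sum(np.exp(q_values)))
    conservative_penalty = logsumexp - q_values[dataset_action]
    return bellman_error + alpha * conservative_penalty
```

With `alpha=0` this reduces to a plain Q-learning TD loss; increasing `alpha` trades Bellman accuracy for conservatism, which is the mechanism offline RL methods like CQL use to avoid overestimating actions absent from the pre-collected dataset.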
Appears in Collections: Journal/Magazine Article

Open Access Information
Status: embargoed access
Embargo End Date: 2028-01-15
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.