Covariate balancing for high-dimensional samples in controlled experiments

Luo, X; Yan, P; Yan, R; Wang, S

doi:10.1080/01605682.2024.2423362

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/113374

DC Field	Value	Language
dc.contributor	Department of Logistics and Maritime Studies	en_US
dc.contributor	Department of Aeronautical and Aviation Engineering	en_US
dc.creator	Luo, X	en_US
dc.creator	Yan, P	en_US
dc.creator	Yan, R	en_US
dc.creator	Wang, S	en_US
dc.date.accessioned	2025-06-04T01:34:24Z	-
dc.date.available	2025-06-04T01:34:24Z	-
dc.identifier.issn	0160-5682	en_US
dc.identifier.uri	http://hdl.handle.net/10397/113374	-
dc.language.iso	en	en_US
dc.publisher	Taylor & Francis	en_US
dc.rights	© 2024 The Operational Research Society	en_US
dc.rights	This is an Accepted Manuscript of an article published by Taylor & Francis in Journal of the Operational Research Society on 05 Nov 2024 (published online), available at: https://doi.org/10.1080/01605682.2024.2423362.	en_US
dc.subject	Controlled experiment	en_US
dc.subject	Covariate balance	en_US
dc.subject	Experiment design	en_US
dc.subject	High-dimensional samples	en_US
dc.subject	Partitioning problem	en_US
dc.title	Covariate balancing for high-dimensional samples in controlled experiments	en_US
dc.type	Journal/Magazine Article	en_US
dc.identifier.spage	1584	en_US
dc.identifier.epage	1598	en_US
dc.identifier.volume	76	en_US
dc.identifier.issue	8	en_US
dc.identifier.doi	10.1080/01605682.2024.2423362	en_US
dcterms.abstract	In controlled experiments, achieving covariate balance across all groups is crucial as it ensures that the estimated treatment effects are not confounded by the effects of covariates. This study proposes a mixed-integer nonlinear programming model to address the covariate balancing problem. Specifically, we introduce a new covariate imbalance measure, which is the maximum discrepancy in both the first and second central moments between any two groups. The second central moment can effectively capture the correlation of covariates in a physical sense, which is crucial for partitioning high-dimensional samples. A mixed-integer nonlinear programming model is constructed to minimize the proposed measure to obtain the optimal partitioning results. The nonlinear model is then linearized to accelerate the optimization process. We conduct computational experiments based on simulated datasets, including one-dimensional, two-dimensional, and three-dimensional Gaussian distributed samples, and a real clinical trial dataset. Compared to the conventional discrepancy-based method, our method achieves a 54.81% and a 40.6% reduction in the maximum discrepancy of partitioning results in the two-dimensional simulated Gaussian samples and the real clinical trial dataset, respectively. These results demonstrate the superiority of the proposed model in partitioning high-dimensional samples with correlated covariates compared with the conventional discrepancy-based method.	en_US
dcterms.accessRights	open access	en_US
dcterms.bibliographicCitation	Journal of the Operational Research Society, 2025, v. 76, no. 8, p. 1584-1598	en_US
dcterms.isPartOf	Journal of the Operational Research Society	en_US
dcterms.issued	2025	-
dc.identifier.scopus	2-s2.0-85208470061	-
dc.identifier.eissn	1476-9360	en_US
dc.description.validate	202506 bcch	en_US
dc.description.oa	Accepted Manuscript	en_US
dc.identifier.FolderNumber	a3629a	-
dc.identifier.SubFormID	50512	-
dc.description.fundingSource	Self-funded	en_US
dc.description.pubStatus	Published	en_US
dc.description.oaCategory	Green (AAM)	en_US
Appears in Collections:	Journal/Magazine Article
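
The imbalance measure described in the abstract (the largest gap in first and second central moments between any two groups) can be illustrated with a small Python sketch. This is not the authors' formulation: the function name `max_moment_discrepancy`, the element-wise absolute difference, and the equal weighting of the mean gap and the covariance gap are all assumptions made for illustration, and the sketch only evaluates a given partition rather than solving the mixed-integer program that searches for the best one.

```python
from itertools import combinations

import numpy as np


def max_moment_discrepancy(samples, labels):
    """Largest gap in means or covariances between any two groups.

    Hypothetical helper: the element-wise max-abs comparison and the equal
    weighting of both moments are assumptions, not the paper's definition.
    """
    X = np.asarray(samples, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat one-dimensional samples as a single column
    labels = np.asarray(labels)

    means, covs = [], []
    for g in np.unique(labels):
        group = X[labels == g]
        mu = group.mean(axis=0)
        centered = group - mu
        means.append(mu)
        # Second central moment: population covariance matrix of the group.
        covs.append(centered.T @ centered / len(group))

    worst = 0.0
    for a, b in combinations(range(len(means)), 2):
        mean_gap = np.abs(means[a] - means[b]).max()
        cov_gap = np.abs(covs[a] - covs[b]).max()
        worst = max(worst, mean_gap, cov_gap)
    return worst


# Four two-dimensional samples split into two groups in two different ways.
points = [[0, 0], [1, 1], [0, 1], [1, 0]]
by_diagonal = max_moment_discrepancy(points, [0, 0, 1, 1])
by_first_coordinate = max_moment_discrepancy(points, [0, 1, 0, 1])
```

In the first split both groups have identical means, so a mean-only measure would report perfect balance; the covariance term still flags the partition because one group is positively correlated and the other negatively correlated. That is the kind of case the abstract has in mind when it says the second central moment captures the correlation of covariates.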

Files in This Item:

File	Description	Size	Format
Covariate balancing for high-dimensional.pdf	Pre-Published version	2.86 MB	Adobe PDF

Open Access Information

Status	open access
File Version	Final Accepted Manuscript
