Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/115607
dc.contributor: Department of Logistics and Maritime Studies
dc.creator: Yang, Y
dc.creator: Wang, S
dc.creator: Laporte, G
dc.date.accessioned: 2025-10-08T01:17:00Z
dc.date.available: 2025-10-08T01:17:00Z
dc.identifier.issn: 0894-069X
dc.identifier.uri: http://hdl.handle.net/10397/115607
dc.language.iso: en
dc.publisher: John Wiley & Sons, Inc.
dc.rights: This is an open access article under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
dc.rights: © 2025 The Author(s). Naval Research Logistics published by Wiley Periodicals LLC.
dc.rights: The following publication Yang, Y., Wang, S. and Laporte, G. (2025), Improved Regression Tree Models Using Generalization Error-Based Splitting Criteria. Naval Research Logistics is available at https://doi.org/10.1002/nav.22270.
dc.subject: Generalization error
dc.subject: Leave-one-out cross-validation
dc.subject: Mean squared error
dc.subject: Regression tree
dc.title: Improved regression tree models using generalization error-based splitting criteria
dc.type: Journal/Magazine Article
dc.identifier.doi: 10.1002/nav.22270
dcterms.abstract: Despite the widespread application of machine learning (ML) approaches such as the regression tree (RT) in the field of data-driven optimization, overfitting may impair the effectiveness of ML models and thus hinder the deployment of ML for decision-making. In particular, we address the overfitting issue of the traditional RT splitting criterion with a limited sample size, which considers only the training mean squared error, and we accurately specify the mathematical formula for the generalization error. We introduce two novel splitting criteria based on generalization error, which offer higher-quality approximations of the generalization error than the traditional training error does. One criterion is formulated through a mathematical derivation based on the RT model, and the second is established through leave-one-out cross-validation (LOOCV). We construct RT models using our proposed generalization error-based splitting criteria from extensive ML benchmark instances and report the experimental results, including the models' computational efficiency, prediction accuracy, and robustness. Our findings endorse the superior efficacy and robustness of the RT model based on the refined LOOCV-informed splitting criterion, marking substantial improvements over those of the traditional RT model. Additionally, our tree structure analysis provides insights into how our proposed LOOCV-informed splitting criterion guides the model in striking a balance between a complex tree structure and accurate predictions.
dcterms.accessRights: open access
dcterms.bibliographicCitation: Naval research logistics, First published: 10 June 2025, Early View, https://doi.org/10.1002/nav.22270
dcterms.isPartOf: Naval research logistics
dcterms.issued: 2025
dc.identifier.scopus: 2-s2.0-105007764781
dc.identifier.eissn: 1520-6750
dc.description.validate: 202510 bcch
dc.description.oa: Version of Record
dc.identifier.FolderNumber: OA_TA
dc.description.fundingSource: Self-funded
dc.description.pubStatus: Early release
dc.description.TA: Wiley (2025)
dc.description.oaCategory: TA
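The abstract describes replacing the traditional training-MSE splitting criterion with an LOOCV-based estimate of generalization error. The sketch below illustrates that idea only, not the paper's exact formula: for a leaf that predicts the sample mean, the leave-one-out residual has a closed form (the training residual inflated by n/(n−1)), so an LOOCV criterion can be evaluated without refitting, and it naturally penalizes splits that create tiny leaves. All function names and the single-feature setting are illustrative assumptions.

```python
import numpy as np

def loocv_sse(y):
    # Closed-form leave-one-out SSE for a mean-predicting leaf:
    # the held-out residual for sample i is n/(n-1) * (y_i - ybar),
    # so LOOCV SSE = (n/(n-1))^2 * training SSE.
    n = len(y)
    if n < 2:
        return np.inf  # a single-sample leaf cannot be cross-validated
    return (n / (n - 1)) ** 2 * np.sum((y - y.mean()) ** 2)

def train_sse(y):
    # Traditional CART-style criterion: training SSE only.
    return np.sum((y - y.mean()) ** 2)

def best_split(x, y, criterion):
    # Scan thresholds on a single feature; return the threshold that
    # minimizes the summed criterion over the two child leaves.
    best_t, best_score = None, np.inf
    for t in np.unique(x)[:-1]:
        score = criterion(y[x <= t]) + criterion(y[x > t])
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score
```

Because the LOOCV score is infinite for one-sample leaves and inflated for small ones, it steers the tree away from splits that fit noise, which is the balance between tree complexity and accuracy that the abstract refers to.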
Appears in Collections: Journal/Magazine Article
Files in This Item:
File: Yang_Improved_Regression_Tree.pdf
Size: 662.39 kB
Format: Adobe PDF
Open Access Information
Status: open access
File Version: Version of Record

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.