Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/101828
Title: An innovative approach of determining the sample data size for machine learning models : a case study on health and safety management for infrastructure workers
Authors: Wang, H; Yi, W; Liu, Y
Issue Date: 2022
Source: Electronic Research Archive, 2022, v. 30, no. 9, p. 3452-3462
Abstract: Numerical experiment is an essential part of academic studies in the field of transportation management. Using the appropriate sample size to conduct experiments can save both the data collecting cost and computing time. However, few studies have paid attention to determining the sample size. In this research, we use four typical regression models in machine learning and a dataset from transport infrastructure workers to explore the appropriate sample size. By observing 12 learning curves, we conclude that a sample size of 250 can balance model performance with the cost of data collection. Our study can provide a reference when deciding on the sample size to collect in advance.
Keywords: Health and safety management; Learning curve; Machine learning; Sample size; Transportation infrastructure
Publisher: American Institute of Mathematical Sciences
Journal: Electronic Research Archive
EISSN: 2688-1594
DOI: 10.3934/era.2022176
Rights: © 2022 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0). The following publication Wang, H., Yi, W., & Liu, Y. (2022). An innovative approach of determining the sample data size for machine learning models: A case study on health and safety management for infrastructure workers. Electronic Research Archive, 30(9), 3452-3462 is available at https://doi.org/10.3934/era.2022176.
Appears in Collections: Journal/Magazine Article
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
era-30-09-176.pdf | | 24.2 MB | Adobe PDF | View/Open |
Page views: 95 (as of May 11, 2025)
Downloads: 111 (as of May 11, 2025)
Scopus citations: 7 (as of Jun 6, 2025)
Web of Science citations: 7 (as of Jun 5, 2025)