Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/101828
Title: An innovative approach of determining the sample data size for machine learning models : a case study on health and safety management for infrastructure workers
Authors: Wang, H 
Yi, W 
Liu, Y 
Issue Date: 2022
Source: Electronic Research Archive, 2022, v. 30, no. 9, p. 3452-3462
Abstract: Numerical experiments are an essential part of academic studies in the field of transportation management. Using an appropriate sample size can reduce both data collection cost and computing time. However, few studies have paid attention to how the sample size should be determined. In this research, we use four typical machine learning regression models and a dataset from transport infrastructure workers to explore the appropriate sample size. By observing 12 learning curves, we conclude that a sample size of 250 balances model performance against the cost of data collection. Our study can serve as a reference when deciding in advance how many samples to collect.
Keywords: Health and safety management
Learning curve
Machine learning
Sample size
Transportation infrastructure
Publisher: American Institute of Mathematical Sciences
Journal: Electronic research archive 
EISSN: 2688-1594
DOI: 10.3934/era.2022176
Rights: © 2022 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
The following publication Wang, H., Yi, W., & Liu, Y. (2022). An innovative approach of determining the sample data size for machine learning models: A case study on health and safety management for infrastructure workers. Electronic Research Archive, 30(9), 3452-3462 is available at https://doi.org/10.3934/era.2022176.
Appears in Collections: Journal/Magazine Article
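
The learning-curve idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' code or data: it assumes synthetic one-variable data, a single ordinary least squares model in place of the four regression models, and a hypothetical 2% tolerance rule for deciding when extra samples stop paying off. All function names are illustrative.

```python
import random
import statistics


def make_synthetic_data(n, seed=42):
    """Hypothetical data: y = 2 + 3x + Gaussian noise (not the paper's dataset)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2 + 3 * x + rng.gauss(0, 1)) for x in xs]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def rmse(model, xs, ys):
    """Root mean squared error of the fitted line on (xs, ys)."""
    a, b = model
    return (sum((a + b * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5


def learning_curve(train, test, sizes, repeats=30, seed=0):
    """Mean held-out RMSE for each training size, averaged over random subsets."""
    rng = random.Random(seed)
    test_xs, test_ys = zip(*test)
    curve = []
    for n in sizes:
        errors = []
        for _ in range(repeats):
            xs, ys = zip(*rng.sample(train, n))
            errors.append(rmse(fit_line(xs, ys), test_xs, test_ys))
        curve.append((n, statistics.fmean(errors)))
    return curve


def smallest_adequate_size(curve, tolerance=0.02):
    """Smallest size whose error is within `tolerance` (relative) of the
    error at the largest size tried. The 2% default is an assumption."""
    final_error = curve[-1][1]
    for n, err in curve:
        if err <= final_error * (1 + tolerance):
            return n
    return curve[-1][0]


data = make_synthetic_data(1300)
train, test = data[:1000], data[1000:]
sizes = [5, 10, 25, 50, 100, 250, 500]
curve = learning_curve(train, test, sizes)
chosen = smallest_adequate_size(curve)

for n, err in curve:
    print(f"n={n:4d}  mean test RMSE={err:.3f}")
print("smallest adequate size:", chosen)
```

The size this sketch selects depends entirely on the synthetic data and the tolerance, so it says nothing about the value of 250 reported in the abstract; it only shows the shape of the procedure, in which error is tracked against training size and the point of diminishing returns is read off the curve.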

Files in This Item:
File: era-30-09-176.pdf
Size: 24.2 MB
Format: Adobe PDF
Open Access Information
Status: open access
File Version: Version of Record

Page views: 95 (as of May 11, 2025)
Downloads: 111 (as of May 11, 2025)
Scopus citations: 7 (as of Jun 6, 2025)
Web of Science citations: 7 (as of Jun 5, 2025)

