Please use this identifier to cite or link to this item:
http://hdl.handle.net/10397/101828
DC Field | Value | Language |
---|---|---|
dc.contributor | Department of Logistics and Maritime Studies | - |
dc.contributor | Department of Building and Real Estate | - |
dc.creator | Wang, H | en_US |
dc.creator | Yi, W | en_US |
dc.creator | Liu, Y | en_US |
dc.date.accessioned | 2023-09-18T07:45:01Z | - |
dc.date.available | 2023-09-18T07:45:01Z | - |
dc.identifier.uri | http://hdl.handle.net/10397/101828 | - |
dc.language.iso | en | en_US |
dc.publisher | American Institute of Mathematical Sciences | en_US |
dc.rights | © 2022 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0) | en_US |
dc.rights | The following publication Wang, H., Yi, W., & Liu, Y. (2022). An innovative approach of determining the sample data size for machine learning models: A case study on health and safety management for infrastructure workers. Electronic Research Archive, 30(9), 3452-3462 is available at https://doi.org/10.3934/era.2022176. | en_US |
dc.subject | Health and safety management | en_US |
dc.subject | Learning curve | en_US |
dc.subject | Machine learning | en_US |
dc.subject | Sample size | en_US |
dc.subject | Transportation infrastructure | en_US |
dc.title | An innovative approach of determining the sample data size for machine learning models : a case study on health and safety management for infrastructure workers | en_US |
dc.type | Journal/Magazine Article | en_US |
dc.identifier.spage | 3452 | en_US |
dc.identifier.epage | 3462 | en_US |
dc.identifier.volume | 30 | en_US |
dc.identifier.issue | 9 | en_US |
dc.identifier.doi | 10.3934/era.2022176 | en_US |
dcterms.abstract | Numerical experiment is an essential part of academic studies in the field of transportation management. Using the appropriate sample size to conduct experiments can save both the data collecting cost and computing time. However, few studies have paid attention to determining the sample size. In this research, we use four typical regression models in machine learning and a dataset from transport infrastructure workers to explore the appropriate sample size. By observing 12 learning curves, we conclude that a sample size of 250 can balance model performance with the cost of data collection. Our study can provide a reference when deciding on the sample size to collect in advance. | - |
dcterms.accessRights | open access | en_US |
dcterms.bibliographicCitation | Electronic Research Archive, 2022, v. 30, no. 9, p. 3452-3462 | en_US |
dcterms.isPartOf | Electronic research archive | en_US |
dcterms.issued | 2022 | - |
dc.identifier.scopus | 2-s2.0-85135388816 | - |
dc.identifier.eissn | 2688-1594 | en_US |
dc.description.validate | 202309 bcvc | - |
dc.description.oa | Version of Record | en_US |
dc.identifier.FolderNumber | OA_Scopus/WOS | - |
dc.description.fundingSource | Others | en_US |
dc.description.fundingText | Hong Kong Polytechnic University | en_US |
dc.description.pubStatus | Published | en_US |
dc.description.oaCategory | CC | en_US |
Appears in Collections: | Journal/Magazine Article |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
era-30-09-176.pdf | | 24.2 MB | Adobe PDF | View/Open |
Page views: 95 (as of May 11, 2025)
Downloads: 111 (as of May 11, 2025)
Scopus citations: 7 (as of Jun 6, 2025)
Web of Science citations: 7 (as of Jun 5, 2025)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.