Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/101828
| DC Field | Value | Language |
| --- | --- | --- |
| dc.contributor | Department of Logistics and Maritime Studies | - |
| dc.contributor | Department of Building and Real Estate | - |
| dc.creator | Wang, H | en_US |
| dc.creator | Yi, W | en_US |
| dc.creator | Liu, Y | en_US |
| dc.date.accessioned | 2023-09-18T07:45:01Z | - |
| dc.date.available | 2023-09-18T07:45:01Z | - |
| dc.identifier.uri | http://hdl.handle.net/10397/101828 | - |
| dc.language.iso | en | en_US |
| dc.publisher | American Institute of Mathematical Sciences | en_US |
| dc.rights | © 2022 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0) | en_US |
| dc.rights | The following publication Wang, H., Yi, W., & Liu, Y. (2022). An innovative approach of determining the sample data size for machine learning models: A case study on health and safety management for infrastructure workers. Electronic Research Archive, 30(9), 3452-3462 is available at https://doi.org/10.3934/era.2022176. | en_US |
| dc.subject | Health and safety management | en_US |
| dc.subject | Learning curve | en_US |
| dc.subject | Machine learning | en_US |
| dc.subject | Sample size | en_US |
| dc.subject | Transportation infrastructure | en_US |
| dc.title | An innovative approach of determining the sample data size for machine learning models : a case study on health and safety management for infrastructure workers | en_US |
| dc.type | Journal/Magazine Article | en_US |
| dc.identifier.spage | 3452 | en_US |
| dc.identifier.epage | 3462 | en_US |
| dc.identifier.volume | 30 | en_US |
| dc.identifier.issue | 9 | en_US |
| dc.identifier.doi | 10.3934/era.2022176 | en_US |
| dcterms.abstract | Numerical experiment is an essential part of academic studies in the field of transportation management. Using the appropriate sample size to conduct experiments can save both the data collecting cost and computing time. However, few studies have paid attention to determining the sample size. In this research, we use four typical regression models in machine learning and a dataset from transport infrastructure workers to explore the appropriate sample size. By observing 12 learning curves, we conclude that a sample size of 250 can balance model performance with the cost of data collection. Our study can provide a reference when deciding on the sample size to collect in advance. | - |
| dcterms.accessRights | open access | en_US |
| dcterms.bibliographicCitation | Electronic Research Archive, 2022, v. 30, no. 9, p. 3452-3462 | en_US |
| dcterms.isPartOf | Electronic research archive | en_US |
| dcterms.issued | 2022 | - |
| dc.identifier.scopus | 2-s2.0-85135388816 | - |
| dc.identifier.eissn | 2688-1594 | en_US |
| dc.description.validate | 202309 bcvc | - |
| dc.description.oa | Version of Record | en_US |
| dc.identifier.FolderNumber | OA_Scopus/WOS | - |
| dc.description.fundingSource | Others | en_US |
| dc.description.fundingText | Hong Kong Polytechnic University | en_US |
| dc.description.pubStatus | Published | en_US |
| dc.description.oaCategory | CC | en_US |
Appears in Collections: Journal/Magazine Article
Files in This Item:
| File | Description | Size | Format |
| --- | --- | --- | --- |
| era-30-09-176.pdf | | 24.2 MB | Adobe PDF |
Open Access Information
Status open access
File Version Version of Record

Page views: 95 (as of May 11, 2025)
Downloads: 111 (as of May 11, 2025)
Scopus citations: 7 (as of Jun 6, 2025)
Web of Science citations: 7 (as of Jun 5, 2025)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.