Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/110747
DC Field | Value | Language
dc.contributor | Department of Civil and Environmental Engineering | -
dc.contributor | Research Centre for Resources Engineering towards Carbon Neutrality | -
dc.creator | Yao, L | -
dc.creator | Leng, Z | -
dc.creator | Ni, F | -
dc.date.accessioned | 2025-01-21T06:23:06Z | -
dc.date.available | 2025-01-21T06:23:06Z | -
dc.identifier.issn | 1029-8436 | -
dc.identifier.uri | http://hdl.handle.net/10397/110747 | -
dc.language.iso | en | en_US
dc.publisher | Taylor & Francis | en_US
dc.subject | Anomalous data handling strategies | en_US
dc.subject | Data imputation | en_US
dc.subject | Matrix completion | en_US
dc.subject | Pavement condition | en_US
dc.subject | Pavement performance prediction | en_US
dc.title | A matrix completion approach for imputing missing pavement condition data and its impact on pavement performance prediction | en_US
dc.type | Journal/Magazine Article | en_US
dc.identifier.volume | 25 | -
dc.identifier.issue | 1 | -
dc.identifier.doi | 10.1080/10298436.2024.2437055 | -
dcterms.abstract | High-quality pavement condition data are crucial for effective pavement management. However, issues like missing or erroneous data are common, and existing studies often inadequately document their data cleaning processes. This study introduces a pavement condition data imputation method using the softimpute matrix completion algorithm, comparing it to linear interpolation across varying missing data ratios. The impact of anomalous data handling strategies on pavement performance prediction across various data availability levels was investigated. Results show that as the missing data ratio increases, both imputation methods experience reduced accuracy, though the error for linear interpolation rises more sharply than softimpute. Softimpute consistently outperforms linear interpolation in imputation accuracy across all missing data ratios but may introduce subtle distributional biases at missing rates above 50%. For datasets smaller than 100, softimpute is recommended while direct deletion is less advantageous than using original data. For larger datasets (>150,000 for neural networks and >10,000 for tree-based models), direct deletion yields optimal prediction performance, making imputation unnecessary. For medium-sized datasets, imputation is preferred, though the performance gap between softimpute and direct deletion narrows as data volume grows. This study is expected to guide practitioners in selecting effective anomalous data handling strategies for improved pavement management. | -
dcterms.accessRights | embargoed access | en_US
dcterms.bibliographicCitation | International journal of pavement engineering, 2024, v. 25, no. 1, 2437055 | -
dcterms.isPartOf | International journal of pavement engineering | -
dcterms.issued | 2024 | -
dc.identifier.scopus | 2-s2.0-85212761597 | -
dc.identifier.eissn | 1477-268X | -
dc.identifier.artn | 2437055 | -
dc.description.validate | 202501 bcrc | -
dc.identifier.FolderNumber | a3363 | en_US
dc.identifier.SubFormID | 49999 | en_US
dc.description.fundingSource | Self-funded | en_US
dc.description.pubStatus | Published | en_US
dc.date.embargo | 2026-12-18 | en_US
dc.description.TA | Green (AAM) | en_US
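The abstract compares softimpute matrix completion with linear interpolation for filling gaps in pavement condition data. As a rough illustration of how the two approaches differ, here is a minimal NumPy sketch, assuming the data are arranged as a sections × survey-years matrix with NaN marking missing cells. This is not the authors' code. The function names `soft_impute`, `linear_interpolate_rows` and `rmse_on`, the shrinkage parameter `lam`, the 30% missing ratio and the synthetic condition-index data are all hypothetical. `soft_impute` follows the iterative singular-value soft-thresholding scheme that SoftImpute is built on. `linear_interpolate_rows` interpolates each section's time series independently.

```python
import numpy as np


def soft_impute(X, lam, max_iter=500, tol=1e-5):
    """Fill NaN cells of X by iterative singular-value soft-thresholding.

    Each pass keeps the observed entries of X, fills the missing ones with the
    current estimate Z, and replaces Z by the SVD of that matrix with every
    singular value shrunk by `lam` (values below `lam` become zero).
    """
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    Z = np.where(observed, X, 0.0)  # initial guess: zeros in the missing cells
    for _ in range(max_iter):
        filled = np.where(observed, X, Z)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        Z_new = (U * np.maximum(s - lam, 0.0)) @ Vt
        # Relative Frobenius-norm change between successive estimates.
        change = np.linalg.norm(Z_new - Z) / max(np.linalg.norm(Z), 1e-12)
        Z = Z_new
        if change < tol:
            break
    # Observed entries are returned unchanged; only missing cells are imputed.
    return np.where(observed, X, Z)


def linear_interpolate_rows(X):
    """Fill NaNs along each row (one section over time) with np.interp.

    Gaps before the first or after the last observation take the nearest
    observed value (np.interp's default edge behaviour); rows with no
    observations are left as NaN.
    """
    X = np.asarray(X, dtype=float).copy()
    t = np.arange(X.shape[1])
    for row in X:  # each `row` is a view, so assignments write into X
        obs = ~np.isnan(row)
        if obs.any() and not obs.all():
            row[~obs] = np.interp(t[~obs], t[obs], row[obs])
    return X


def rmse_on(mask, estimate, truth):
    """Root-mean-square error restricted to the cells selected by mask."""
    return float(np.sqrt(np.mean((estimate[mask] - truth[mask]) ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: 40 sections x 12 annual surveys of a 0-100 condition
    # index, each section deteriorating linearly from its own start value.
    t = np.arange(12)
    start = rng.uniform(80, 100, size=(40, 1))
    rate = rng.uniform(1.0, 3.0, size=(40, 1))
    truth = start - rate * t + rng.normal(0.0, 1.0, size=(40, 12))
    missing = rng.random(truth.shape) < 0.3  # hypothetical 30% missing ratio
    with_gaps = np.where(missing, np.nan, truth)

    si = soft_impute(with_gaps, lam=10.0)
    li = linear_interpolate_rows(with_gaps)
    print("softimpute RMSE:", rmse_on(missing, si, truth))
    print("linear interpolation RMSE:", rmse_on(missing, li, truth))
```

The design difference is the source of information each method uses. Linear interpolation uses only a section's own neighbouring surveys. Softimpute pools information across all sections through a low-rank structure. The printed errors depend on the synthetic data and on `lam`, so they should not be read as reproducing the study's results.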
Appears in Collections: Journal/Magazine Article
Open Access Information
Status: embargoed access
Embargo End Date: 2026-12-18
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.