Deep learning image captioning in construction management : a feasibility study

Xiao, B; Wang, Y; Kang, SC

doi:10.1061/(ASCE)CO.1943-7862.0002297

Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/97760

DC Field	Value	Language
dc.contributor	Department of Building and Real Estate	en_US
dc.creator	Xiao, B	en_US
dc.creator	Wang, Y	en_US
dc.creator	Kang, SC	en_US
dc.date.accessioned	2023-03-13T02:23:53Z	-
dc.date.available	2023-03-13T02:23:53Z	-
dc.identifier.issn	0733-9364	en_US
dc.identifier.uri	http://hdl.handle.net/10397/97760	-
dc.language.iso	en	en_US
dc.publisher	American Society of Civil Engineers	en_US
dc.rights	© 2022 American Society of Civil Engineers.	en_US
dc.rights	This material may be downloaded for personal use only. Any other use requires prior permission of the American Society of Civil Engineers. This material may be found at https://dx.doi.org/10.1061/(ASCE)CO.1943-7862.0002297.	en_US
dc.subject	Deep learning	en_US
dc.subject	Image captioning	en_US
dc.subject	Construction machines	en_US
dc.subject	Feasibility study	en_US
dc.subject	Vision-based monitoring	en_US
dc.title	Deep learning image captioning in construction management : a feasibility study	en_US
dc.type	Journal/Magazine Article	en_US
dc.identifier.volume	148	en_US
dc.identifier.issue	7	en_US
dc.identifier.doi	10.1061/(ASCE)CO.1943-7862.0002297	en_US
dcterms.abstract	Deep learning image captioning methods can generate one or several natural-language sentences to describe the contents of construction images. By deconstructing these sentences, the construction object and activity information can be retrieved integrally for automated scene analysis. However, the feasibility of deep learning image captioning in construction remains unclear. To fill this gap, this research investigates the feasibility of deep learning image captioning methods in construction management. First, a linguistic schema for annotating construction machine images was established, and a captioning data set was developed. Then, six deep learning image captioning methods from the computer vision community were selected and tested on the construction captioning data set. In the sentence-level evaluation, the transformer-self-critical sequence training (Tsfm-SCST) method obtained the best performance among the six methods, with a bilingual evaluation understudy (BLEU)-1 score of 0.606, BLEU-2 of 0.506, BLEU-3 of 0.427, BLEU-4 of 0.349, metric for evaluation of translation with explicit ordering (METEOR) of 0.287, recall-oriented understudy for gisting evaluation (ROUGE) of 0.585, consensus-based image description evaluation (CIDEr) of 1.715, and semantic propositional image caption evaluation (SPICE) score of 0.422. In the element-level evaluation, the Tsfm-SCST method achieved an average precision of 91.1%, recall of 83.3%, and F1 score of 86.6% for recognition of construction machine objects by deconstructing the generated sentences. This research indicates that deep learning image captioning is feasible as a method of generating accurate and precise text descriptions from construction images, with potential applications in construction scene analysis and image documentation.	en_US
dcterms.accessRights	open access	en_US
dcterms.bibliographicCitation	Journal of construction engineering and management, July 2022, v. 148, no. 7, 04022049	en_US
dcterms.isPartOf	Journal of construction engineering and management	en_US
dcterms.issued	2022-07	-
dc.identifier.eissn	1943-7862	en_US
dc.identifier.artn	04022049	en_US
dc.description.validate	202303 bcch	en_US
dc.description.oa	Accepted Manuscript	en_US
dc.identifier.FolderNumber	a1177-n01	-
dc.identifier.SubFormID	44077	-
dc.description.fundingSource	Self-funded	en_US
dc.description.pubStatus	Published	en_US
dc.description.oaCategory	Green (AAM)	en_US
Appears in Collections:	Journal/Magazine Article
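
The element-level evaluation described in the abstract (deconstructing generated sentences to recognise construction machine objects, then scoring precision, recall, and F1) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the machine vocabulary MACHINES, the keyword-matching rule, the helper names, and the sample captions are hypothetical and are not taken from the paper, whose actual schema and parsing procedure are not given in this record.

```python
from collections import Counter

# Hypothetical machine vocabulary; the paper's actual classes are not listed here.
MACHINES = ("excavator", "truck", "loader")


def extract_machines(caption):
    """Return the set of known machine names mentioned in a caption.

    Assumed rule: exact word match, with a simple plural "s" allowed.
    """
    words = caption.lower().replace(".", " ").replace(",", " ").split()
    found = set()
    for word in words:
        for machine in MACHINES:
            if word == machine or word == machine + "s":
                found.add(machine)
    return found


def element_scores(generated, reference_labels):
    """Compute per-class (precision, recall, F1) from generated captions.

    generated: list of caption strings produced by a captioning model.
    reference_labels: list of sets with the machines truly in each image.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for caption, truth in zip(generated, reference_labels):
        predicted = extract_machines(caption)
        for machine in MACHINES:
            if machine in predicted and machine in truth:
                tp[machine] += 1
            elif machine in predicted:
                fp[machine] += 1
            elif machine in truth:
                fn[machine] += 1

    scores = {}
    for machine in MACHINES:
        p_den = tp[machine] + fp[machine]
        r_den = tp[machine] + fn[machine]
        precision = tp[machine] / p_den if p_den else 0.0
        recall = tp[machine] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        scores[machine] = (precision, recall, f1)
    return scores


if __name__ == "__main__":
    # Made-up captions and labels, for illustration only.
    captions = [
        "An excavator is loading a truck.",
        "A loader is moving soil.",
        "Two trucks are parked.",
    ]
    truth = [{"excavator", "truck"}, {"excavator"}, {"truck"}]
    for machine, (p, r, f1) in element_scores(captions, truth).items():
        print(f"{machine}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Averaging the per-class values would give figures comparable in form to the average precision, recall, and F1 quoted in the abstract, although the paper's exact averaging scheme is not stated in this record.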

Files in This Item:

File	Description	Size	Format
Xiao_Image_Captioning_Construction.pdf	Pre-Published version	5.98 MB	Adobe PDF

Open Access Information

Status	open access
File Version	Final Accepted Manuscript

Page views

110

Citations as of Apr 14, 2025

Downloads

238

Citations as of Apr 14, 2025

SCOPUS™ Citations

11

Citations as of Jun 21, 2024

WEB OF SCIENCE™ Citations

10

Citations as of Oct 10, 2024
