Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/109615
DC Field | Value | Language
dc.contributor | Faculty of Science | -
dc.creator | Im, SK | -
dc.creator | Chan, KH | -
dc.date.accessioned | 2024-11-08T06:10:28Z | -
dc.date.available | 2024-11-08T06:10:28Z | -
dc.identifier.uri | http://hdl.handle.net/10397/109615 | -
dc.language.iso | en | en_US
dc.publisher | Institute of Electrical and Electronics Engineers | en_US
dc.rights | This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ | en_US
dc.rights | The following publication S. -K. Im and K. -H. Chan, "Context-Adaptive-Based Image Captioning by Bi-CARU," in IEEE Access, vol. 11, pp. 84934-84943, 2023 is available at https://doi.org/10.1109/ACCESS.2023.3302512. | en_US
dc.subject | Attention mechanism | en_US
dc.subject | Bi-CARU | en_US
dc.subject | CNN | en_US
dc.subject | Context-adaptive | en_US
dc.subject | Image captioning | en_US
dc.subject | NLP | en_US
dc.subject | RNN | en_US
dc.title | Context-adaptive-based image captioning by Bi-CARU | en_US
dc.type | Journal/Magazine Article | en_US
dc.identifier.spage | 84934 | -
dc.identifier.epage | 84943 | -
dc.identifier.volume | 11 | -
dc.identifier.doi | 10.1109/ACCESS.2023.3302512 | -
dcterms.abstract | Image captions are abstract expressions of content representations using text sentences, helping readers better understand and analyse information across different media. With the advantage of encoder-decoder neural networks, captions can provide a rational structure for tasks such as image coding and caption prediction. This work introduces a Convolutional Neural Network (CNN) to Bidirectional Content-Adaptive Recurrent Unit (Bi-CARU) (CNN-to-Bi-CARU) model that employs a bidirectional structure to consider contextual features and capture the major features of the image. The features encoded from the image are passed into the forward and backward layers of CARU, respectively, to refine the word prediction, providing contextual text output for captioning. An attention layer is also introduced to collect the features produced by the context-adaptive gate in CARU, aiming to compute the weighting information for relationship extraction and determination. In experiments, the proposed CNN-to-Bi-CARU model outperforms other advanced models in the field, achieving better extraction of contextual information and more detailed representation of image captions. The model obtains a score of 41.28 on BLEU@4, 31.23 on METEOR, 61.07 on ROUGE-L, and 133.20 on CIDEr-D, making it competitive in image captioning on the MSCOCO dataset. | -
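The pipeline described in the abstract (bidirectional recurrent passes over encoded image features, followed by an attention layer that weights the per-step outputs) can be sketched in miniature. This is a toy illustration only: the plain tanh cell below is a hypothetical stand-in for CARU's content-adaptive gate, which is not reproduced here, and all function names and weights are illustrative rather than taken from the paper.

```python
import math

def step(h, x, w_h=0.5, w_x=1.0):
    # One recurrent step: a plain tanh cell standing in for CARU
    # (CARU's content-adaptive gating is omitted in this sketch).
    return math.tanh(w_h * h + w_x * x)

def bi_encode(xs):
    """Run forward and backward recurrent passes over the feature
    sequence and pair the hidden states, as in a Bi-CARU layer."""
    fwd, h = [], 0.0
    for x in xs:                 # forward pass
        h = step(h, x)
        fwd.append(h)
    bwd, h = [], 0.0
    for x in reversed(xs):       # backward pass
        h = step(h, x)
        bwd.append(h)
    bwd.reverse()                # realign with the forward direction
    return list(zip(fwd, bwd))

def attend(states):
    """Softmax attention over the bidirectional states: produce one
    weight per time step and an attention-weighted context vector."""
    scores = [f + b for f, b in states]          # toy scoring rule
    m = max(scores)                              # stabilised softmax
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    ctx_f = sum(w * f for w, (f, _) in zip(weights, states))
    ctx_b = sum(w * b for w, (_, b) in zip(weights, states))
    return weights, (ctx_f, ctx_b)
```

In the full model, `bi_encode` would operate on CNN-encoded image feature vectors and the attention weights would feed the caption decoder; here everything is scalar so the bidirectional-plus-attention control flow is visible at a glance.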
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | IEEE access, 2023, v. 11, p. 84934-84943 | -
dcterms.isPartOf | IEEE access | -
dcterms.issued | 2023 | -
dc.identifier.scopus | 2-s2.0-85167787587 | -
dc.identifier.eissn | 2169-3536 | -
dc.description.validate | 202411 bcch | -
dc.description.oa | Version of Record | en_US
dc.identifier.FolderNumber | OA_Scopus/WOS | en_US
dc.description.fundingSource | Others | en_US
dc.description.fundingText | Macao Polytechnic University Research Project | en_US
dc.description.pubStatus | Published | en_US
dc.description.oaCategory | CC | en_US
Appears in Collections: Journal/Magazine Article
Files in This Item:
File | Description | Size | Format
Im_Context-Adaptive-Based_Image_Captioning.pdf | - | 1.74 MB | Adobe PDF
Open Access Information
Status: open access
File Version: Version of Record

Page views: 27 (as of Apr 14, 2025)
Downloads: 23 (as of Apr 14, 2025)
SCOPUS™ citations: 10 (as of Apr 3, 2026)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.