Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/101451
DC Field | Value | Language
dc.contributor | Department of Computing | en_US
dc.creator | Xu, C | en_US
dc.creator | Tan, H | en_US
dc.creator | Li, J | en_US
dc.creator | Li, P | en_US
dc.date.accessioned | 2023-09-18T02:26:36Z | -
dc.date.available | 2023-09-18T02:26:36Z | -
dc.identifier.uri | http://hdl.handle.net/10397/101451 | -
dc.description | The 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, December 7–11, 2022 | en_US
dc.language.iso | en | en_US
dc.publisher | Association for Computational Linguistics | en_US
dc.rights | © 2022 Association for Computational Linguistics. | en_US
dc.rights | Materials published in or after 2016 are licensed on a Creative Commons Attribution 4.0 International License. (https://creativecommons.org/licenses/by/4.0/) | en_US
dc.rights | The following publication Chunpu Xu, Hanzhuo Tan, Jing Li, and Piji Li. 2022. Understanding Social Media Cross-Modality Discourse in Linguistic Space. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 2459–2471, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics is available at https://doi.org/10.18653/v1/2022.findings-emnlp.182. | en_US
dc.title | Understanding social media cross-modality discourse in linguistic space | en_US
dc.type | Conference Paper | en_US
dc.identifier.spage | 2459 | en_US
dc.identifier.epage | 2471 | en_US
dc.identifier.doi | 10.18653/v1/2022.findings-emnlp.182 | en_US
dcterms.abstract | The multimedia communications with texts and images are popular on social media. However, limited studies concern how images are structured with texts to form coherent meanings in human cognition. To fill in the gap, we present a novel concept of cross-modality discourse, reflecting how human readers couple image and text understandings. Text descriptions are first derived from images (named as subtitles) in the multimedia contexts. Five labels – entity-level insertion, projection and concretization and scene-level restatement and extension – are further employed to shape the structure of subtitles and texts and present their joint meanings. As a pilot study, we also build the very first dataset containing over 16K multimedia tweets with manually annotated discourse labels. The experimental results show that trendy multimedia encoders based on multi-head attention (with captions) are unable to well understand cross-modality discourse and additionally modeling texts at the output layer helps yield the-state-of-the-art results. | en_US
dcterms.accessRights | open access | en_US
dcterms.bibliographicCitation | Findings of the Association for Computational Linguistics: EMNLP 2022, p. 2459-2471 | en_US
dcterms.issued | 2022 | -
dc.identifier.ros | 2022003129 | -
dc.relation.conference | Conference on Empirical Methods in Natural Language Processing [EMNLP] | en_US
dc.description.validate | 202309 bcww | en_US
dc.description.oa | Version of Record | en_US
dc.identifier.FolderNumber | CDCF_2022-2023 | -
dc.description.fundingSource | RGC | en_US
dc.description.fundingSource | Others | en_US
dc.description.fundingText | NSFC Young Scientists Fund (No.62006203, 62106105); PolyU internal funds (1-BE2W, 4-ZZKM, and 1-ZVRH); CCF-Baidu Open Fund (No. 2021PP15002000) | en_US
dc.description.pubStatus | Published | en_US
dc.description.oaCategory | CC | en_US
Appears in Collections: Conference Paper
Files in This Item:
File | Description | Size | Format
Xu_Understanding_Social_Media.pdf | | 2.95 MB | Adobe PDF
Open Access Information
Status open access
File Version Version of Record

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.