Please use this identifier to cite or link to this item: http://hdl.handle.net/10397/108800
Title: Multi-modal representation via contrastive learning with attention bottleneck fusion and attentive statistics features
Authors: Guo, Q
Liao, Y
Li, Z 
Liang, S
Issue Date: Oct-2023
Source: Entropy, Oct. 2023, v. 25, no. 10, 1421
Abstract: The integration of information from multiple modalities is a highly active area of research. Previous techniques have predominantly fused shallow features or high-level representations produced by deep unimodal networks, which capture only a subset of the hierarchical relationships across modalities; they are also limited in their ability to exploit the fine-grained statistical features inherent in multimodal data. This paper proposes an approach that densely integrates representations by computing the means and standard deviations of image features. These global statistics afford a holistic perspective, capturing the overall distribution and trends in the data and thereby enabling a richer characterization of multimodal inputs. We further leverage a Transformer-based fusion encoder to capture global variations in multimodal features, and we incorporate a contrastive loss that encourages the discovery of information shared across modalities. Experiments on three widely used multimodal sentiment analysis datasets demonstrate the efficacy of the proposed method, which achieves significant performance improvements over existing approaches.
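For readers who want the abstract's two key ingredients made concrete, the following is a minimal PyTorch sketch (not the authors' code) of attentive statistics pooling, i.e. attention-weighted means and standard deviations of a feature sequence, together with a symmetric InfoNCE-style contrastive loss over paired modality embeddings. All names, dimensions, and the choice of a shared pooling module are illustrative assumptions; the Transformer attention-bottleneck fusion encoder is omitted.

    # Minimal sketch, assuming PyTorch; illustrative only, not the paper's code.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AttentiveStatisticsPooling(nn.Module):
        """Pool a (batch, time, dim) feature sequence into attention-weighted
        mean and standard deviation, concatenated to (batch, 2 * dim)."""

        def __init__(self, dim: int, hidden: int = 128):
            super().__init__()
            # Small scorer that assigns one attention logit per time step.
            self.attention = nn.Sequential(
                nn.Linear(dim, hidden),
                nn.Tanh(),
                nn.Linear(hidden, 1),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            weights = torch.softmax(self.attention(x), dim=1)        # (B, T, 1)
            mean = torch.sum(weights * x, dim=1)                     # (B, D)
            # Attention-weighted second central moment -> standard deviation.
            var = torch.sum(weights * (x - mean.unsqueeze(1)) ** 2, dim=1)
            std = torch.sqrt(var.clamp(min=1e-6))                    # (B, D)
            return torch.cat([mean, std], dim=-1)                    # (B, 2D)

    def contrastive_loss(z_a, z_b, temperature: float = 0.07):
        """Symmetric InfoNCE: matching pairs in the batch are positives,
        all other pairings serve as negatives."""
        z_a = F.normalize(z_a, dim=-1)
        z_b = F.normalize(z_b, dim=-1)
        logits = z_a @ z_b.t() / temperature                         # (B, B)
        targets = torch.arange(z_a.size(0), device=z_a.device)
        return 0.5 * (F.cross_entropy(logits, targets)
                      + F.cross_entropy(logits.t(), targets))

    if __name__ == "__main__":
        # Toy shapes; a real model would pool each modality's encoder output
        # (and likely use separate pooling/projection heads per modality).
        pool = AttentiveStatisticsPooling(dim=64)
        z_img = pool(torch.randn(8, 49, 64))   # e.g. a 7x7 grid of image patches
        z_txt = pool(torch.randn(8, 20, 64))   # e.g. 20 token features
        print(contrastive_loss(z_img, z_txt).item())

The weighted standard deviation is what distinguishes attentive statistics pooling from plain attention pooling: it summarizes the spread of the features around their attention-weighted center, not just the center itself.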
Keywords: Attention bottleneck fusion
Attentive statistics features
Contrastive learning
Multimodal representation
Publisher: MDPI AG
Journal: Entropy 
EISSN: 1099-4300
DOI: 10.3390/e25101421
Rights: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
The following publication Guo Q, Liao Y, Li Z, Liang S. Multi-Modal Representation via Contrastive Learning with Attention Bottleneck Fusion and Attentive Statistics Features. Entropy. 2023; 25(10):1421 is available at https://doi.org/10.3390/e25101421.
Appears in Collections: Journal/Magazine Article

Files in This Item:
File: entropy-25-01421.pdf  Size: 2.24 MB  Format: Adobe PDF
Open Access Information
Status: open access
File Version: Version of Record

Page views: 67 (as of Nov 10, 2025)
Downloads: 22 (as of Nov 10, 2025)
SCOPUS™ citations: 6 (as of Dec 19, 2025)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.