Hierarchical denoising representation disentanglement and dual-channel cross-modal-context interaction for multimodal sentiment analysis
Li, Zuhe; Huang, Zhenwei; Pan, Yushan; Yu, Jun; Liu, Weihua; Chen, Haoran; Luo, Yiming; Wu, Di; Wang, Hao
Peer reviewed, Journal article
Published version
View/ Open
Date
2024Metadata
Show full item recordCollections
- Institutt for IKT og realfag [614]
- Publikasjoner fra CRIStin - NTNU [39118]
Abstract
Multimodal sentiment analysis aims to extract sentiment cues from various modalities, such as textual, acoustic, and visual data, and manipulate them to determine the inherent sentiment polarity in the data. Despite significant achievements in multimodal sentiment analysis, challenges persist in addressing noise features in modal representations, eliminating substantial gaps in sentiment information among modal representations, and exploring contextual information that expresses different sentiments between modalities. To tackle these challenges, our paper proposes a new Multimodal Sentiment Analysis (MSA) framework. Firstly, we introduce the Hierarchical Denoising Representation Disentanglement module (HDRD), which employs hierarchical disentanglement techniques. This ensures the extraction of both common and private sentiment information while eliminating interference noise from modal representations. Furthermore, to address the uneven distribution of sentiment information among modalities, our Inter-Modal Representation Enhancement module (IMRE) enhances non-textual representations by extracting sentiment information related to non-textual representations from textual representations. Next, we introduce a new interaction mechanism, the Dual-Channel Cross-Modal Context Interaction module (DCCMCI). This module not only mines correlated contextual sentiment information within modalities but also explores positive and negative correlation contextual sentiment information between modalities. We conducted extensive experiments on two benchmark datasets, MOSI and MOSEI, and the results indicate that our proposed method offers state-of-the-art approaches.