Transformer-Based Feature Fusion Approach for Multimodal Visual Sentiment Recognition Using Tweets in the Wild

Publication Title

IEEE Access


We present an image-based real-time sentiment analysis system for recognizing in-the-wild sentiment expressions on online social networks (OSNs). The system applies the recently proposed Transformer architecture to OSN big data to extract emotion and sentiment features from three types of images: images containing faces, images containing text, and images containing neither faces nor text. We build a separate model for each image type and then fuse the three models to learn online sentiment behavior. Our proposed methodology combines a supervised two-stage training approach with a threshold-moving method, which is crucial given the class imbalance found in OSN data. Training is carried out on existing popular datasets (one for each of the three models) and on our newly proposed dataset, the Domain Free Multimedia Sentiment Dataset (DFMSD). Our results show that applying the threshold-moving method during training improved sentiment learning performance by 5-8 percentage points compared to training without it. Combining the two-stage strategy with the threshold-moving method during training proved effective in further improving learning performance (i.e., 12% higher accuracy than the threshold-moving strategy alone). Furthermore, the proposed approach had a positive impact on the fusion of the three models in terms of accuracy and F-score.
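The threshold-moving idea the abstract refers to, choosing a decision threshold other than the default 0.5 so that the minority class is not systematically missed, can be sketched as follows. This is a minimal illustration with toy data; the function name, threshold grid, and probabilities are assumptions for demonstration, not details from the paper:

```python
import numpy as np

def best_threshold(y_true, y_prob, thresholds=np.linspace(0.05, 0.95, 19)):
    """Pick the decision threshold that maximizes F1 on a validation set."""
    best_t, best_f1 = 0.5, -1.0
    for t in thresholds:
        y_pred = (y_prob >= t).astype(int)
        tp = np.sum((y_pred == 1) & (y_true == 1))
        fp = np.sum((y_pred == 1) & (y_true == 0))
        fn = np.sum((y_pred == 0) & (y_true == 1))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Imbalanced toy validation set: 8 negatives, 2 positives.
# The classifier assigns the positives only moderate scores, so the
# default 0.5 threshold would miss them both.
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 0])
y_prob = np.array([0.06, 0.11, 0.16, 0.21, 0.26, 0.31, 0.33, 0.38, 0.42, 0.55])

t, f1 = best_threshold(y_true, y_prob)
```

Here the tuned threshold drops below 0.5, recovering both minority-class examples at the cost of one false positive; with the default threshold the F1 score on this toy set would be zero. The same principle applies per class when the classifier is multi-class.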

big data, deep learning, feature extraction, fusion, images, multimodality, online social media, sentiment, threshold moving, transfer learning, Transformers, tweets, ViT
