Computer Vision Faculty Publications

Multilevel Feature Representation for Hybrid Transformers-based Emotion Recognition

Monorama Swain, Silicon Institute of Technology, Bhubaneswar
Bubai Maji, Silicon Institute of Technology, Bhubaneswar
Mustaqeem Khan, Mohamed Bin Zayed University of Artificial Intelligence
Abdulmotaleb El Saddik, Mohamed Bin Zayed University of Artificial Intelligence
Wail Gueaieb, Mohamed Bin Zayed University of Artificial Intelligence

Document Type

Conference Proceeding

Publication Title

BioSMART 2023 - Proceedings: 5th International Conference on Bio-Engineering for Smart Technologies

Abstract

Emotion plays a central role in both automated Speech Emotion Recognition (SER) systems and human-computer interaction systems. Capturing both global and temporal representations of an utterance is crucial to the effectiveness of an SER module. Research by the authors has shown that temporal information captured by a transformer can significantly improve the overall recognition rate of an SER system. Although hybrid models outperform conventional classifiers, all existing hybrid models have limitations. In particular, the relationship between different speech cues, and the learning of high-level global and temporal cues with a transformer, has not been studied thoroughly. This research therefore proposes an efficient transformer-based hybrid technique for emotion recognition via multilevel feature representation of speech signals. To learn deeper information from global and temporal representations, the proposed method comprises a parallel convolutional encoder, a spatial encoder, and a sequential encoder. The learned cues are then passed through the proposed transformer to capture the salient information for a specific emotion in the input sequence. To verify its effectiveness, we evaluated the proposed approach on the IEMOCAP and SITB-OSED corpora, achieving state-of-the-art (SOTA) weighted accuracies of 75.29% and 88.18% and unweighted accuracies of 76.34% and 88.49%, respectively.
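
The abstract reports results as weighted accuracy (WA) and unweighted accuracy (UA) but does not define them. In the SER literature, WA conventionally means overall accuracy across all test utterances, and UA means the unweighted mean of per-class recall, so rare emotions count as much as frequent ones. The following minimal Python sketch assumes those conventional definitions. The function names and toy labels are hypothetical and not taken from the paper, and the authors' exact evaluation protocol may differ.

```python
from collections import Counter


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correctly classified utterances / all utterances."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean per-class recall: every emotion class gets equal weight."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    totals = Counter(y_true)  # utterances per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels (not from IEMOCAP or SITB-OSED)
y_true = ["neu", "neu", "neu", "neu", "ang", "sad"]
y_pred = ["neu", "neu", "neu", "neu", "neu", "sad"]
print(weighted_accuracy(y_true, y_pred))    # 5/6, about 0.833
print(unweighted_accuracy(y_true, y_pred))  # (1 + 0 + 1)/3, about 0.667
```

On imbalanced data like this toy example, the two metrics diverge. The single misclassified minority-class utterance barely lowers WA, but it pulls UA down sharply because that class's recall drops to zero.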

DOI

10.1109/BioSMART58455.2023.10162089

Publication Date

7-3-2023

Keywords

Emotion Recognition, Human-Computer Interaction, Hybrid Transformer, Multilevel Feature Representation, Speech Signal

Comments

IR conditions: non-described

Recommended Citation

M. Swain, B. Maji, M. Khan, A. E. Saddik and W. Gueaieb, "Multilevel Feature Representation for Hybrid Transformers-based Emotion Recognition," 2023 5th International Conference on Bio-engineering for Smart Technologies (BioSMART), Paris, France, 2023, pp. 1-5, doi: 10.1109/BioSMART58455.2023.10162089.

Additional Links

https://doi.org/10.1109/BioSMART58455.2023.10162089
