Document Type
Article
Publication Title
Computer Systems Science and Engineering
Abstract
Speech signals play an essential role in communication and provide an efficient way to exchange information between humans and machines. Speech Emotion Recognition (SER) is a critical means of assessing human emotional state and is applicable in many real-world settings such as healthcare, call centers, robotics, safety, and virtual reality. This work develops a novel TCN-based emotion recognition system that uses a spatial-temporal convolution network over speech signals to recognize the speaker's emotional state. The authors design a Temporal Convolutional Network (TCN) core block to capture long-term dependencies in speech signals and then feed these temporal cues to a dense network that fuses the spatial features and aggregates global information for final classification. The proposed network automatically extracts valid sequential cues from speech signals and outperforms both state-of-the-art (SOTA) and traditional machine learning algorithms. Results of the proposed method show a high recognition rate compared with SOTA methods. The final unweighted accuracies of 80.84% and 92.31% on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) and Berlin Emotional Database (EMO-DB) corpora, respectively, indicate the robustness and efficiency of the designed model.
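The paper's exact TCN block is not reproduced in this record. As a minimal, hypothetical sketch of the mechanism TCNs rely on to capture long-term dependencies in speech, the snippet below implements a causal dilated 1-D convolution with NumPy and computes the receptive field that a stack of such layers with exponentially growing dilations would cover; the function names and the kernel-size/depth choices are illustrative, not taken from the paper.

```python
import numpy as np

def causal_dilated_conv1d(x, w, dilation):
    """Causal dilated 1-D convolution: output[t] depends only on
    x[t], x[t - d], x[t - 2d], ... (the sequence is left-padded
    with zeros, so no future frames leak into the present)."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    return np.array([
        sum(w[j] * xp[t + pad - j * dilation] for j in range(k))
        for t in range(len(x))
    ])

def receptive_field(kernel_size, num_layers):
    """Receptive field of a stack of causal convolutions whose
    dilation doubles per layer (1, 2, 4, ...), the standard TCN
    pattern: 1 + sum_i (k - 1) * 2**i."""
    return 1 + sum((kernel_size - 1) * 2 ** i for i in range(num_layers))
```

For example, eight layers with kernel size 2 give `receptive_field(2, 8) == 256` frames, which is how a TCN sees far back into an utterance without recurrence; the real model would additionally use residual connections, normalization, and learned multi-channel kernels.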
First Page
3355
Last Page
3369
DOI
10.32604/csse.2023.037373
Publication Date
4-3-2023
Keywords
Affective computing, deep learning, emotion recognition, speech signal, temporal convolutional network
Recommended Citation
M. Ishaq, M. Khan, and S. Kwon, "TC-Net: A Modest & Lightweight Emotion Recognition System Using Temporal Convolution Network," Comput. Syst. Sci. Eng., vol. 46, no. 3, pp. 3355-3369, 2023. https://doi.org/10.32604/csse.2023.037373
Additional Links
https://doi.org/10.32604/csse.2023.037373
Comments
Open Access, archived thanks to Computer Systems Science and Engineering
License: CC BY 4.0
Uploaded: June 19, 2024