Speech Emotional Recognition of Telephone Conversation by Using Deep Learning
DOI:
https://doi.org/10.33022/ijcs.v13i6.4442
Abstract
In this research, we compare two deep learning algorithms, Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN), as classifiers for a speech emotion recognition (SER) model for voice communication. We propose an approach to SER in telephone conversations that uses Mel-frequency cepstral coefficients (MFCC) for audio feature extraction. Only the zeroth MFCC coefficient (energy) is used, as energy can provide a good representation of the sound. The extracted features are then classified using CNN and RNN. The CNN consistently achieved higher accuracy than the RNN, with values of 0.93 at epoch=50, 0.93 at epoch=100, 0.90 at epoch=150, and 0.93 at epoch=200, but the RNN trained faster than the CNN. In the optimizer test, the Adam optimizer performed well for both models, with respective accuracy values of 0.93 and 0.94, outperforming the other optimizers in accuracy.
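As a rough illustration of the feature-extraction step described in the abstract, the following sketch computes one zeroth cepstral coefficient per frame using only the Python standard library. It is not the authors' implementation: the function names, the 8 kHz telephone-band sample rate, the frame length, hop size, filter count, and the orthonormal DCT scaling are all assumptions chosen for the example.

```python
import cmath
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale (0 Hz to Nyquist)."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge points, expressed as fractional FFT-bin positions
    edges = [mel_to_hz(top * i / (n_filters + 1)) * n_fft / sample_rate
             for i in range(n_filters + 2)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_bins):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising slope
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def power_spectrum(frame):
    """Naive O(N^2) DFT power spectrum of one Hamming-windowed frame."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spec.append(abs(acc) ** 2 / n)
    return spec


def mfcc_c0(signal, sample_rate, frame_len=256, hop=128, n_filters=20):
    """Return the zeroth cepstral coefficient of each frame.

    For DCT index 0 the cosine term is 1, so c0 reduces to a scaled sum
    of the log mel-band energies, which is why it tracks frame energy.
    """
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    scale = math.sqrt(1.0 / n_filters)  # assumed orthonormal DCT-II scaling
    c0 = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spec = power_spectrum(signal[start:start + frame_len])
        log_energies = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
                        for row in bank]
        c0.append(scale * sum(log_energies))
    return c0


# Usage: a synthetic 440 Hz tone stands in for a telephone recording.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(1024)]
features = mfcc_c0(tone, sample_rate)  # one value per frame
```

The resulting per-frame sequence is the kind of one-dimensional feature that could then be passed to a CNN or RNN classifier. A production pipeline would normally use an FFT-based audio library rather than the naive DFT shown here.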
License
Copyright (c) 2024 Tri Budi Santoso
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.