Speech Emotional Recognition of Telephone Conversation by Using Deep Learning

Authors

  • Izza Nur Afifah, Electrical Department, Politeknik Elektronika Negeri Surabaya
  • Titon Dutono, Politeknik Elektronika Negeri Surabaya
  • Tri Budi Santoso, Politeknik Elektronika Negeri Surabaya

DOI:

https://doi.org/10.33022/ijcs.v13i6.4442

Abstract

In this research, we compare two deep learning algorithms, Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN), to support a speech emotion recognition (SER) model for voice communication. We propose an approach to speech emotion recognition in telephone conversations that uses Mel-frequency cepstral coefficients (MFCC) for audio feature extraction. Only the zeroth MFCC coefficient (energy) is used, as energy provides a good representation of sound. The extracted features are then classified using CNN and RNN. The CNN consistently achieved higher accuracy than the RNN, with values of 0.93 at epoch=50, 0.93 at epoch=100, 0.90 at epoch=150, and 0.93 at epoch=200, whereas the RNN trained faster than the CNN. In the optimizer test, the Adam optimizer performed well for both models, with respective accuracy values of 0.93 and 0.94, outperforming the other optimizers in accuracy.
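
As an illustration of the feature described in the abstract, the following is a minimal, dependency-free sketch of how a per-frame zeroth MFCC coefficient could be computed. It is not the authors' implementation: the function names, the 8 kHz sample rate, the frame length, hop size, and number of mel filters are all assumptions chosen for illustration. The zeroth coefficient is taken here as the k=0 term of an orthonormal DCT-II over the log mel-band energies, which amounts to a scaled sum of those log energies.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (slow, but needs no libraries)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    mel_max = hz_to_mel(sample_rate / 2)
    edges_hz = [mel_to_hz(mel_max * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc_c0(signal, sample_rate=8000, frame_len=256, hop=128, n_filters=20):
    """Return one zeroth-MFCC value per frame (all defaults are assumed values)."""
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]            # Hamming window
    c0 = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        spec = power_spectrum(frame)
        # Small floor avoids log(0) for filters that capture no energy.
        log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
                 for row in bank]
        # k=0 term of an orthonormal DCT-II: scaled sum of log band energies.
        c0.append(sum(log_e) / math.sqrt(n_filters))
    return c0
```

Because this coefficient tracks overall log energy, a louder version of the same signal yields larger values in every frame. The resulting sequence of per-frame values is the kind of one-dimensional feature that could then be passed to a CNN or RNN classifier.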

Published

03-12-2024