Speech Emotional Recognition of Telephone Conversation by Using Deep Learning

Izza Nur Afifah; Titon Dutono; Tri Budi Santoso

doi:10.33022/ijcs.v13i6.4442

Authors

Izza Nur Afifah, Electrical Department, Politeknik Elektronika Negeri Surabaya
Titon Dutono, Politeknik Elektronika Negeri Surabaya
Tri Budi Santoso, Politeknik Elektronika Negeri Surabaya

Abstract

In this research, we compare two deep learning algorithms, Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN), to support a speech emotion recognition (SER) model for voice communication. We propose an approach to speech emotion recognition in telephone conversations that uses Mel-frequency cepstral coefficients (MFCC) for audio feature extraction. Only the zeroth MFCC coefficient (energy) is used, as energy provides a good representation of the sound. The extracted features are then classified using CNN and RNN. The CNN consistently achieved higher accuracy than the RNN, with values of 0.93 at 50 epochs, 0.93 at 100 epochs, 0.90 at 150 epochs, and 0.93 at 200 epochs, but the RNN trained faster than the CNN. In the optimizer test, the Adam optimizer performed well for both models, with respective accuracy values of 0.93 and 0.94, outperforming the other optimizers tested.
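
The energy feature described above can be illustrated with a minimal sketch. This is not the authors' pipeline: it computes a plain per-frame log energy as a simplified stand-in for the zeroth MFCC coefficient (a true zeroth coefficient is derived from log mel filterbank energies). The function names, the 8 kHz sample rate, and the 25 ms frame with a 10 ms hop are all assumptions chosen for illustration.

```python
import math


def frame_log_energy(signal, frame_len=200, hop=80, eps=1e-10):
    """Return one log-energy value per frame of a mono signal.

    Assumed defaults: 200-sample frames (25 ms at 8 kHz) with an
    80-sample hop (10 ms). This is a simplified stand-in for the
    zeroth MFCC coefficient, not a full MFCC computation.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    energies = []
    # Slide a fixed-length window over the signal; drop the partial tail.
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame)
        # eps keeps the logarithm finite for silent frames.
        energies.append(math.log(energy + eps))
    return energies


def sine(freq_hz, amplitude, n_samples, sample_rate=8000):
    """Hypothetical helper: synthesize a sine tone as a list of floats."""
    step = 2.0 * math.pi * freq_hz / sample_rate
    return [amplitude * math.sin(step * n) for n in range(n_samples)]


# Synthetic input: a quiet segment followed by a loud segment.
quiet = sine(220.0, 0.1, 800)
loud = sine(220.0, 0.8, 800)
contour = frame_log_energy(quiet + loud)
```

The resulting `contour` is a one-dimensional sequence with one value per frame, which is the kind of input a CNN or RNN classifier could consume; in this synthetic case the values rise where the louder segment begins.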
