Enhanced speech emotion detection using deep neural networks

Publication Type:

Journal Article


International Journal of Speech Technology, Springer, p.1-14 (2018)




This paper focusses on investigation of the effective performance of perceptual based speech features on emotion detection. Mel frequency cepstral coefficients (MFCC’s), perceptual linear predictive cepstrum (PLPC), Mel frequency perceptual linear prediction cepstrum (MFPLPC), bark frequency cepstral coefficients (BFCC), revised perceptual linear prediction coefficient’s (RPLP) and inverted Mel frequency cepstral coefficients (IMFCC) are the perception features considered. The algorithm using these auditory cues is evaluated with deep neural networks (DNN). The novelty of the work involves analysis of the perceptual features to identify predominant features that contain significant emotional information about the speaker. The validity of the algorithm is analysed on publicly available Berlin database with seven emotions in 1-dimensional space termed categorical and 2-dimensional continuous space consisting of emotions in valence and arousal dimensions. Comparative analysis reveals that considerable improvement in the performance of emotion recognition is obtained using DNN with the identified combination of perceptual features.