Optimized multi-channel deep neural network with 2D graphical representation of acoustic speech features for emotion recognition

Stolar, M, Lech, M and Burnett, I 2014, 'Optimized multi-channel deep neural network with 2D graphical representation of acoustic speech features for emotion recognition', in Proceedings of the 8th International Conference on Signal Processing and Communication Systems (ICSPCS 2014), Gold Coast, Australia, 15-17 December 2014, pp. 55-60.


Document type: Conference Paper
Collection: Conference Papers

Title: Optimized multi-channel deep neural network with 2D graphical representation of acoustic speech features for emotion recognition
Author(s): Stolar, M; Lech, M; Burnett, I
Year: 2014
Conference name: ICSPCS 2014
Conference location: Gold Coast, Australia
Conference dates: 15-17 December 2014
Proceedings title: Proceedings of the 8th International Conference on Signal Processing and Communication Systems (ICSPCS 2014)
Publisher: IEEE
Place of publication: United States
Start page: 55
End page: 60
Total pages: 6
Abstract: This study investigates the effectiveness of speech emotion recognition using a new approach called the Optimized Multi-Channel Deep Neural Network (OMC-DNN). The proposed method was tested with input features given as simple 2D black-and-white images representing graphs of the MFCC coefficients or the TEO parameters, calculated either from speech (MFCC-S, TEO-S) or from glottal waveforms (MFCC-G, TEO-G). A comparison with 6 different single-channel benchmark classifiers showed that the OMC-DNN provided the best performance in both pair-wise (emotion vs. neutral) and simultaneous multiclass recognition of 7 emotions (anger, boredom, disgust, happiness, fear, sadness and neutral). In the pair-wise case, the OMC-DNN outperformed the single-channel DNN by 5%-10%, depending on the feature set. In the multiclass case, the OMC-DNN outperformed or matched the single-channel equivalents for all features. The speech spectrum and the glottal energy characteristics were identified as two important factors in discriminating between different types of categorical emotions in speech.
Subjects: Signal Processing
Keyword(s): Speech; Speech recognition; Emotion recognition; Accuracy; Benchmark testing; Artificial neural networks
DOI: 10.1109/ICSPCS.2014.7021120
Copyright notice: © 2014 IEEE
ISBN: 9781479952564
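
The abstract describes a pipeline that is straightforward to prototype: acoustic feature trajectories (e.g. MFCCs) are rendered as black-and-white graph images, and each image type feeds its own branch of a multi-channel network whose embeddings are fused for 7-class emotion classification. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' implementation: it assumes librosa for MFCC extraction and PyTorch for the network, and the image resolution, branch architecture, and the use of four identical branches (standing in for MFCC-S, TEO-S, MFCC-G, TEO-G channels) are illustrative assumptions only.

```python
# Hypothetical sketch of the graph-image + multi-channel idea from the abstract.
# Feature choices, image size, and architecture are assumptions, not the paper's setup.
import numpy as np
import librosa
import torch
import torch.nn as nn

def mfcc_graph_image(y, sr, n_mfcc=13, size=64):
    """Draw each MFCC trajectory as a black curve on a white square canvas."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, frames)
    img = np.ones((size, size), dtype=np.float32)            # white background
    cols = np.linspace(0, mfcc.shape[1] - 1, size).astype(int)
    for coeff in mfcc:                                       # one curve per coefficient
        norm = (coeff - coeff.min()) / (np.ptp(coeff) + 1e-8)
        rows = ((1.0 - norm[cols]) * (size - 1)).astype(int)
        img[rows, np.arange(size)] = 0.0                     # black pixels
    return img

class ChannelBranch(nn.Module):
    """One convolutional branch per feature image (e.g. MFCC-S, TEO-G)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
        )

    def forward(self, x):
        return self.net(x)

class MultiChannelNet(nn.Module):
    """Fuse per-channel embeddings and classify into 7 emotion categories."""
    def __init__(self, n_channels=4, n_classes=7, size=64):
        super().__init__()
        self.branches = nn.ModuleList(ChannelBranch() for _ in range(n_channels))
        feat = 16 * (size // 4) ** 2                          # per-branch embedding size
        self.head = nn.Linear(n_channels * feat, n_classes)

    def forward(self, xs):                # xs: list of (B, 1, size, size) tensors
        fused = torch.cat([b(x) for b, x in zip(self.branches, xs)], dim=1)
        return self.head(fused)

# Smoke test on a synthetic tone standing in for a speech segment.
sr = 16000
y = np.sin(2 * np.pi * 220 * np.arange(sr) / sr).astype(np.float32)
img = torch.from_numpy(mfcc_graph_image(y, sr)).view(1, 1, 64, 64)
model = MultiChannelNet()
logits = model([img] * 4)                 # same image reused across channels for shape-checking
print(logits.shape)                       # torch.Size([1, 7])
```

In the paper, each channel would carry a distinct feature/waveform pairing (MFCC or TEO, from speech or glottal waveforms) and the network topology is optimized; here the same image is reused across branches purely to exercise the tensor shapes.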