Amplitude-Frequency Analysis of Emotional Speech Using Transfer Learning and Classification of Spectrogram Images

Lech, M, Stolar, M, Bolia, R and Skinner, M 2018, 'Amplitude-Frequency Analysis of Emotional Speech Using Transfer Learning and Classification of Spectrogram Images', Advances in Science, Technology and Engineering Systems Journal, vol. 3, no. 4, pp. 363-371.


Document type: Journal Article
Collection: Journal Articles

Title Amplitude-Frequency Analysis of Emotional Speech Using Transfer Learning and Classification of Spectrogram Images
Author(s) Lech, M
Stolar, M
Bolia, R
Skinner, M
Year 2018
Journal name Advances in Science, Technology and Engineering Systems Journal
Volume number 3
Issue number 4
Start page 363
End page 371
Total pages 9
Publisher Advances in Science, Technology and Engineering Systems
Abstract Automatic speech emotion recognition (SER) techniques based on acoustic analysis show high confusion between certain emotional categories. This study used an indirect approach to gain insight into the amplitude-frequency characteristics of different emotions, with the aim of supporting the development of future SER methods that discriminate between emotions more effectively. The analysis transformed short 1-second blocks of speech into RGB or grey-scale spectrogram images, which were then used to fine-tune a pre-trained image classification network to recognize emotions. Representing the spectrograms on four frequency scales (linear, mel, equivalent rectangular bandwidth or ERB, and logarithmic) made it possible to observe the effects of high, mid-high, mid-low and low frequency characteristics of speech, respectively. Using only the red (R), green (G) or blue (B) component of the RGB images, in turn, showed the importance of speech components with high, mid and low amplitude levels, respectively. Experiments conducted on the Berlin emotional speech database (EMO-DB) revealed the relative positions of seven emotional categories (anger, boredom, disgust, fear, joy, neutral and sadness) on the amplitude-frequency plane.
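The two axes described in the abstract can be illustrated with a small, hypothetical Python sketch. It is not the authors' pipeline: the 20 Hz to 8 kHz analysis band (which assumes 16 kHz audio), the specific mel and ERB-rate formulas, the simplified jet-style colour map, the handling of a trailing partial block, and all function names are assumptions chosen for illustration. Measuring the share of image rows that fall below 1 kHz on each frequency scale shows why a linear axis gives most of the image to high frequencies and a logarithmic axis gives most of it to low ones. The colour map shows how low, mid and high amplitudes end up mainly in the blue, green and red channels.

```python
import math

# Assumed analysis band: 20 Hz to 8 kHz (16 kHz sampling). The log scale
# needs a non-zero lower edge, hence 20 Hz rather than 0 Hz.
F_MIN, F_MAX = 20.0, 8000.0


def hz_to_mel(f):
    # Common mel formula (O'Shaughnessy): 2595 * log10(1 + f / 700)
    return 2595.0 * math.log10(1.0 + f / 700.0)


def hz_to_erb_rate(f):
    # ERB-rate scale (Glasberg & Moore): 21.4 * log10(1 + 0.00437 * f)
    return 21.4 * math.log10(1.0 + 0.00437 * f)


# Frequency warps for the four spectrogram scales named in the abstract.
SCALES = {
    "linear": lambda f: f,
    "mel": hz_to_mel,
    "erb": hz_to_erb_rate,
    "log": math.log10,
}


def row_share_below(f_split, warp, f_min=F_MIN, f_max=F_MAX):
    """Fraction of image rows (spaced uniformly in warped units) that
    cover frequencies between f_min and f_split."""
    lo, hi = warp(f_min), warp(f_max)
    return (warp(f_split) - lo) / (hi - lo)


def jet_like_rgb(level):
    """Map a normalised amplitude in [0, 1] to (R, G, B) with a simplified
    jet-style colour map: low -> blue, mid -> green, high -> red."""
    level = min(max(level, 0.0), 1.0)

    def clamp(x):
        return min(max(x, 0.0), 1.0)

    r = clamp(1.5 - abs(4.0 * level - 3.0))
    g = clamp(1.5 - abs(4.0 * level - 2.0))
    b = clamp(1.5 - abs(4.0 * level - 1.0))
    return r, g, b


def one_second_blocks(samples, sample_rate):
    """Split a signal into non-overlapping 1-second blocks, dropping any
    trailing partial block (an assumption; zero-padding is an alternative)."""
    n = int(sample_rate)
    return [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]


if __name__ == "__main__":
    for name, warp in SCALES.items():
        share = row_share_below(1000.0, warp)
        print(f"{name:>6}: {share:.2f} of image rows lie below 1 kHz")
    for level in (0.1, 0.5, 0.9):
        print(level, jet_like_rgb(level))
    print(len(one_second_blocks([0.0] * 40000, 16000)), "one-second blocks")
```

Run as a script, the share of rows below 1 kHz increases from the linear to the mel, ERB and logarithmic scales. This matches the ordering of emphasis (high, mid-high, mid-low, low) described in the abstract. Isolating a single colour channel of such an image then keeps mostly the high (R), mid (G) or low (B) amplitude regions.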
Subject Signal Processing
Keyword(s) Speech processing
Emotion recognition
Deep neural networks
DOI - identifier 10.25046/aj030437
Copyright notice This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
ISSN 2415-6698
Created: Thu, 31 Jan 2019, 11:26:00 EST by Catalyst Administrator
© 2014 RMIT Research Repository