Acoustic and conversational speech analysis of depressed adolescents and their parents

Stolar, M 2016, Acoustic and conversational speech analysis of depressed adolescents and their parents, Doctor of Philosophy (PhD), Engineering, RMIT University.

Document type: Thesis
Collection: Theses

Title Acoustic and conversational speech analysis of depressed adolescents and their parents
Author(s) Stolar, M
Year 2016
Abstract Clinical depression is a debilitating disorder with increasing prevalence; it imposes an economic burden and is linked to high suicide rates. Symptoms can appear during adolescence and, if left untreated, lead to long-term consequences in adulthood. Current diagnostic problems relate to access, treatment seeking and subjective assessment techniques.

An automated, discreet and efficient system capable of detecting depression and analyzing risk factors is required. Such a system could assist in regulating family interactions through therapies that alter patterns correlated with depression.

An audio-visual database collected by the Oregon Research Institute (ORI-DB) provided a corpus of 68 adolescents (29 depressed and 34 non-depressed) in dyadic parent-adolescent conversations. ORI-DB has corresponding Living in Family Environments (LIFE) annotations, facilitating both audio and conversation approaches.

Speech analysis in past depression studies has mostly been restricted to small databases of adult speech and is dependent on audio quality, gender, speaker and environment. The acoustic approach investigated a range of spectral, prosodic, cepstral, Teager energy operator (TEO) and glottal features, together with new glottal waveform features, and produced results consistent with past studies:

Gender dependence increased adolescent depression detection rates
Spectral features outperform prosodic features
TEO features are robust to noisy signals, whereas MFCC performance degrades with noise
Glottal waveform features (G-MFCC/G-TEO) improve on the corresponding speech features (MFCC/TEO)
Combining feature subcategories and categories enhances depression detection
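
As an illustration of the TEO feature family mentioned above, the following is a minimal sketch of the standard discrete Teager energy operator, psi[n] = x[n]^2 - x[n-1] * x[n+1]. It is not taken from the thesis; the function name teager_energy is hypothetical, and the thesis feature extraction (framing, band filtering, and so on) is not reproduced here.

```python
import math


def teager_energy(signal):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The operator needs one neighbour on each side, so the output is two
    samples shorter than the input.
    """
    return [
        signal[n] ** 2 - signal[n - 1] * signal[n + 1]
        for n in range(1, len(signal) - 1)
    ]


# Illustrative input (an assumption, not thesis data): a unit-amplitude
# cosine with digital frequency 0.3 rad/sample.
tone = [math.cos(0.3 * n) for n in range(50)]
energy = teager_energy(tone)
```

For a pure tone A*cos(w*n + phi), the operator returns the constant A^2 * sin(w)^2, so it tracks amplitude and frequency jointly.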

The following new contributions were made:

A new roll-off range feature was introduced and improved with 2-stage mRMR/SVM
To our knowledge, G-MFCC was used in depression detection for the first time
Adolescent depression detection from the parent’s features was proposed for the first time

Additionally, this thesis explored depression in relation to emotional characteristics in conversation. Clinical depression is an affect (emotion) regulation disorder [24], is associated with emotional disturbances [449] and has a strong correlation with expressed emotions, mood changes and stress [21][100][16]. Furthermore, adolescent depression is strongly correlated with the quality of family interaction in terms of emotions [24][134][135][136][320][144][157]. Considering the link between emotion and depression, it would be expected that analysis of emotional characteristics in conversations could be applied towards detecting clinical depression.

A conversation modeling system (CMS) was proposed as a complex model to capture emotional dependence patterns related to inter- and intra-influences and emotional transitions in dyadic, emotionally annotated sequences. The CMS is based on an Influence Model (IM) and is extended to an Emotional Influence Model (EIM), a Dynamic EIM (DEIM) and a Higher Order EIM (HOEIM) with Mixed Memory Markov (HOEIM-MMM) and Kneser-Ney smoothed n-grams (HOEIM-KN-ngram).
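
The general idea behind an influence model can be sketched as follows: one speaker's next emotional state is predicted by mixing a transition table conditioned on that speaker's own previous state (intra-influence) with one conditioned on the partner's previous state (inter-influence). This is a minimal illustration, not the thesis implementation. The label set, the toy sequences, the function names and the hand-picked mixing weight alpha are all assumptions; in a fitted model the weights would be estimated from data.

```python
from collections import Counter, defaultdict


def transition_table(prev_seq, next_seq, states):
    """Estimate P(next_seq[t] | prev_seq[t-1]) with add-one smoothing."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(prev_seq[:-1], next_seq[1:]):
        counts[prev][nxt] += 1
    table = {}
    for prev in states:
        total = sum(counts[prev].values()) + len(states)
        table[prev] = {s: (counts[prev][s] + 1) / total for s in states}
    return table


def influence_prediction(intra, inter, alpha, own_prev, partner_prev):
    """Convex mix of intra- and inter-speaker transition probabilities."""
    return {
        s: alpha * intra[own_prev][s] + (1 - alpha) * inter[partner_prev][s]
        for s in intra[own_prev]
    }


# Hypothetical emotion labels and toy sequences (not ORI-DB data).
states = ["positive", "neutral", "negative"]
parent = ["neutral", "negative", "negative", "neutral", "positive", "neutral"]
adolescent = ["neutral", "neutral", "negative", "negative", "neutral", "positive"]

intra = transition_table(adolescent, adolescent, states)  # adolescent -> adolescent
inter = transition_table(parent, adolescent, states)      # parent -> adolescent

# alpha = 0.6 is an arbitrary illustrative weight on the intra-influence term.
dist = influence_prediction(intra, inter, 0.6, "negative", "negative")
```

Because each table row is a probability distribution and alpha lies in [0, 1], the mixed prediction is also a valid distribution over the emotion labels.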

This thesis contributes to the analysis and understanding of emotional conversations in the context of depression detection and potential therapies. In a first study of its kind, the EIM features were analyzed and classified; they can provide new insights into depression, with the following major observations:

HOEIM improved data fit, in terms of log-likelihood, as the order increased
DEIM/HOEIM parameters generated psychologically valid interpretations of emotional influence, with statistically significant differences between depressed and control groups
Depression classification improved with higher delay/order, and HOEIM had higher depression detection rates (DEIM < HOEIM-MMM < HOEIM-KN-ngram)
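
The first observation, data fit improving with model order, can be illustrated with a small sketch that scores a label sequence under Markov models of increasing order. This is not the thesis method: it uses a single sequence, fits and scores on the same data, and substitutes simple add-k smoothing for Kneser-Ney. The function name and the toy sequence are hypothetical.

```python
import math
from collections import Counter


def markov_log_likelihood(seq, order, states, k=1.0):
    """Log-likelihood of seq under an order-`order` Markov model fit to seq.

    Uses add-k smoothing as a simple stand-in for Kneser-Ney smoothing.
    Only positions with a full context (t >= order) are scored.
    """
    context_counts = Counter()
    ngram_counts = Counter()
    for t in range(order, len(seq)):
        context = tuple(seq[t - order:t])
        context_counts[context] += 1
        ngram_counts[context + (seq[t],)] += 1

    log_likelihood = 0.0
    for t in range(order, len(seq)):
        context = tuple(seq[t - order:t])
        numerator = ngram_counts[context + (seq[t],)] + k
        denominator = context_counts[context] + k * len(states)
        log_likelihood += math.log(numerator / denominator)
    return log_likelihood


# Toy alternating sequence (an assumption for illustration only).
toy_states = ["a", "b"]
toy_seq = ["a", "b"] * 10
ll_order0 = markov_log_likelihood(toy_seq, 0, toy_states)
ll_order1 = markov_log_likelihood(toy_seq, 1, toy_states)
```

On this strictly alternating sequence the order-1 model fits far better than the order-0 model, since the previous label determines the next one. Higher orders score fewer positions, so a careful comparison would normalize per scored symbol and evaluate on held-out data.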

The best overall depression detection performance for the acoustic and conversation modeling approaches was as follows:

Acoustic features: P+S (GDM-M) 98.8% and G+S (GDM-F) 97.1%
Conversation Modeling: HOEIM-KN-ngram mRMR/SVM (GIM) 95.5%
Degree Doctor of Philosophy (PhD)
Institution RMIT University
School, Department or Centre Engineering
Subjects Signal Processing
Keyword(s) Acoustic Analysis
Depression Detection
Emotion Recognition
Deep Neural Network
Speech Processing
Influence Model
Created: Mon, 23 Jan 2017, 08:09:51 EST by Denise Paciocco