Automatic speaker recognition dynamic feature identification and classification using distributed discrete cosine transform based mel frequency cepstral coefficients and fuzzy vector quantization

Hossan, M 2011, Automatic speaker recognition dynamic feature identification and classification using distributed discrete cosine transform based mel frequency cepstral coefficients and fuzzy vector quantization, Masters by Research, Electrical and Computer Engineering, RMIT University.


Document type: Thesis
Collection: Theses

Attached Files
Name Description MIMEType Size
Hossan.pdf Thesis application/pdf 1.10MB
Title Automatic speaker recognition dynamic feature identification and classification using distributed discrete cosine transform based mel frequency cepstral coefficients and fuzzy vector quantization
Author(s) Hossan, M
Year 2011
Abstract Mel-Frequency Cepstral Coefficients (MFCC) are a leading approach to speech feature extraction, and current research aims to improve their performance. This thesis presents a novel approach to MFCC feature extraction and classification and applies it to automatic speaker recognition (ASR). A new MFCC feature extraction method based on the distributed Discrete Cosine Transform (DCT-II) is presented; it applies the DCT-II technique to compute the dynamic features used during speaker recognition. The new algorithm combines the DCT-II based MFCC feature extraction method with a Fuzzy Vector Quantization (FVQ) data clustering classifier. The proposed automatic speaker recognition algorithm uses a recently introduced variation of MFCC known as Delta-Delta MFCC (DDMFCC) to identify the dynamic features that are used for speaker recognition. A series of experiments was performed using three feature extraction methods: (1) conventional MFCC; (2) DDMFCC; and (3) DCT-II based DDMFCC. The experiments were then expanded to include four data clustering classifiers: (1) K-means Vector Quantization; (2) Linde-Buzo-Gray Vector Quantization; (3) FVQ; and (4) Gaussian Mixture Model. The National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation (SRE 04) corpora were used to provide speaker source data for the experiments. Among the vector quantization based classifiers, the combination of DCT-II based MFCC, Delta MFCC (DMFCC) and DDMFCC with FVQ had the lowest Equal Error Rate (EER). The speaker verification tests highlighted the overall improvement in performance of the new ASR system.
Degree Masters by Research
Institution RMIT University
School, Department or Centre Electrical and Computer Engineering
Keyword(s) Automatic Speech Recognition
Automatic Speaker Verification
Discrete Cosine Transform
Delta-Delta Mel Frequency Cepstrum Coefficient
Fuzzy Vector Quantization
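
As a rough illustration of two ideas named in the abstract, the sketch below computes conventional regression-based delta features (applying it twice gives delta-delta features) and fuzzy c-means style memberships of a feature vector in a codebook. It is not the thesis's method: the thesis derives its dynamic features with a distributed DCT-II, which is not reproduced here. The function names `delta` and `fuzzy_memberships`, the window `width`, the fuzzifier `m`, and the edge-padding rule are all assumptions chosen for illustration.

```python
import math


def delta(frames, width=2):
    """Conventional regression delta over a list of per-frame feature vectors.

    Assumption: frames beyond either end are padded by repeating the
    first or last frame. This is the textbook formula, not the
    DCT-II based variant proposed in the thesis.
    """
    denom = 2 * sum(n * n for n in range(1, width + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        vec = []
        for d in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, last)][d]
                behind = frames[max(t - n, 0)][d]
                acc += n * (ahead - behind)
            vec.append(acc / denom)
        out.append(vec)
    return out


def fuzzy_memberships(x, codebook, m=2.0):
    """Fuzzy c-means style membership of vector x in each codeword.

    Assumption: Euclidean distance and fuzzifier m > 1. Memberships
    sum to 1; a vector that coincides with a codeword belongs to it fully.
    """
    dists = [math.dist(x, c) for c in codebook]
    zeros = [i for i, d in enumerate(dists) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(codebook))]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** power for dj in dists) for di in dists]


if __name__ == "__main__":
    # Hypothetical one-dimensional "cepstral" track, for illustration only.
    static = [[0.0], [1.0], [2.0], [3.0], [4.0]]
    d1 = delta(static)   # delta (first-order dynamic) features
    d2 = delta(d1)       # delta-delta (second-order dynamic) features
    print(d1[2], d2[2])
    print(fuzzy_memberships([0.0, 0.0], [[1.0, 0.0], [2.0, 0.0]]))
```

Unlike hard vector quantization such as K-means, the soft memberships let every training vector contribute to every codeword in proportion to its closeness. That is the general motivation for fuzzy clustering, stated here without reference to the thesis's specific results.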
Access Statistics: 626 Abstract Views, 2160 File Downloads
Created: Tue, 11 Oct 2011, 13:21:26 EST by Guy Aron