Dissertation/Thesis Abstract

Robust Classification of Emotion in Human Speech Using Spectrogram Features
by Guven, Erhan, Ph.D., The George Washington University, 2012, 133; 3502353
Abstract (Summary)

The recognition of emotions such as anger, anxiety, and joy from tonal variations in human speech is an important task for research and applications in human-computer interaction. The objective of this research is to design, implement, and test a Speech Emotion Classification (SEC) engine that can extract useful features and accurately classify emotions in human speech in the presence of noise and variations in speaker-dependent characteristics. Current approaches extract several standard global values from the temporal sequence of power spectra, such as pitch, formants, and energy, and values from the time signal, such as attack and decay rates. In this work, the frequency dimension of the spectrogram is quantized to simulate the Bark scale of the human auditory system, the time dimension of the spectrogram is quantized in units starting from 50 ms, and the linear regression coefficients of the surface of each spectrogram segment are combined into a feature vector. In this way, complete local features are extracted to establish a larger sample. The accumulated feature vectors for each category of emotion provide a robust training basis for a state-of-the-art classifier, such as a support vector machine (SVM). To further improve the performance of the SEC engine and to demonstrate the flexibility and benefit of local features, a backward context scheme is introduced. A series of experiments has been designed and conducted using the EMO-DB and LDC-DB speech emotion databases to measure the performance of the SEC engine. First, accuracy and precision are measured for seven to fifteen emotion categories when the engine is trained on speech utterances selected by random sampling. Next, the generalization performance is measured through a speaker cross-validation scheme. Third, the generalization and robustness of the SEC engine are measured by performing gender, language, and speaker classification with the SEC engine, thereby measuring the discrimination power of the engine with respect to variations in speaker characteristics. Finally, the robustness of the SEC engine is measured when the signal-to-noise ratio (SNR) is varied between 10 and 50 dB.
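
The local feature extraction summarized in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the dissertation's implementation, and everything beyond the abstract's wording is an assumption: the Zwicker-Terhardt formula is used as one common approximation of the Bark scale, the "linear regression coefficients of the surface" are interpreted as the slopes and intercept of a least-squares plane, the spectrogram is taken as a precomputed list of frames with ascending bin frequencies, and the names hz_to_bark, fit_plane, and segment_features are hypothetical.

```python
import math


def hz_to_bark(freq_hz):
    """Zwicker-Terhardt approximation of the Bark scale (an assumed choice)."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


def fit_plane(patch):
    """Least-squares fit of z = a*t + b*f + c over a rectangular patch.

    patch[t][f] is the (log) power at local time index t and local
    frequency index f. On a full regular grid the centered t and f axes
    are orthogonal, so the two slopes can be computed independently.
    """
    n_t, n_f = len(patch), len(patch[0])
    t_mean, f_mean = (n_t - 1) / 2.0, (n_f - 1) / 2.0
    z_mean = sum(sum(row) for row in patch) / (n_t * n_f)
    s_tt = n_f * sum((t - t_mean) ** 2 for t in range(n_t))
    s_ff = n_t * sum((f - f_mean) ** 2 for f in range(n_f))
    s_tz = sum((t - t_mean) * z for t, row in enumerate(patch) for z in row)
    s_fz = sum((f - f_mean) * z for row in patch for f, z in enumerate(row))
    a = s_tz / s_tt if s_tt else 0.0  # slope along time
    b = s_fz / s_ff if s_ff else 0.0  # slope along frequency
    c = z_mean - a * t_mean - b * f_mean
    return a, b, c


def segment_features(spectrogram, bin_freqs_hz, frames_per_segment):
    """Return one feature vector per non-overlapping time segment.

    spectrogram[t][k] is the power of frame t at frequency bin k, and
    bin_freqs_hz[k] is the center frequency of bin k (ascending).
    Each vector concatenates the plane coefficients of every Bark band.
    """
    # Group frequency bins by the integer part of their Bark value.
    bands = {}
    for k, freq in enumerate(bin_freqs_hz):
        bands.setdefault(int(hz_to_bark(freq)), []).append(k)

    features = []
    last_start = len(spectrogram) - frames_per_segment
    for start in range(0, last_start + 1, frames_per_segment):
        rows = spectrogram[start:start + frames_per_segment]
        vector = []
        for band in sorted(bands):
            patch = [[row[k] for k in bands[band]] for row in rows]
            vector.extend(fit_plane(patch))
        features.append(vector)
    return features
```

With an assumed frame hop of 10 ms, frames_per_segment=5 would correspond to the 50 ms starting unit mentioned in the abstract. The resulting vectors, accumulated per emotion category, are the kind of input that could then be passed to a classifier such as an SVM.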

Indexing (document details)
Advisor: Bock, Peter
Committee: Choi, Hyeong-Ah, Martin, Dianne, Monteleoni, Claire, Oertel, Carsten K., Youssef, Abdou
School: The George Washington University
Department: Computer Science
School Location: United States -- District of Columbia
Source: DAI-B 73/07(E), Dissertation Abstracts International
Subjects: Computer science
Keywords: Feature extraction, Machine learning, Speech recognition, Statistical learning, Support vector machines
Publication Number: 3502353
ISBN: 978-1-267-25270-8
Copyright © 2021 ProQuest LLC. All rights reserved.