Dissertation/Thesis Abstract

Cross-Genre, Cross-Lingual, and Low-Resource Emotion Classification
by Tafreshi, Shabnam, Ph.D., The George Washington University, 2021, 186; 28088437
Abstract (Summary)

Emotions can be defined as a natural, instinctive state of mind arising from one’s circumstances, mood, and relationships with others. It has long been a question to be answered by psychology that how and what is it that humans feel. Enabling computers to recognize human emotions has been an of interest to researchers since 1990s (Picard et al., 1995). Ever since, this area of research has grown significantly and emotion detection is becoming an important component in many natural language processing tasks.

Several theories exist for defining emotions and are chosen by researchers according to their needs. For instance, according to appraisal theory, a psychology theory, emotions are produced by our evaluations (appraisals or estimates) of events that cause a specific reaction in different people. Some emotions are easy and universal, while others are complex and nuanced. Emotion classification is generally the process of labeling a piece of text with one or more corresponding emotion labels.

Psychologists have developed numerous models and taxonomies of emotions. The model or taxonomy depends on the problem, and thorough study is often required to select the best model. Early studies of emotion classification focused on building computational models to classify basic emotion categories. In recent years, increasing volumes of social media and the digitization of data have opened a new horizon in this area of study, where emotion classification is a key component of applications, including mood and behavioral studies, as well as disaster relief, amongst many other applications. Sophisticated models have been built to detect and classify emotion in text, but few analyze how well a model is able to learn emotion cues. The ability to learn emotion cues properly and be able to generalize this learning is very important. This work investigates the robustness of emotion classification approaches across genres and languages, with a focus on quantifying how well state-of-the-art models are able to learn emotion cues.

First, we use multi-task learning and hierarchical models to build emotion models that were trained on data combined from multiple genres. Our hypothesis is that a multi-genre, noisy training environment will help the classifier learn emotion cues that are prevalent across genres. Second, we explore splitting text (i.e. sentence) into its clauses and testing whether the model’s performance improves. Emotion analysis needs fine-grained annotation and clause-level annotation can be beneficial to design features to improve emotion detection performance. Intuitively, clause-level annotations may help the model focus on emotion cues, while ignoring irrelevant portions of the text. Third, we adopted a transfer learning approach for cross-lingual/genre emotion classification to focus the classifier’s attention on emotion cues which are consistent across languages. Fourth, we empirically show how to combine different genres to be able to build robust models that can be used as source models for emotion transfer to low-resource target languages. Finally, this study involved curating and re-annotating popular emotional data sets in different genres, and annotating a multi-genre corpus of Persian tweets and news, and generating a collection of emotional sentences for a low-resource language, Azerbaijani, a language spoken in the north west of Iran.

Indexing (document details)
Advisor: Diab, Mona
Commitee: Resnik, Philip, Youssef, Abdu, Pless, Robert, Caliskan, Aylin, McKeown, Kathleen
School: The George Washington University
Department: Computer Science
School Location: United States -- District of Columbia
Source: DAI-A 82/6(E), Dissertation Abstracts International
Subjects: Artificial intelligence, Genetics, Linguistics, Design, Language arts, Middle Eastern Studies, Educational psychology
Keywords: Cross-genre, Emotion classification, Low-resource languages, Text annotation, Natural language processing skills, Persia, Iran, Azerbaijani
Publication Number: 28088437
ISBN: 9798557002103
Copyright © 2021 ProQuest LLC. All rights reserved. Terms and Conditions Privacy Policy Cookie Policy