Dissertation/Thesis Abstract

A Synthetic FACS Framework for Expanding Facial Expression Lexicons
by Butler, Crystal, Ph.D., New York University Tandon School of Engineering, 2021, 125; 28157617
Abstract (Summary)

Facial expressions are a key modality of nonverbal communication. However, no comprehensive reference exists that maps the kinematics of facial expression (FE) production to visual changes in appearance and to the semantics of those changes. The psychology literature has thoroughly explored the production and recognition of only a small set of six or seven basic expressions of emotion, namely:

• surprise

• fear

• happiness

• sadness

• anger

• disgust

• contempt (optionally)

Some less widely accepted additions such as pain and compound basic emotions have also been described. Defining a broader range of more nuanced, communicative FEs is important to inform applications such as healthcare diagnostics and treatment, automated facial expression recognition, animation and assistive systems that incorporate affective, intelligent virtual agents.

This research project builds and validates a framework that can be leveraged to create more extensive FE lexicons. It begins with modeling FEs on a custom 3D avatar, using the Facial Action Coding System (FACS) as the foundation for synthesizing facial behavior. FACS describes individual muscle movements in the face as action units (AUs), which can be variably activated and combined to generate all behaviors that the human face can exhibit. The FACS-valid avatar can be controlled computationally to produce a nearly infinite number of weighted AU combinations. A key benefit of using synthetic FACS rather than expressions posed by human actors is that movements can be precisely parameterized and reliably replicated across diverse physiognomies. The avatar and scripts for generatively modeling facial behavior are referred to as the MiFace component of the framework.
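As a rough illustration of what "weighted AU combinations" means computationally, the sketch below enumerates every combination of a given size from a small AU list and assigns each active AU the same fixed weight. The AU names, the `weighted_combinations` helper and the 0-to-1 weight range are assumptions made for this example; they are not the MiFace scripts, which are written in Maya Embedded Language.

```python
from itertools import combinations

# Hypothetical AU identifiers chosen for illustration only;
# the actual avatar exposes its own set of controls.
ACTION_UNITS = ["AU1", "AU2", "AU4", "AU6", "AU12"]


def weighted_combinations(aus, count, weight=1.0):
    """Yield every combination of `count` AUs, each activated at `weight`."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight is assumed to lie in [0, 1]")
    for combo in combinations(aus, count):
        # One candidate FE: a mapping from AU name to activation weight.
        yield {au: weight for au in combo}


candidates = list(weighted_combinations(ACTION_UNITS, 2, weight=0.8))
print(len(candidates))  # 10 two-AU candidates from five AUs
print(candidates[0])    # {'AU1': 0.8, 'AU2': 0.8}
```

Representing a candidate FE as a plain mapping keeps it easy to serialize and to replay on any rig that accepts per-AU weights.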

Even if AU weights are fixed and a ceiling of 12 simultaneously activated AUs is set, the number of possible combinations of 38 AUs (24 primary AUs plus their unilateral variants, where applicable) that the avatar can model exceeds four billion. Therefore, to test the lexicon construction framework, I generatively modeled two subsets of candidate FEs as the stimulus sets for proof-of-concept observer judgement studies. In those studies, crowdsourced participants voted on whether each FE conveys interpretable nonverbal information, then assigned freely chosen labels to those that were deemed to be communicative. Employing crowd workers to map FEs to freely chosen natural language labels allows for more ecologically valid, fine-grained descriptions of perceived FE signal values.
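The "exceeds four billion" figure can be sanity-checked with a short calculation. The sketch below assumes that all 38 AUs can be combined freely, that at least one AU is active, and that weights are fixed, so each combination is counted once; the abstract does not spell out its exact counting rule, so this is one plausible reading rather than the author's derivation.

```python
from math import comb


def count_combinations(n_aus, max_active):
    """Count non-empty subsets of `n_aus` AUs with at most `max_active` members."""
    return sum(comb(n_aus, k) for k in range(1, max_active + 1))


total = count_combinations(38, 12)
print(total > 4_000_000_000)  # True: the naive count exceeds four billion
```

If some AUs cannot co-occur (for example, a unilateral variant together with its bilateral form), the true count would be lower than this naive sum.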

To analyze the resulting label sets, the framework incorporates a novel human-in-the-loop artificial intelligence pipeline to perform natural language processing (NLP). The pipeline comprises two key components: word embeddings fine-tuned to perform optimally on FE word semantic similarity scoring, and a hierarchical agglomerative clustering step that tests for adequate set coherence. The word embeddings provide real-valued scores for semantic similarity between all labels in a set; these scores act as inputs to the clustering step, and for coherent label sets, provide the basis for calculating the representative label (the set medoid). In order to evaluate word embedding performance, a set of FE word pairs with gold standard synonymy scores was developed. Scores were calculated by aggregating values chosen by a group of human raters, following a procedure similar to that used in making the SimLex-999 set of scored synonyms [51]. The synonymy-scored FE word pair benchmark is referred to as the FreeRes dataset. The NLP pipeline tuned for performance in scoring FE word pair synonymy is FreeRes-NLP.
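The two steps described above, clustering on pairwise similarity and then picking the set medoid, can be sketched in a few lines. This is a minimal pure-Python illustration, not the FreeRes-NLP implementation: the similarity scores are invented toy values, distance is assumed to be one minus similarity, average linkage and the 0.5 merge threshold are arbitrary choices, and the "largest cluster holds a majority of labels" coherence rule is a stand-in for whatever test the pipeline actually applies.

```python
# Toy similarity scores in [0, 1]; invented for illustration only.
SIMILARITY = {
    ("happy", "joyful"): 0.90,
    ("happy", "cheerful"): 0.80,
    ("joyful", "cheerful"): 0.85,
    ("happy", "angry"): 0.10,
    ("joyful", "angry"): 0.10,
    ("cheerful", "angry"): 0.15,
}
LABELS = ["happy", "joyful", "cheerful", "angry"]


def distance_table(labels, similarity):
    """Build a symmetric distance lookup, assuming distance = 1 - similarity."""
    dist = {a: {b: 0.0 for b in labels} for a in labels}
    for (a, b), s in similarity.items():
        dist[a][b] = dist[b][a] = 1.0 - s
    return dist


def average_linkage_clusters(labels, dist, threshold):
    """Merge the closest pair of clusters until none is within `threshold`."""
    clusters = [[label] for label in labels]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        return sum(dist[a][b] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def medoid(labels, dist):
    """Return the label with the smallest total distance to the others."""
    return min(labels, key=lambda a: sum(dist[a][b] for b in labels))


dist = distance_table(LABELS, SIMILARITY)
clusters = average_linkage_clusters(LABELS, dist, threshold=0.5)
main = max(clusters, key=len)
coherent = len(main) > len(LABELS) / 2  # hypothetical coherence rule
print(clusters)            # [['happy', 'joyful', 'cheerful'], ['angry']]
print(coherent)            # True
print(medoid(main, dist))  # joyful
```

Unlike a centroid, the medoid is always an actual member of the label set, so the representative label is a word that some observer really supplied.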

By applying FreeRes-NLP to label sets gathered in the two FE labeling observer judgement studies mentioned previously, an example FE lexicon has been produced. It is made freely available, along with other key products of the project. Chapters two through seven of this publication describe project components that have been shared for use by the broader research community, specifically:

1. MiFace, an open-source 3D human model capable of synthesizing FACS-validated facial muscle movements. The Maya avatar model created for testing the FE lexicon framework is made freely available for other researchers to use or modify. There are no other models available, paid or otherwise, that provide control of accurately represented individual AUs. The Maya Embedded Language scripts used to generatively model candidate FEs are included in MiFace. Creation of the model and the method used to generate two test sets of candidate FEs are described in Chapter 2.

2. Synthetic FE images with full crowdsourced label sets for each. FE images that were considered communicative are available with complete lists of associated single-word labels, as given by naïve observers. Label sets contain about 40 words each. The design and methods of the two observer judgement studies carried out to gather the label sets are described in Chapter 3.

3. FreeRes synonym set, a set of FE word pairs with gold standard synonymy scores assigned by human raters. 156 pairs of FE words were scored for synonymy by 10 crowdsourced raters each. The averaged results are provided for use as a gold standard measure of NLP model performance on tasks that require a real-valued representation of semantic similarity between FE terms. Chapter 4 describes the design and methods for the FE word synonymy study.

4. An open word embeddings set fine-tuned for optimal performance on FE word pair semantic similarity scoring tasks. To understand the degree of agreement within an FE label set, a quantitative measure of synonymy between the constituents is required. Calculating the semantic similarity, or synonymy, between words is a specialized problem in NLP. Vector space models, a commonly used family of language representations that include word embeddings, are typically capable of distinguishing words based on a broader set of relatedness relationships; semantic similarity is only one of many possible dimensions. I tested a broad range of off-the-shelf and fine-tuned vector space models, including contextual embeddings distilled from a recently developed state-of-the-art transformer model. Based on testing with three scored synonym sets, word embeddings tuned on a custom-built dictionary and thesaurus corpus performed best. Details of model testing and training are given in Chapter 5.

5. FreeRes-NLP, an open-source NLP pipeline for determining the representative FE label from a set. To establish highly descriptive, ecologically valid FE semantics, it is desirable to have a group of untrained observers freely provide natural language labels that describe their perceptions of FEs. However, no standard procedure for analyzing label sets from free-response studies currently exists. The pipeline developed for this project implements a human-in-the-loop process for programmatically assessing the coherence of a label set using hierarchical agglomerative clustering, then selects the set member that best represents the overall set semantics, i.e., the set medoid. Example output is demonstrated in the Figure 6.1 dendrogram. The pipeline incorporates calculations based on the word embeddings described in Chapter 5. Label clustering, the set coherence test and comparisons to manual analyses performed in prior free-response studies are provided in Chapter 6.

6. An example FE lexicon, containing the communicative FEs from two crowdsourced labeling studies. Two test sets of FE images were generated by fixing the parameters (i.e., activation weights) of a subset of AUs and generating all variants for a specified AU count on the avatar. The images went through a two-stage process in which crowd workers first assigned binary labels denoting whether or not each FE conveys a meaningful signal, as described in Chapter 3. For FEs that passed the first stage, free-response labels were assigned by crowd workers and processed using the pipeline described in Chapter 6. Results of the data analysis are provided in Chapter 7 in the form of an FE lexicon, with FE images mapped to descriptive single-word labels and weighted AU combinations. An example lexicon entry is shown in Figure 7.1.
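To illustrate how a gold-standard set such as the FreeRes benchmark in item 3 might be used to score a model, the sketch below computes Spearman rank correlation between human ratings and model similarity scores. Spearman correlation is the measure conventionally reported on SimLex-999, but its use here is an assumption; the abstract does not name the metric. The ratings and model scores are invented toy values.

```python
def ranks(values):
    """Assign 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average_rank
        pos = end + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# Toy values: averaged human synonymy ratings and model similarity scores
# for the same four hypothetical word pairs.
human_scores = [9.0, 7.5, 2.0, 0.5]
model_scores = [0.8, 0.9, 0.3, 0.1]
rho = spearman(human_scores, model_scores)
print(round(rho, 3))  # 0.8
```

A rank-based measure compares only the ordering of pairs, so it is unaffected by the human and model scores living on different scales.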

Indexing (document details)
Advisor: Perlin, Kenneth
Committee: Togelius, Julian; Burleson, Winslow; Cohn, Jeffrey
School: New York University Tandon School of Engineering
Department: Computer Science and Engineering
School Location: United States -- New York
Source: DAI-A 82/9(E), Dissertation Abstracts International
Subjects: Computer science, Psychology, Linguistics
Keywords: Computer graphics, Emotion, Facial Expression, Generative design, Natural language processing, Social signal processing, Facial Action Coding System, Action Units
Publication Number: 28157617
ISBN: 9798597014760
Copyright © 2021 ProQuest LLC. All rights reserved.