Dissertation/Thesis Abstract

Inferring Human Personality from Written Media
by Wright, William R., Ph.D., University of Hawai'i at Manoa, 2020, 90; 27834758
Abstract (Summary)

This work explores the association between human personality and language features consisting of sequences of tokens. It shows that some such features are predictive of personality across multiple corpora taken from different populations of English speakers. I gathered written text authored by 50 individuals who participated in a bodybuilding web forum (the Forum corpus). I also administered a personality questionnaire following the protocol provided by the International Personality Item Pool (IPIP). For comparison across other populations, I obtained text corpora from three other research groups, along with the results of personality assessments: the EAR corpora, consisting of transcripts of the speech of 96 participants as they went about their daily lives; Essays written by 2,588 undergraduates at the University of Texas; and posts by 244 Facebook users. After performing part-of-speech (POS) tagging on the text of all participants in these corpora, I extracted unigrams, bigrams, and trigrams (n-grams) of tokens and their POS tags, and counted every word/tag permutation that appeared.
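The feature extraction described above can be illustrated with a minimal sketch. It rests on one possible reading of "every word/tag permutation": each position of an n-gram is represented either by its token or by its POS tag, so a trigram yields up to eight features. The function name `ngram_features`, the lowercasing of tokens, the angle-bracket marking of tags, and the sample sentence are all assumptions made for illustration and are not taken from the dissertation.

```python
from collections import Counter
from itertools import product


def ngram_features(tagged, max_n=3):
    """Count word/tag combinations for all n-grams up to max_n.

    `tagged` is a list of (token, pos_tag) pairs, as produced by a POS tagger.
    Each n-gram position is rendered either as the lowercased token or as the
    tag wrapped in angle brackets (an illustrative convention, assumed here).
    """
    counts = Counter()
    for n in range(1, max_n + 1):
        for start in range(len(tagged) - n + 1):
            window = tagged[start:start + n]
            # For each position choose 0 (use the token) or 1 (use the tag).
            for choice in product((0, 1), repeat=n):
                feature = tuple(
                    token.lower() if c == 0 else "<" + tag + ">"
                    for (token, tag), c in zip(window, choice)
                )
                counts[feature] += 1
    return counts


# Hypothetical tagged sentence, not drawn from any of the corpora.
sample = [("I", "PRP"), ("lift", "VBP"), ("weights", "NNS")]
features = ngram_features(sample)
print(len(features), "distinct features")
```

Under this reading, a mixed feature such as `("<PRP>", "lift", "<NNS>")` captures a pattern that neither a pure word trigram nor a pure tag trigram would express on its own.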

I considered only features appearing one or more times per 1,000 words in the Forum corpus because there was not enough data to consider sparser features. I found 766 such features. From among those features I explored which were relevant across both my Forum corpus and at least one of the borrowed corpora, since those are the most promising, robust features and illustrate the possibility of building models across various corpora using the same language features. Of the 766 features, 75 were associated with one or more personality dimensions across both the Forum corpus and at least one additional corpus. I devised explanations as to why some of the features are correlated with a given personality dimension. That task establishes that although some of the features may have arisen randomly, one can confidently proceed with the conclusion that English speakers consistently express their personalities through their language usage. In addition, to show that it is possible to use these features for prediction, I generated multiple linear regression models for each combination of corpus and personality dimension; in the best case (Openness with the Forum corpus) I obtained an R² of 0.686 and an S (standard error of the estimate) of 0.561. My work sets a foundation for more robust, accurate models of personality. I hope that others will find additional principled explanations of why the features I found are associated with personality. In the future I anticipate that suitable language-analytical techniques will deepen insight both in the case of English speakers and speakers of additional world languages.
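To make the two reported fit statistics concrete, the following sketch fits a multiple linear regression by ordinary least squares and computes R² together with S, the standard error of the estimate, taken here as the square root of the residual sum of squares divided by n minus the number of fitted coefficients. The helper names `fit_ols` and `r2_and_s` and the toy data are hypothetical; nothing here reproduces the dissertation's models, features, or results.

```python
def fit_ols(X, y):
    """Fit y = b0 + b1*x1 + ... by solving the normal equations (X'X)b = X'y."""
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]  # [intercept, b1, b2, ...]


def r2_and_s(X, y, coef):
    """Return (R squared, standard error of the estimate) for a fitted model."""
    n, k = len(y), len(coef)  # k counts the intercept as well
    pred = [coef[0] + sum(b * x for b, x in zip(coef[1:], row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot, (ss_res / (n - k)) ** 0.5


# Invented data: two feature rates per author and a personality score.
rates = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5], [6, 4]]
scores = [2.1, 2.9, 4.2, 4.8, 6.1, 6.4]
coef = fit_ols(rates, scores)
r2, s = r2_and_s(rates, scores, coef)
print("R2 =", round(r2, 3), "S =", round(s, 3))
```

Because S is expressed in the units of the personality score, it complements R²: R² gives the share of variance explained, while S indicates the typical size of a prediction error.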

Indexing (document details)
Advisor: Chin, David
Committee: Hayashi, Kentaro; Kuh, Anthony; Ogawa, Michael-Brian; O'Grady, William; Robertson, Scott
School: University of Hawai'i at Manoa
Department: Computer Science
School Location: United States -- Hawaii
Source: DAI-A 82/1(E), Dissertation Abstracts International
Source Type: DISSERTATION
Subjects: Computer science, Psychology, Linguistics
Keywords: Sequences of tokens, Part-of-speech tagging, Human personality
Publication Number: 27834758
ISBN: 9798662433069
Copyright © 2020 ProQuest LLC. All rights reserved.