Dissertation/Thesis Abstract

Using n-grams to identify time periods of cultural influence
by Knight, Gregory P., M.S., East Carolina University, 2012, 92; 1532278
Abstract (Summary)

An author's literary style is influenced by the cultural time period in which the author lives. The author's ideas, and the words chosen to express them, can help identify the cultural time period that most influenced the author.

Ideas are expressed in language through sequences of words called n-grams. Over the past several years, Google has been engaged in digitizing millions of books. As part of this endeavor, Google has created a database of n-grams extracted from these digitized books and has made the database available to researchers online. This is the first time that such an extensive repository of cultural data has been made available.
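
As an illustration of what a word n-gram is, the following minimal sketch extracts every run of n consecutive words from a piece of text. The function name `extract_ngrams` and the simple lowercase tokenizer are assumptions made for this example. They are not taken from the thesis and do not reproduce the tokenization rules Google uses.

```python
import re


def extract_ngrams(text, n):
    """Return the word n-grams of `text` as a list of tuples, in reading order.

    Tokenization here is an assumption for illustration: lowercase the text
    and keep runs of letters and apostrophes as words.
    """
    words = re.findall(r"[a-z']+", text.lower())
    # Slide a window of width n across the word list.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


# A four-word phrase contains three 2-grams (bigrams).
bigrams = extract_ngrams("The quick brown fox", 2)
```

A text shorter than n words yields an empty list, so short fragments simply contribute no n-grams.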

This study develops and tests an original method for utilizing Google's database to identify the cultural time period that most influenced the author of a published work. Several literary works of undisputed authorship are examined; sets of n-grams are extracted from them and compared against the Google database. The frequency and distribution of n-gram matches allow us to determine the cultural time period that most influenced the author. The method is also tested against several literary works having uncertain or disputed authorship and period of composition.
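
The abstract does not give the scoring procedure, so the sketch below shows only one plausible way that matched n-grams could point to a time period. It assumes a hypothetical lookup table, `yearly_freq`, that maps each n-gram to its frequency by year. Each matched n-gram's frequencies are normalized to sum to 1, the normalized values are added up per decade, and the decade with the highest total is reported. The function name, the table layout, the decade binning, and the toy numbers are all assumptions for illustration, not the thesis's method or real data.

```python
from collections import defaultdict


def estimate_period(ngrams, yearly_freq, bin_size=10):
    """Return (best_bin_start_year, scores) for the n-grams of one text.

    `yearly_freq` is a hypothetical table: {ngram: {year: frequency}}.
    N-grams absent from the table are skipped.
    """
    scores = defaultdict(float)
    for gram in ngrams:
        series = yearly_freq.get(gram)
        if not series:
            continue
        total = sum(series.values())
        if total == 0:
            continue
        # Normalize so that very common n-grams do not dominate the score.
        for year, freq in series.items():
            scores[(year // bin_size) * bin_size] += freq / total
    if not scores:
        return None, {}
    best = max(scores, key=scores.get)
    return best, dict(scores)


# Made-up numbers, for illustration only.
toy_table = {
    ("thou", "art"): {1600: 8.0, 1850: 2.0},
    ("steam", "engine"): {1850: 5.0, 1900: 5.0},
}
best_decade, decade_scores = estimate_period(
    [("thou", "art"), ("steam", "engine"), ("not", "listed")], toy_table
)
```

Normalizing each n-gram separately is a design choice in this sketch: it lets a rare but period-specific phrase count as much as a common one.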

The results suggest that the method developed provides a reasonable approximation of the time period of greatest cultural influence for each book. Unexpectedly, the results tend to support conclusions reached by another researcher with regard to prior literary influences on the Ern Malley Poems. In addition, they lend support to a well-known alternate theory on the authorship of the Book of Mormon.

Indexing (document details)
Advisor: Tabrizi, M. H. Nassehzadeh
Committee: Abrahamson, Karl; Ding, Junhua; Gemperline, Paul J.; Tabrizi, M. H. Nassehzadeh; Vilkomir, Sergiy
School: East Carolina University
Department: Software Engineering
School Location: United States -- North Carolina
Source: MAI 51/04M(E), Masters Abstracts International
Source Type: DISSERTATION
Subjects: Linguistics, Sociology, Computer science
Keywords: Authorship, Culture, Documents, Forgery, Google, N-gram
Publication Number: 1532278
ISBN: 9781267872388