Dissertation/Thesis Abstract

Linguistically Motivated Features for CCG Realization Ranking
by Rajkumar, Rajakrishnan, Ph.D., The Ohio State University, 2012, 167; 10631306
Abstract (Summary)

Natural Language Generation (NLG) is the process of generating natural language text from an input consisting of a communicative goal and a database or knowledge base. Informally, the architecture of a standard NLG system consists of the following modules (Reiter and Dale, 2000): content determination, sentence planning (or microplanning) and surface realization. This thesis is about designing novel, linguistically motivated features for surface realization (the final NLG module mentioned above), the process by which text is created from an abstract representation of language according to the rules of syntax and morphology. Surface realization primarily involves three interrelated problems: constituent ordering, inflection and agreement, and function word insertion. For addressing these problems, most state-of-the-art realization ranking models (Velldal and Oepen, 2005; White and Rajkumar, 2009) employ features which are based on very basic insights from linguistic theory (POS tags and rules derived from parse trees, for example). More sophisticated insights of linguistic theory have not been widely perceived as necessary for increased system performance, with very basic insights providing the most gains (similar to the situation Johnson (2009) describes in the context of natural language parsing).

In contrast, our goal is to design features motivated by insights from theoretical linguistics and also based on cognitively plausible accounts of language comprehension discussed in the linguistics literature, so that the realization ranking model can better approximate human judgements of fluency and acceptability. We show that the minimal dependency length theory (Gibson, 1998; Temperley, 2007) helps with the constituent ordering problem in surface realization. For the problem of generating correct inflected word forms, we demonstrate that a machine learning-based approach is well suited to encode insights from the theoretical linguistics literature on English agreement (Kathol, 1999; Pollard and Sag, 1994). This approach leads to improvements over a competitive baseline model containing n-gram and parsing features (of the kind described in Johnson, 2009). Finally, we demonstrate empirically that the uniform information density principle discussed in Jaeger (2010) contributes towards the that-complementizer choice in the context of surface realization.
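
As a rough illustration of the dependency length idea mentioned above, the following sketch sums the distances between each word and its head for two candidate orderings of the same content. It is not the feature implementation used in the thesis: the function name `total_dependency_length`, the head-index encoding, and the toy dependency analyses are all assumptions made for this example.

```python
from typing import Sequence


def total_dependency_length(heads: Sequence[int]) -> int:
    """Sum of |word position - head position| over all non-root words.

    heads[i] is the 0-based position of word i's head, or -1 for the root.
    (This encoding is an assumption made for the illustration.)
    """
    return sum(abs(i - h) for i, h in enumerate(heads) if h >= 0)


# Hypothetical analysis of "threw out the trash":
# threw = root, out -> threw, the -> trash, trash -> threw
particle_first = [-1, 0, 3, 0]

# Hypothetical analysis of "threw the trash out":
# threw = root, the -> trash, trash -> threw, out -> threw
particle_last = [-1, 2, 0, 0]

print(total_dependency_length(particle_first))  # 5
print(total_dependency_length(particle_last))   # 6
```

Under these assumed analyses the particle-first ordering has the smaller total, so a ranking model with such a feature could learn to prefer it, and the gap widens as the object noun phrase gets longer.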

Indexing (document details)
Advisor: White, Michael
Committee: Culicover, Peter; Schuler, William
School: The Ohio State University
Department: Linguistics
School Location: United States -- Ohio
Source: DAI-A 78/11(E), Dissertation Abstracts International
Subjects: Linguistics
Keywords: CCG, Natural language generation, Realization ranking
Publication Number: 10631306
ISBN: 978-0-355-01567-6
Copyright © 2021 ProQuest LLC. All rights reserved.