Natural Language Generation (NLG) is the process of generating natural language text from an input consisting of a communicative goal and a database or knowledge base. Informally, the architecture of a standard NLG system consists of the following modules (Reiter and Dale, 2000): content determination, sentence planning (or microplanning), and surface realization. This thesis is about designing novel, linguistically motivated features for surface realization (the final NLG module mentioned above), the process by which text is created from an abstract representation of language according to the rules of syntax and morphology. Surface realization primarily involves three interrelated problems: constituent ordering, inflection and agreement, and function word insertion. To address these problems, most state-of-the-art realization ranking models (Velldal and Oepen, 2005; White and Rajkumar, 2009) employ features based on very basic insights from linguistic theory (for example, POS tags and rules derived from parse trees). More sophisticated insights from linguistic theory have not been widely perceived as necessary for increased system performance, with very basic insights providing the most gains (similar to the situation Johnson (2009) describes in the context of natural language parsing).
In contrast, our goal is to design features motivated by insights from theoretical linguistics and by cognitively plausible accounts of language comprehension discussed in the linguistics literature, so that the realization ranking model can better approximate human judgements of fluency and acceptability. We show that the minimal dependency length theory (Gibson, 1998; Temperley, 2007) helps with the constituent ordering problem in surface realization. For the problem of generating correctly inflected word forms, we demonstrate that a machine learning-based approach is well suited to encoding insights from the theoretical linguistics literature on English agreement (Kathol, 1999; Pollard and Sag, 1994). This approach leads to improvements over a competitive baseline model containing n-gram and parsing features (of the kind described in Johnson, 2009). Finally, we demonstrate empirically that the uniform information density principle discussed in Jaeger (2010) contributes to that-complementizer choice in the context of surface realization.
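As a rough illustration of the dependency length idea mentioned above, the following sketch compares two orderings of the same words by summing the distances between each head and its dependent. It is a toy under stated assumptions, not the feature set used in the thesis: the function name `total_dependency_length`, the example sentence, and the hand-written dependency arcs are all hypothetical, and a realization ranker would use such a quantity as one feature among many rather than as the sole criterion.

```python
# Toy sketch: total dependency length of a word order.
# Assumption: each word occurs once, so a word can be looked up by its string.

def total_dependency_length(words, arcs):
    """Sum of |position(head) - position(dependent)| over all arcs."""
    position = {word: index for index, word in enumerate(words)}
    return sum(abs(position[head] - position[dep]) for head, dep in arcs)

# Hand-written (head, dependent) arcs shared by both candidate orderings.
ARCS = [
    ("threw", "John"),
    ("threw", "out"),
    ("threw", "trash"),
    ("trash", "the"),
    ("trash", "old"),
]

order_a = ["John", "threw", "out", "the", "old", "trash"]
order_b = ["John", "threw", "the", "old", "trash", "out"]

# Pick the candidate whose dependencies are shortest overall.
preferred = min([order_a, order_b], key=lambda w: total_dependency_length(w, ARCS))

print(total_dependency_length(order_a, ARCS))
print(total_dependency_length(order_b, ARCS))
print(" ".join(preferred))
```

In this toy case the particle-adjacent order scores lower because the verb-particle dependency stays short, which is the kind of preference a dependency length feature is meant to capture.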
|Committee:||Culicover, Peter, Schuler, William|
|School:||The Ohio State University|
|School Location:||United States -- Ohio|
|Source:||DAI-A 78/11(E), Dissertation Abstracts International|
|Keywords:||CCG, Natural language generation, Realization ranking|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved