Natural language generation (NLG) involves the automatic formulation of natural language sentences. The ultimate goal in NLG is for the computer to produce language that is as meaningful, fluent and idiomatic as that produced by humans. A typical NLG system will include components for selecting and structuring the content to be generated (content planning), assigning content units to sentences (sentence planning) and realizing each content unit in a particular language (surface realization). Although there are surface realizers that can produce fluent output, very little research has been done on adjunct ordering. Either adjuncts are already ordered in the surface realizer's input, or all possible orderings are generated and the alternatives are ranked using a language model.
In this thesis, I address modifier and adjunct ordering in trainable surface realization for English. First, I describe a chart-based surface realizer I have implemented that uses a probabilistic feature-based tree adjoining grammar extracted automatically from a training corpus. My surface realizer takes input logical forms and performs insertion of function words, word and constituent ordering, and morphological inflection. Its performance is comparable to that of other trainable surface realizers that take the same type of input.
Second, I present a set of experiments in which I compare different approaches to word and constituent ordering for several tasks: the dative and benefactive alternations, ordering of prenominal adjectives, ordering of adverbials, ordering of prepositional phrases, and ordering of adjuncts in general. I compare a classifier-based approach with rich feature sets to two simple relative frequency approaches (one using head words as the only feature, the other using part-of-speech tags as the only feature). In my classifier-based approach, I experiment with lexical, syntactic, semantic, pragmatic, sentence and frequency features. I present evaluations of these approaches in terms of classification accuracy and improvement in performance of the surface realizer. I show that the classifier-based approach to word and adjunct ordering improves on the simple relative frequency approaches. I also show that different feature sets are useful for different tasks and genres, with (in general) pragmatic features being more important for dialog and syntactic features for text.
Finally, I describe a human evaluation of my initial surface realizer with relative frequency approaches to ordering of adjuncts and modifiers, and of my modified surface realizer which uses a classification-based approach to ordering of adjuncts and modifiers. I show that modeling of adjunct and modifier ordering can lead to small but significant improvements in surface realization performance, and analyze the types of errors identified by the human evaluators.
|School:|State University of New York at Stony Brook|
|School Location:|United States -- New York|
|Source:|DAI-B 71/05, Dissertation Abstracts International|
|Keywords:|Adjunct ordering, English generation, Modifier ordering, Natural language generation|