Dissertation/Thesis Abstract

Trainable English generation with modifier and adjunct ordering
by Zhong, Huayan, Ph.D., State University of New York at Stony Brook, 2009, 212 pages; publication number 3406711
Abstract (Summary)

Natural language generation (NLG) involves the automatic formulation of natural language sentences. The ultimate goal in NLG is for the computer to produce language that is as meaningful, fluent and idiomatic as that produced by humans. A typical NLG system will include components for selecting and structuring the content to be generated (content planning), assigning content units to sentences (sentence planning) and realizing each content unit in a particular language (surface realization). Although there are surface realizers that can produce fluent output, very little research has been done on adjunct ordering. Adjuncts are either already ordered in the surface realizer's input, or all possible orderings are generated and the alternatives are ranked using a language model.
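The generate-and-rank strategy mentioned above can be illustrated with a minimal sketch. This is not the thesis implementation: the toy corpus, the add-one-smoothed bigram model, and the function names (`train_bigram`, `log_prob`, `rank_orderings`) are all assumptions made for illustration, and the sketch simplifies by placing every adjunct after the core clause.

```python
import math
from collections import Counter
from itertools import permutations


def train_bigram(corpus):
    """Count bigram contexts and bigrams over tokenized sentences (hypothetical toy model)."""
    contexts, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]
        contexts.update(tokens[:-1])             # every token that starts a bigram
        bigrams.update(zip(tokens, tokens[1:]))
    vocab = {t for s in corpus for t in s} | {"<s>", "</s>"}
    return contexts, bigrams, len(vocab)


def log_prob(tokens, model):
    """Add-one-smoothed bigram log-probability of a token sequence."""
    contexts, bigrams, vocab_size = model
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(padded, padded[1:])
    )


def rank_orderings(core, adjuncts, model):
    """Generate every adjunct permutation after the core and sort best-first."""
    candidates = [
        core + [word for adjunct in perm for word in adjunct]
        for perm in permutations(adjuncts)
    ]
    return sorted(candidates, key=lambda c: log_prob(c, model), reverse=True)


# Toy training data (invented for this example).
corpus = [
    "he left quietly yesterday".split(),
    "they spoke quietly yesterday".split(),
    "she left".split(),
]
model = train_bigram(corpus)
ranked = rank_orderings(["she", "left"], [["yesterday"], ["quietly"]], model)
print(" ".join(ranked[0]))
```

Because the number of permutations grows factorially with the number of adjuncts, exhaustive generation of this kind becomes expensive quickly, which is one reason to model ordering directly.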

In this thesis, I address modifier and adjunct ordering in trainable surface realization for English. First, I describe a chart-based surface realizer I have implemented that uses a probabilistic feature-based tree adjoining grammar extracted automatically from a training corpus. My surface realizer takes input logical forms and performs insertion of function words, word and constituent ordering, and morphological inflection. Its performance is comparable to that of other trainable surface realizers that take the same type of input.

Second, I present a set of experiments in which I compare different approaches to word and constituent ordering for several tasks: the dative and benefactive alternations, ordering of prenominal adjectives, ordering of adverbials, ordering of prepositional phrases, and ordering of adjuncts in general. I compare a classifier-based approach with rich feature sets to two simple relative frequency approaches (one using head words as the only feature, the other using part-of-speech tags as the only feature). I experiment with lexical, syntactic, semantic, pragmatic, sentence and frequency features in my classifier-based approach. I present evaluations of these approaches in terms of classification accuracy and improvement in performance of the surface realizer. I show that the classifier-based approach to word and adjunct ordering improves on the simple relative frequency approaches. I also show that different feature sets are useful for different tasks and genres, with (in general) pragmatic features being more important for dialog and syntactic features for text.
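A relative frequency baseline of the kind described above might look like the following sketch. It is an assumption-laden illustration rather than the method used in the thesis: the pairwise counting scheme, the win-count sort, the toy training sequences, and the names `train_pair_counts`, `order_items`, `head_word` and `pos_tag` are all hypothetical. The single feature is chosen by passing a different `key` function.

```python
from collections import Counter


def head_word(item):
    """Feature: the word itself (items are (word, pos) pairs in this sketch)."""
    return item[0]


def pos_tag(item):
    """Feature: the part-of-speech tag."""
    return item[1]


def train_pair_counts(sequences, key):
    """Count how often feature value x was observed before feature value y."""
    counts = Counter()
    for seq in sequences:
        keys = [key(item) for item in seq]
        for i in range(len(keys)):
            for j in range(i + 1, len(keys)):
                counts[(keys[i], keys[j])] += 1
    return counts


def order_items(items, counts, key):
    """Order items by how many of the others each one is more often seen before."""
    def wins(x):
        return sum(
            counts[(key(x), key(y))] > counts[(key(y), key(x))]
            for y in items
            if y is not x
        )
    return sorted(items, key=wins, reverse=True)


# Toy prenominal adjective sequences (invented for this example).
training = [
    [("big", "JJ"), ("red", "JJ")],
    [("big", "JJ"), ("old", "JJ"), ("red", "JJ")],
    [("old", "JJ"), ("red", "JJ")],
]
counts = train_pair_counts(training, head_word)
ordered = order_items([("red", "JJ"), ("big", "JJ"), ("old", "JJ")], counts, head_word)
print([word for word, _ in ordered])
```

Passing `pos_tag` instead of `head_word` to both functions gives the tag-only variant. A classifier-based approach would replace the raw counts with a model trained on a richer feature vector for each pair.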

Finally, I describe a human evaluation of my initial surface realizer with relative frequency approaches to ordering of adjuncts and modifiers, and of my modified surface realizer which uses a classification-based approach to ordering of adjuncts and modifiers. I show that modeling of adjunct and modifier ordering can lead to small but significant improvements in surface realization performance, and analyze the types of errors identified by the human evaluators.

Indexing (document details)
Advisor: Stent, Amanda
School: State University of New York at Stony Brook
School Location: United States -- New York
Source: DAI-B 71/05, Dissertation Abstracts International
Subjects: Computer science
Keywords: Adjunct ordering, English generation, Modifier ordering, Natural language generation
Publication Number: 3406711
ISBN: 978-1-109-73874-2
Copyright © 2021 ProQuest LLC. All rights reserved.