Dissertation/Thesis Abstract

Paradigms of evaluation in natural language processing: Field linguistics for glass box testing
by Cohen, Kevin Bretonnel, Ph.D., University of Colorado at Boulder, 2010, 175 pages; 3404044
Abstract (Summary)

Although software testing has been well studied in computer science, it has received little attention in natural language processing. Nonetheless, a fully developed methodology for glass box evaluation and testing of language processing applications already exists in the field methods of descriptive linguistics. This work presents a number of experiments that, in aggregate, demonstrate the feasibility of software testing or glass box evaluation for natural language processing, and in the process validates the claim that the techniques of descriptive linguistics and field methods are a sound methodological approach to such testing. Various chapters consider the issue from four perspectives: the application of fieldwork techniques to software testing, the application of linguistics-informed software engineering to NLP, the application of the descriptive linguistics concept of complementary distribution to problems in NLP, and the application of descriptive linguistics concepts to the problem of quality assurance for semantic representations in proposition banks.

In the experiment that most clearly shows the connection between linguistic fieldwork and software testing, a test suite constructed like a field linguist's elicitation schedule is used to find performance errors in five named entity recognition programs and to predict the performance of one program on several equivalence classes of named entities. In another experiment, from the software engineering perspective, a linguistically informed fault model is used to isolate the source of a performance anomaly in a language processing application. In three subsequent experiments, a discovery procedure for minimal pairs and free variation is used to approach a problem in the normalization of named entities, and a discovery procedure for complementary distribution is used to diagnose problematic semantic representations. The latter technique is applied to two corpora and two sets of predicate-argument structures, and it labels true positives with an accuracy of 69%.
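The complementary-distribution test mentioned above can be illustrated with a minimal sketch. It is not the dissertation's procedure: it assumes that observations arrive as (unit, environment) pairs, and the function name `complementary_pairs` and the `min_count` threshold are hypothetical. Two units are flagged when the sets of environments in which they occur never overlap, which is the signature of complementary distribution in descriptive linguistics; flagged pairs are only candidates for manual review, not confirmed errors.

```python
from collections import defaultdict
from itertools import combinations


def complementary_pairs(observations, min_count=1):
    """Return pairs of units whose environment sets never overlap.

    observations: iterable of (unit, environment) pairs. What counts as an
    "environment" (a neighbouring token, a proposition ID, ...) is an
    assumption left to the caller; this is a hypothetical helper.
    """
    # Map each unit to the set of distinct environments it was seen in.
    envs = defaultdict(set)
    for unit, env in observations:
        envs[unit].add(env)

    # Skip units seen in fewer than min_count distinct environments, so that
    # rare units are not flagged merely for lack of data.
    units = sorted(u for u, e in envs.items() if len(e) >= min_count)

    # A pair is in complementary distribution if no environment is shared.
    return [
        (a, b)
        for a, b in combinations(units, 2)
        if envs[a].isdisjoint(envs[b])
    ]
```

With invented toy data in which the environment is a proposition identifier, `Arg2` and `Arg3` never appear in the same proposition and form the only flagged pair: `complementary_pairs([("Arg1", "p1"), ("Arg2", "p1"), ("Arg1", "p2"), ("Arg3", "p3"), ("Arg1", "p3")])` returns `[("Arg2", "Arg3")]`. Raising `min_count` trades recall for fewer spurious flags on sparse data.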

Indexing (document details)
Advisor: Palmer, Martha Stone
Committee: Hirschman, Lynette; Hunter, Lawrence; Martin, James H.; Rood, David
School: University of Colorado at Boulder
Department: Linguistics
School Location: United States -- Colorado
Source: DAI-A 71/06, Dissertation Abstracts International
Source Type: DISSERTATION
Subjects: Linguistics, Bioinformatics, Computer science
Keywords: Descriptive linguistics, Evaluation, Field linguistics, Glass box testing, Natural language, Natural language processing, Software testing
Publication Number: 3404044
ISBN: 9781109782226
Copyright © 2019 ProQuest LLC. All rights reserved.