Although software testing has been well studied in computer science, it has received little attention in natural language processing. Nonetheless, a fully developed methodology for glass box evaluation and testing of language processing applications already exists in the field methods of descriptive linguistics. This work lays out a number of experiments that in the aggregate demonstrate the feasibility of software testing, or glass box evaluation, for natural language processing, and in the process validates the claim that the techniques of descriptive linguistics and field methods are a sound methodological approach to doing such testing. Various chapters consider the issue from four perspectives: the application of fieldwork techniques to software testing, applications of linguistics-informed software engineering to NLP, applications of the descriptive linguistics concept of complementary distribution to problems in NLP, and applications of descriptive linguistics concepts to the problem of quality assurance for semantic representations in proposition banks.
In the experiment that most clearly shows the connection between linguistic fieldwork and software testing, a test suite constructed like a field linguist's elicitation schedule is used to find performance errors in five named entity recognition programs and to predict the performance of one program on several equivalence classes of named entities. In another experiment, from the software engineering perspective, a linguistically informed fault model is used to isolate the source of a performance anomaly in a language processing application. In three subsequent experiments, a discovery procedure for minimal pairs and free variation is used to approach a problem in the normalization of named entities, and a discovery procedure for complementary distribution is used to diagnose problematic semantic representations. The latter technique is applied to two corpora and two sets of predicate-argument structures; it is shown that the technique labels true positives with an accuracy of 69%.
|Advisor:||Palmer, Martha Stone|
|Committee:||Hirschman, Lynette; Hunter, Lawrence; Martin, James H.; Rood, David|
|School:||University of Colorado at Boulder|
|School Location:||United States -- Colorado|
|Source:||DAI-A 71/06, Dissertation Abstracts International|
|Subjects:||Linguistics, Bioinformatics, Computer science|
|Keywords:||Descriptive linguistics, Evaluation, Field linguistics, Glass box testing, Natural language, Natural language processing, Software testing|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved