Dissertation/Thesis Abstract

Coordination Resolution in Biomedical Texts
by Ogren, Philip Victor, Ph.D., University of Colorado at Boulder, 2011, 123 pages; publication number 3453768
Abstract (Summary)

One of the most difficult and least studied sources of structural ambiguity in text is syntactic coordination. In this dissertation, coordination resolution is the task of determining the correct conjuncts of the coordinating conjunctions “and” and “or”; it is explored here for biomedical scientific literature. The problem is challenging because conjunctions are highly promiscuous with respect to the kinds of words and phrases that they are willing to coordinate. For example, a conjunct may consist of a single word, such as an adjective, or of a much longer verb phrase. The main contribution of this work is an efficient and accurate coordination resolution algorithm that outperforms both the previous state of the art on this task and a state-of-the-art syntactic parser applied to the same task. The algorithm uses binary classifiers to predict conjunct boundaries. One of the more interesting features that improved the performance of these classifiers leverages probabilities generated by a language model built from large quantities of readily available unlabeled data. The language-model-derived features exploit the intuition that sentences containing coordinating conjunctions can often be rephrased as two or more smaller sentences derived from the coordination structure. Candidate sentences corresponding to different possible coordination structures are generated and compared using the language model to help determine which coordination structure is best. Performance is further improved by first predicting the syntactic type of the coordination structure and then using this type prediction to help train and classify conjunct boundaries. Finally, a system that integrates the new approach with a syntactic parser is shown to outperform either approach in isolation.
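
The rephrasing intuition described in the abstract can be illustrated with a small sketch. This is not the dissertation's implementation: the add-one-smoothed bigram model, the function names (`train_bigram_lm`, `avg_log_prob`, `rephrase`, `best_structure`), the toy corpus, and the exhaustive search over boundaries are all assumptions made for illustration. In the dissertation the language model scores feed binary boundary classifiers as features rather than choosing the structure directly.

```python
import math
from collections import Counter


def train_bigram_lm(corpus):
    """Count unigram contexts and bigrams over tokenized sentences (toy model)."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]
        unigrams.update(tokens[:-1])                  # context counts
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # adjacent pairs
    vocab = {w for sent in corpus for w in sent} | {"<s>", "</s>"}
    return unigrams, bigrams, len(vocab)


def avg_log_prob(tokens, lm):
    """Length-normalized log probability with add-one smoothing."""
    unigrams, bigrams, vocab_size = lm
    padded = ["<s>"] + tokens + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded[:-1], padded[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total / (len(padded) - 1)


def rephrase(tokens, conj_idx, left_start, right_end):
    """Build two smaller sentences from one hypothesized coordination structure.

    Hypothesis: the left conjunct is tokens[left_start:conj_idx] and the
    right conjunct is tokens[conj_idx + 1:right_end + 1].
    """
    prefix, suffix = tokens[:left_start], tokens[right_end + 1:]
    left = tokens[left_start:conj_idx]
    right = tokens[conj_idx + 1:right_end + 1]
    return prefix + left + suffix, prefix + right + suffix


def best_structure(tokens, conj_idx, lm):
    """Score every boundary pair by the summed LM score of its two rephrasings."""
    best = None
    for left_start in range(conj_idx):
        for right_end in range(conj_idx + 1, len(tokens)):
            s1, s2 = rephrase(tokens, conj_idx, left_start, right_end)
            score = avg_log_prob(s1, lm) + avg_log_prob(s2, lm)
            if best is None or score > best[0]:
                best = (score, left_start, right_end)
    return best


# Hypothetical unlabeled corpus and test sentence, for illustration only.
corpus = [
    "the protein binds the receptor".split(),
    "the protein activates the receptor".split(),
]
lm = train_bigram_lm(corpus)
sentence = "the protein binds and activates the receptor".split()
first, second = rephrase(sentence, conj_idx=3, left_start=2, right_end=4)
print(" ".join(first))   # the protein binds the receptor
print(" ".join(second))  # the protein activates the receptor
print(best_structure(sentence, 3, lm))
```

Treating "binds" and "activates" as the conjuncts yields two fluent smaller sentences, which is the kind of signal a language model trained on unlabeled text can reward. A realistic system would use a much larger model and combine its scores with other features.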

Indexing (document details)
Advisor: Hunter, Lawrence E.
Committee: Martin, James H., Nielsen, Rodney D., Palmer, Martha, Ward, Wayne H.
School: University of Colorado at Boulder
Department: Computer Science
School Location: United States -- Colorado
Source: DAI-B 72/07, Dissertation Abstracts International
Source Type: DISSERTATION
Subjects: Computer science
Keywords: Machine learning, Natural language processing, Syntactic coordination
Publication Number: 3453768
ISBN: 9781124621265