One of the most difficult and least studied sources of structural ambiguity in text is syntactic coordination. Coordination resolution, for this dissertation, is the task of determining the correct conjuncts of the coordinating conjunctions “and” and “or”, and it is explored here for biomedical scientific literature. It is a challenging problem because conjunctions are highly promiscuous with respect to the kinds of words and phrases that they are willing to coordinate. For example, a conjunct may consist of a single word, such as an adjective, or of a much longer verb phrase. The main contribution of this work is an efficient and accurate coordination resolution algorithm that outperforms both the previous state of the art on this task and a state-of-the-art syntactic parser applied to the same task. The algorithm uses binary classifiers to predict conjunct boundaries. One of the more interesting features that improved the performance of these classifiers leverages probabilities generated by a language model built from large quantities of readily available unlabeled data. The language-model-derived features exploit the intuition that sentences containing coordinating conjunctions can often be rephrased as two or more smaller sentences derived from the coordination structure. Candidate sentences corresponding to different possible coordination structures are generated and compared using the language model to help determine which coordination structure is best. Performance is further improved by first predicting the syntactic type of the coordination structure and using this type prediction to help train and classify conjunct boundaries. Finally, a system that integrates the new approach with a syntactic parser is shown to outperform either approach in isolation.
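The rephrasing intuition behind the language-model-derived features can be illustrated with a minimal sketch. It is not the dissertation's implementation: the add-one-smoothed bigram model, the helper names (`train_bigram_lm`, `avg_log_prob`, `rephrase`, `best_structure`), and the choice to pick a structure directly from the scores are all assumptions made for illustration. In the work described above, the language model probabilities serve as features for the boundary classifiers rather than as a standalone decision rule.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Collect unigram and bigram counts from tokenized, unlabeled sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        padded = ["<s>"] + sent + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def avg_log_prob(sent, lm):
    """Per-token log-probability of a sentence under an add-one-smoothed bigram model."""
    unigrams, bigrams = lm
    vocab = len(unigrams) + 1  # one extra slot for unseen words
    padded = ["<s>"] + sent + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab))
    return total / (len(padded) - 1)


def rephrase(tokens, left_start, conj, right_end):
    """Build two smaller sentences, each keeping only one conjunct.

    The left conjunct is tokens[left_start:conj]; the right conjunct is
    tokens[conj + 1:right_end + 1]; everything outside is shared context.
    """
    prefix, suffix = tokens[:left_start], tokens[right_end + 1:]
    left = tokens[left_start:conj]
    right = tokens[conj + 1:right_end + 1]
    return [prefix + left + suffix, prefix + right + suffix]


def best_structure(tokens, conj, lm):
    """Score every candidate pair of conjunct boundaries (hypothetical decision rule).

    A candidate is scored by its worst rephrased sentence, so both derived
    sentences must look plausible to the language model.
    """
    candidates = []
    for left_start in range(conj):
        for right_end in range(conj + 1, len(tokens)):
            sents = rephrase(tokens, left_start, conj, right_end)
            score = min(avg_log_prob(s, lm) for s in sents)
            candidates.append((score, left_start, right_end))
    return max(candidates)  # (score, left_start, right_end)
```

For a sentence such as "the protein binds and activates the receptor", calling `rephrase` with the boundaries around "binds" and "activates" produces "the protein binds the receptor" and "the protein activates the receptor". A language model trained on in-domain text would be expected to prefer these over the sentences produced by wider or narrower candidate conjuncts.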
|Advisor:||Hunter, Lawrence E.|
|Committee:||Martin, James H., Nielsen, Rodney D., Palmer, Martha, Ward, Wayne H.|
|School:||University of Colorado at Boulder|
|School Location:||United States -- Colorado|
|Source:||DAI-B 72/07, Dissertation Abstracts International|
|Keywords:||Machine learning, Natural language processing, Syntactic coordination|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved