Advances in syntactic parsing and semantic role labeling have been a boon to Natural Language Processing. However, these systems perform poorly on sentences that do not conform to expected syntax-semantics patterns. For example, in the sentence "The crowd laughed the clown off the stage", laugh, a verb of non-verbal communication, is coerced into the semantics of a caused motion construction (CMC) and gains a motion entailment that is atypical given its inherent lexical semantics. Accurate semantic role labeling for such sentences requires that NLP classifiers identify these coerced usages in data. Even given accurate semantic role labels, such a sentence still requires a semantic interpretation with appropriate representations that include the semantics of the CMC.
This thesis focuses on the definition, identification, and representation of CMCs. We build on work in Construction Grammar to develop the semantic types and varieties of CMCs for corpus annotation. Using the annotations as training and test data, we train automatic CMC classifiers and demonstrate that CMCs can be reliably identified in corpus data. Furthermore, we develop a new set of semantic predicates in VerbNet for the semantic representation of CMCs. These predicates not only provide for the representation of CMC sentences but also give VerbNet a more consistent, explicit representation of paths of motion. Finally, we demonstrate that CMC representation can help assign the proper semantic representation to sentences even when the verb in the sentence does not itself include the semantics of the CMC.
The overall contribution of this work is the establishment of the processes involved in identifying and representing constructions in an empirical setting. This work assesses the necessary steps to define and annotate constructions in a corpus setting, train classifiers for constructions, and represent the semantics of constructions through VerbNet predicates. While we have focused on the identification and representation of caused motion constructions, a similar corpus-driven study can be conducted for other constructions whose sentence representations would not be possible with the semantics of the verb alone.
|Committee:||Martin, James H., Michaelis, Laura A., Narasimhan, Bhuvana, Zaenen, Annie|
|School:||University of Colorado at Boulder|
|School Location:||United States -- Colorado|
|Source:||DAI-A 76/06(E), Dissertation Abstracts International|
|Subjects:||Linguistics, Computer science|
|Keywords:||Caused motion construction, Computational linguistics, Construction grammar, Corpus linguistics, Lexical semantics, Semantic representation|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved