Automatic machine translation (MT) is the focus of a huge variety of active research efforts, both because of the intrinsic utility of this difficult task and because of the theoretical and linguistic insights that arise from modeling relationships between natural languages. However, MT systems that leverage syntactic information have only recently become practical, and in a typical system of this sort, syntactic information is generated by monolingual parsers; the task of explicitly modeling syntactic relationships between source and target languages has yet to be fully explored.
This thesis investigates the problem of finding syntactic parse trees of target and/or source sentences that are more appropriate for use in a syntactic MT system. Two basic methodologies are explored.
First, we present a sequence of two statistical models that leverage bilingual information to improve the linguistic quality of syntactic parses, as measured by their ability to replicate human-generated gold-standard annotations. The first model uses word-to-word alignments as an external source of information, while the second models the alignments jointly with the parses. Both models are quite effective at improving the intrinsic quality of the parse trees, and the second model additionally improves word alignment performance. However, while the two models achieve similar parsing improvements, we find that improving parses in conjunction with word alignments is much more helpful for the downstream machine translation task.
In the next part of the thesis, we explore this finding further by investigating the effects on MT performance of agreement between parse trees and word alignments. We present a simple method for transforming input trees in a way that ignores gold-standard annotations, concentrating instead on improving syntactic agreement directly. In experiments, we find that though we obviously lose fidelity to more linguistically informed treebank annotation guidelines, this transformation-based approach yields the strongest improvements in syntactic machine translation.
|Committee:||Darrell, Trevor; Griffiths, Thomas L.; Klein, Dan|
|School:||University of California, Berkeley|
|School Location:||United States -- California|
|Source:||DAI-B 74/07(E), Dissertation Abstracts International|
|Subjects:||Linguistics, Computer science|
|Keywords:||Artificial intelligence, Bilingual corpora, Machine translation, Natural language processing, Statistical parsing, Syntactic machine translation, Syntax|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved