Citation sentences (sentences that cite other papers) play a key role in the summarization of scientific articles. However, a citation-based summarization system that depends on generic natural language processing components, such as parsers or sentence compressors, will perform poorly if those components cannot handle citations correctly.
In this thesis, I examine the effect of citation handling on parsing, sentence compression, and multi-document summarization. There are two types of citations that occur in citation sentences: constituent citations and parenthetical citations. I propose an automatic citation classifier based on training data created through Mechanical Turk tasks. I demonstrate that the use of type-specific citation handling as pre-processing improves the performance of a state-of-the-art generic parser, both for quality of the parse trees and running time. Extrinsic evaluations demonstrate that improving the performance of a parser on citation sentences in turn improves the performance of a sentence compressor, Trimmer (Zajic et al., 2007), and a multi-document summarization system, MASCS, according to several summarization measures.
|Advisor:||Dorr, Bonnie, Zajic, David|
|Commitee:||Daume, Hal, III|
|School:||University of Maryland, College Park|
|School Location:||United States -- Maryland|
|Source:||MAI 51/03M(E), Masters Abstracts International|
|Keywords:||Citation, Multi-document summarization, Parsing, Sentence compression|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved
The supplemental file or files you are about to download were provided to ProQuest by the author as part of a
dissertation or thesis. The supplemental files are provided "AS IS" without warranty. ProQuest is not responsible for the
content, format or impact on the supplemental file(s) on our system. in some cases, the file type may be unknown or
may be a .exe file. We recommend caution as you open such files.
Copyright of the original materials contained in the supplemental file is retained by the author and your access to the
supplemental files is subject to the ProQuest Terms and Conditions of use.
Depending on the size of the file(s) you are downloading, the system may take some time to download them. Please be