Citation sentences (sentences that cite other papers) play a key role in the summarization of scientific articles. However, a citation-based summarization system that depends on generic natural language processing components, such as parsers or sentence compressors, will perform poorly if those components cannot handle citations correctly.
In this thesis, I examine the effect of citation handling on parsing, sentence compression, and multi-document summarization. Two types of citations occur in citation sentences: constituent citations and parenthetical citations. I propose an automatic citation classifier trained on data created through Mechanical Turk tasks. I demonstrate that type-specific citation handling, applied as a pre-processing step, improves the performance of a state-of-the-art generic parser in both parse-tree quality and running time. Extrinsic evaluations demonstrate that improving parser performance on citation sentences in turn improves the performance of a sentence compressor, Trimmer (Zajic et al., 2007), and of a multi-document summarization system, MASCS, according to several summarization measures.
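As a rough illustration of what type-specific citation handling before parsing might look like, the sketch below classifies each author-year citation and rewrites the sentence accordingly. Everything in it is an assumption rather than the thesis's method: the abstract describes a classifier trained on Mechanical Turk data, whereas this sketch substitutes a hand-written cue-word rule, and the names `CITATION_RE`, `CONSTITUENT_CUES`, `classify`, and `preprocess` are hypothetical. The handling policy (replace a constituent citation with a placeholder token, delete a parenthetical one) is likewise assumed, since the abstract does not spell it out.

```python
import re

# Matches author-year citations in parentheses, e.g. "(Zajic et al., 2007)".
CITATION_RE = re.compile(r"\(([A-Z][^()]*?\d{4}[a-z]?)\)")

# Hypothetical cue words: a citation right after one of these is assumed to
# fill a syntactic slot in the sentence (a "constituent" citation).
CONSTITUENT_CUES = {"in", "by", "of", "see", "from", "following"}


def classify(sentence, match):
    """Stand-in for the trained classifier: a simple cue-word rule."""
    before = sentence[:match.start()].split()
    if not before:
        # Sentence-initial citation: assumed to act as the subject.
        return "constituent"
    if before[-1].lower() in CONSTITUENT_CUES:
        return "constituent"
    return "parenthetical"


def preprocess(sentence, placeholder="CITATION"):
    """Rewrite citations so a generic parser sees ordinary text.

    Assumed policy: constituent citations become a placeholder token,
    parenthetical citations are removed.
    """
    def rewrite(match):
        if classify(sentence, match) == "constituent":
            return placeholder
        return ""

    out = CITATION_RE.sub(rewrite, sentence)
    out = re.sub(r"\s+([,.;:])", r"\1", out)  # no space before punctuation
    out = re.sub(r"\s{2,}", " ", out)         # collapse doubled spaces
    return out.strip()


if __name__ == "__main__":
    print(preprocess("Trimmer (Zajic et al., 2007) compresses sentences."))
    print(preprocess("We follow the approach described in (Zajic et al., 2007)."))
```

With this rule, the first sentence loses its parenthetical citation and the second keeps a placeholder noun where the citation stood, so the parser receives a grammatical sentence in both cases.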
Advisor: Dorr, Bonnie; Zajic, David
Committee: Daume, Hal, III
School: University of Maryland, College Park
Department: Computer Science
School Location: United States -- Maryland
Source: MAI 51/03M(E), Masters Abstracts International
Source Type: DISSERTATION
Subjects: Computer science
Keywords: Citation, Multi-document summarization, Parsing, Sentence compression
Publication Number: 1529438
ISBN: 978-1-267-72627-8