Many articles in the online encyclopedia Wikipedia have hyperlinks to ambiguous article titles. To improve the reader experience, any link to an ambiguous title should be replaced with a link to one of the unambiguous meanings. We propose a novel statistical topic model, which we refer to as the Link Text Topic Model (LTTM), that can suggest new link targets for existing ambiguous links in Wikipedia articles. For evaluation, we develop a method for extracting ground truth from snapshots of Wikipedia at different points in time. We evaluate LTTM on this ground truth, and demonstrate its superiority over existing link- and content-based approaches. Finally, we build a web service that uses LTTM to suggest unambiguous articles for human editors wanting to fix ambiguous links.
|Advisor:||Getoor, Lise C.|
|Committee:||Boyd-Graber, Jordan; Daume, Hal, III|
|School:||University of Maryland, College Park|
|School Location:||United States -- Maryland|
|Source:||MAI 50/04M, Masters Abstracts International|
|Subjects:||Web Studies, Information science, Computer science|
|Keywords:||Disambiguation, Link prediction, Topic modeling, Wikipedia|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved