This thesis proposes novel search and ranking approaches for semantically rich application domains.
The central role of Data Management in today's society may be compared to the role of Physics in the early 20th century, when it entered its Golden Age. Data is the raw matter of the Universe of Information, and, in a process analogous to nuclear fusion, data is transformed progressively into information, and then into knowledge.
The advent of the World Wide Web as an information exchange platform and a social medium, both on an unprecedented scale, raises users' expectations with respect to the availability of relevant information and the ease of access to it. Web users build persistent online personas: they provide information about themselves in stored profiles, register their relationships with other users, and express their preferences with respect to information and products. As a result, rich semantic information about users is readily available, or can be derived, and can be used to improve their online experience, making them more productive, more creative, and better entertained online. There is thus a need for context-aware data management mechanisms that support a user-centric data exploration experience, and do so efficiently at large scale.
In a complementary trend, scientific domains, most notably the domain of life sciences, are experiencing unprecedented growth. The ever-increasing amount of data and knowledge requires the development of new semantically rich data management techniques that facilitate system-wide analysis and scientific collaboration. Literature search is a central task in scientific research. Controlled vocabularies and ontologies that exist in this domain present an opportunity for improving the quality of ranking.
The Web is a multifaceted medium that gives users access to a wide variety of datasets, and satisfies diverse information needs. Some Web users look for answers to specific questions, while others browse content and explore the richness of possibilities. The notion of relevance is intrinsically linked with preference and choice. Individual items and item collections are characterized in part by the semantic relationships that hold among values of their attributes. Exposing these semantic relationships helps users gain a better understanding of the dataset, allowing them to make informed choices. This process is commonly known as data exploration, and has applications that range from analyzing the performance of the stock market, to identifying genetic disease susceptibility, to looking for a date.
In this thesis we propose novel search and ranking techniques that improve the user experience and facilitate information discovery in several semantically rich application domains. We show how the social context in social tagging sites can be used for user-centric information discovery. We also propose novel ontology-aware search and ranking techniques, and apply them to scientific literature search. Finally, we address data exploration in ranked structured datasets, and propose a rank-aware clustering algorithm that uses semantic relationships among item attributes to facilitate information discovery.
|Advisor:||Ross, Kenneth A.|
|School Location:||United States -- New York|
|Source:||DAI-B 71/03, Dissertation Abstracts International|
|Keywords:||Data management, Ranking, Social networks, Web searches|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved