Dissertation/Thesis Abstract

Search and ranking in semantically rich applications
by Stoyanovich, Julia, Ph.D., Columbia University, 2010, 220; 3400637
Abstract (Summary)

This thesis proposes novel search and ranking approaches for semantically rich application domains.

The central role of Data Management in today's society may be compared to the role of Physics in the early 20th century, when it entered its Golden Age. Data is the raw matter of the Universe of Information and, in a process analogous to nuclear fusion, is transformed progressively into information, and then into knowledge.

The advent of the World Wide Web as an information exchange platform and a social medium, both on an unprecedented scale, raises users' expectations with respect to the availability of, and ease of access to, relevant information. Web users build persistent online personas: they provide information about themselves in stored profiles, register their relationships with other users, and express their preferences with respect to information and products. As a result, rich semantic information about the user is readily available, or can be derived, and can be used to improve the user's online experience, making them more productive, more creative, and better entertained online. There is thus a need for context-aware data management mechanisms that support a user-centric data exploration experience, and do so efficiently at large scale.

In a complementary trend, scientific domains, most notably the domain of life sciences, are experiencing unprecedented growth. The ever-increasing amount of data and knowledge requires the development of new semantically rich data management techniques that facilitate system-wide analysis and scientific collaboration. Literature search is a central task in scientific research. Controlled vocabularies and ontologies that exist in this domain present an opportunity for improving the quality of ranking.

The Web is a multifaceted medium that gives users access to a wide variety of datasets and satisfies diverse information needs. Some Web users look for answers to specific questions, while others browse content and explore the richness of possibilities. The notion of relevance is intrinsically linked with preference and choice. Individual items and item collections are characterized in part by the semantic relationships that hold among values of their attributes. Exposing these semantic relationships helps the user gain a better understanding of the dataset, allowing them to make informed choices. This process is commonly known as data exploration, and has applications that range from analyzing the performance of the stock market, to identifying genetic disease susceptibility, to looking for a date.

In this thesis we propose novel search and ranking techniques that improve the user experience and facilitate information discovery in several semantically rich application domains. We show how the social context in social tagging sites can be used for user-centric information discovery. We also propose novel ontology-aware search and ranking techniques, and apply them to scientific literature search. Finally, we address data exploration in ranked structured datasets, and propose a rank-aware clustering algorithm that uses semantic relationships among item attributes to facilitate information discovery.

Indexing (document details)
Advisor: Ross, Kenneth A.
Committee:
School: Columbia University
School Location: United States -- New York
Source: DAI-B 71/03, Dissertation Abstracts International
Source Type: DISSERTATION
Subjects: Computer science
Keywords: Data management, Ranking, Social networks, Web searches
Publication Number: 3400637
ISBN: 9781109673234
Copyright © 2019 ProQuest LLC. All rights reserved.