Dissertation/Thesis Abstract

Collective Relational Data Integration with Diverse and Noisy Evidence
by Memory, Alexander, Ph.D., University of Maryland, College Park, 2019, 204; 27547597
Abstract (Summary)

Driven by the growth of the Internet, online applications, and data sharing initiatives, available structured data sources are now vast in number. There is a growing need to integrate these structured sources to support a variety of data science tasks, including predictive analysis, data mining, improving search results, and generating recommendations.

A particularly important integration challenge is dealing with the heterogeneous structures of relational data sources. In addition to the large number of sources, the difficulty also lies in the growing complexity of sources, and in the noise and ambiguity present in real-world sources. Existing automated integration approaches handle the number and complexity of sources, but nearly all are too brittle to handle noise and ambiguity. Corresponding progress has been made in probabilistic learning approaches to handle noise and ambiguity in inputs, but until recently those technologies have not scaled to the size and complexity of relational data integration problems. My dissertation addresses key challenges arising from this gap in existing approaches.

I begin the dissertation by introducing a common probabilistic framework for reasoning about both metadata and data in integration problems. I demonstrate that this approach allows us to mitigate noise in metadata. The type of transformation I generate is particularly rich, taking into account multi-relational structure in both the source and target databases. I introduce a new objective for selecting this type of relational transformation and demonstrate its effectiveness on particularly challenging problems in which only partial outputs to the target are possible. Next, I present a novel method for reasoning about ambiguity in integration problems and show it handles complex schemas with many alternative transformations. To discover transformations beyond those derivable from explicit source and target metadata, I introduce an iterative mapping search framework. In a complementary approach, I introduce a framework for reasoning jointly over both transformations and underlying semantic attribute matches, which are allowed to have uncertainty. Finally, I consider an important case in which multiple sources need to be fused but traditional transformations aren’t sufficient. I demonstrate that we can learn statistical transformations for an important practical application with the multiple-sources problem.

Indexing (document details)
Advisor: Getoor, Lise
Committee: Nau, Dana; Corrada Bravo, Héctor; Raschid, Louiqa; Ritter, Alan
School: University of Maryland, College Park
Department: Computer Science
School Location: United States -- Maryland
Source: DAI-B 81/8(E), Dissertation Abstracts International
Source Type: DISSERTATION
Subjects: Computer science
Keywords: Data integration, Probabilistic reasoning, Schema mapping, Structured prediction
Publication Number: 27547597
ISBN: 9781392770764
Copyright © 2020 ProQuest LLC. All rights reserved.