Dissertation/Thesis Abstract

Optimizing Data Movement in Hybrid Analytic Systems
by Leyshock, Patrick Michael, Ph.D., Portland State University, 2014, 228; 3670195
Abstract (Summary)

Hybrid systems for analyzing big data integrate an analytic tool with a dedicated data-management platform, storing data and operating on it at both components. While hybrid systems have benefits over alternative architectures, they can be effective only if data movement between the two components is minimized. Extant hybrid systems either fail to address the performance problems that stem from inter-component data movement or require the user to reason explicitly about and manage that movement. My work presents the design, implementation, and evaluation of a hybrid analytic system for array-structured data that automatically minimizes data movement between the hybrid components.

This research first motivates the need for automatic data-movement minimization in hybrid systems, demonstrating that under workloads whose inputs vary in size, shape, and location, automation is the only practical way to reduce data movement. I then present a prototype hybrid system that minimizes data movement automatically. The exposition includes several contributions to the research area: a partial semantic mapping between the hybrid components, an adaptation of rewrite-based query-transformation techniques to minimize data movement in array-modeled hybrid systems, and an empirical evaluation of the approach's utility. Experimental results not only illustrate the hybrid system's overall effectiveness in minimizing data movement but also illuminate the contributions made by individual elements of the design.
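As a rough illustration of the general idea behind rewrite-based data-movement minimization (not the dissertation's actual rules or implementation), the Python sketch below models a query as a small expression tree. In it, a hypothetical `Transfer` node moves an array from the data-management platform to the analytic tool. A single rewrite rule, Subset(Transfer(x)) → Transfer(Subset(x)), pushes a subarray selection to the platform side so that only the selected cells cross the component boundary. The node names (`ArrayRef`, `Subset`, `Transfer`) and the "cells moved" cost measure are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical expression nodes for illustration only.
@dataclass(frozen=True)
class ArrayRef:
    """A 2-D array stored on the data-management platform."""
    name: str
    shape: Tuple[int, int]

@dataclass(frozen=True)
class Subset:
    """Select a half-open [start, stop) block of rows and columns."""
    child: "Expr"
    rows: Tuple[int, int]
    cols: Tuple[int, int]

@dataclass(frozen=True)
class Transfer:
    """Move the child's result from the platform to the analytic tool."""
    child: "Expr"

Expr = Union[ArrayRef, Subset, Transfer]

def shape(e: Expr) -> Tuple[int, int]:
    """Shape of the array an expression produces (assumes valid bounds)."""
    if isinstance(e, ArrayRef):
        return e.shape
    if isinstance(e, Subset):
        return (e.rows[1] - e.rows[0], e.cols[1] - e.cols[0])
    if isinstance(e, Transfer):
        return shape(e.child)
    raise TypeError(f"unknown node: {e!r}")

def cells_moved(e: Expr) -> int:
    """Assumed cost model: total cells carried by Transfer nodes."""
    if isinstance(e, ArrayRef):
        return 0
    if isinstance(e, Subset):
        return cells_moved(e.child)
    if isinstance(e, Transfer):
        rows, cols = shape(e.child)
        return rows * cols + cells_moved(e.child)
    raise TypeError(f"unknown node: {e!r}")

def rewrite(e: Expr) -> Expr:
    """Bottom-up rewrite that pushes Subset below Transfer."""
    if isinstance(e, Subset):
        child = rewrite(e.child)
        if isinstance(child, Transfer):
            # Subset(Transfer(x)) -> Transfer(Subset(x))
            return Transfer(rewrite(Subset(child.child, e.rows, e.cols)))
        return Subset(child, e.rows, e.cols)
    if isinstance(e, Transfer):
        return Transfer(rewrite(e.child))
    return e

if __name__ == "__main__":
    a = ArrayRef("A", (1000, 1000))
    query = Subset(Transfer(a), (0, 10), (0, 10))
    print(cells_moved(query))           # naive plan: ships the whole array
    print(cells_moved(rewrite(query)))  # rewritten plan: ships only the block
```

Under this toy cost model, the naive plan transfers all 1,000,000 cells before selecting a 10 × 10 block, while the rewritten plan transfers only those 100 cells. A practical optimizer would need many such rules plus a cost model that accounts for where each input currently resides.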

Indexing (document details)
Advisor: Maier, David
Committee: Jones, Mark P., Monsere, Christopher M., Tufte, Kristin
School: Portland State University
Department: Computer Science
School Location: United States -- Oregon
Source: DAI-B 76/05(E), Dissertation Abstracts International
Subjects: Computer science
Keywords: Array analytics, Big data, Hybrid systems, Query optimization, R, SciDB
Publication Number: 3670195
ISBN: 9781321461886