Dissertation/Thesis Abstract

Light-Weight Virtualization Driven Runtimes for Big Data Applications
by Chen, Wei, Ph.D., University of Colorado Colorado Springs, 2019, 194; 13862451
Abstract (Summary)

Datacenters are evolving to host heterogeneous Big Data workloads on shared clusters to reduce the operational cost and achieve higher resource utilization. However, it is challenging to schedule heterogeneous workloads with diverse resource requirements and QoS constraints. For example, when consolidating latency critical jobs and best-effort batch jobs in the same cluster, latency critical jobs may suffer from long queuing delay if their resource requests cannot be met immediately; while best-effort jobs would suffer from killing overhead when preempted. Moreover, resource contention may harm task performance running on worker nodes. Since resource requirements for diverse applications show heterogeneity and is not known before task execution, either the cluster manager has to over-provision resources for all incoming applications resulting in low cluster utilization; or applications may experience performance slowdown or even failure due to resource insufficiency. Existing approaches focus on either application awareness or system awareness and fail to address the semantic gap between the application layer and the system layer (e.g., OS scheduling mechanisms or cloud resource allocators).

To address these issues, we propose to attack these problems from a different angle. In other words, applications and underlying systems should cooperate synergistically. This this way, the resource demands of application can be exposed to the system. At the same time, application schedulers can be assisted with more runtimes of the system layer and perform more dedicated scheduling. However, the system and application co-design is challenging. First, the real resource demands for an application is hard to be predicted since its requirements vary during its lifetime. Second, there are tons of information generated from system layers (e.g., OS process schedulers or hardware counters), from which it is hard to associate these information to a dedicated task. Fortunately, with the help of lightweight virtualization, applications could run in isolated containers such that system level runtime information can be collected at the container level. The rich APIs of container based virtualization also enable to perform more advanced scheduling.

In this thesis, we focus on efficient and scalable techniques in datacenter scheduling by leveraging lightweight virtualization. Our thesis is two folds. First, we focus on profiling and optimizing the performance of Big Data applications. In this aspect, we built a tool to trace the scheduling delay for low-latency online data analytics workloads. We further built a map execution engine to address the performance heterogeneity for MapReduce. Second, we focus on leveraging OS containers to build advanced cluster scheduling mechanisms. In that, we built a preemptive cluster scheduler, an elastic memory manager and an OOM killer for Big Data applications. We also conducted a supplementary research on tracing the performance of Big Data training on TensorFlow.

We conducted extensive evaluations of the proposed projects in a real-world cluster. The experimental results demonstrate the effectiveness of proposed approaches in terms of improving performance and utilization of Big Data clusters.

Indexing (document details)
Advisor: Zhou, Xiaobo
Commitee: Boult, Terry, Chang, Sang-Yoon, Chow, Edward, Rao, Jia
School: University of Colorado Colorado Springs
Department: Engineering and Applied Science-Computer Science
School Location: United States -- Colorado
Source: DAI-B 80/09(E), Dissertation Abstracts International
Subjects: Computer science
Keywords: Big Data, Cluster scheduling, Hadoop, OS container, Resource management, Spark
Publication Number: 13862451
ISBN: 978-1-392-11835-1
Copyright © 2020 ProQuest LLC. All rights reserved. Terms and Conditions Privacy Policy Cookie Policy