Datacenters are evolving to host heterogeneous Big Data workloads on shared clusters to reduce operational cost and achieve higher resource utilization. However, scheduling heterogeneous workloads with diverse resource requirements and QoS constraints is challenging. For example, when latency-critical jobs and best-effort batch jobs are consolidated in the same cluster, latency-critical jobs may suffer long queuing delays if their resource requests cannot be met immediately, while best-effort jobs incur killing overhead when preempted. Moreover, resource contention on worker nodes may harm task performance. Since the resource requirements of diverse applications are heterogeneous and unknown before task execution, the cluster manager must either over-provision resources for all incoming applications, resulting in low cluster utilization, or risk applications experiencing performance slowdown or even failure due to insufficient resources. Existing approaches focus on either application awareness or system awareness, and fail to bridge the semantic gap between the application layer and the system layer (e.g., OS scheduling mechanisms or cloud resource allocators).
To address these issues, we attack these problems from a different angle: applications and the underlying system should cooperate synergistically. In this way, the resource demands of applications can be exposed to the system; at the same time, application schedulers can exploit runtime information from the system layer to perform more informed scheduling. However, such application-system co-design is challenging. First, the real resource demands of an application are hard to predict, since its requirements vary over its lifetime. Second, the system layers (e.g., OS process schedulers or hardware counters) generate a large volume of information, and it is hard to attribute this information to a specific task. Fortunately, with the help of lightweight virtualization, applications can run in isolated containers so that system-level runtime information can be collected at the container level. The rich APIs of container-based virtualization also enable more advanced scheduling.
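Container-level accounting is what makes the attribution problem tractable: the kernel exports per-cgroup counters that map one-to-one to containers, so metrics can be tied to a task without instrumenting the application. A minimal sketch in Python, assuming cgroup-v1-style `memory.stat` content (the sample data and working-set heuristic are illustrative, not the dissertation's implementation; on a real node these counters live under a path such as `/sys/fs/cgroup/memory/docker/<container-id>/memory.stat`):

```python
# Sketch: attribute cgroup-level memory counters to a container/task.
# Parses sample memory.stat text so the example is self-contained.

def parse_memory_stat(text):
    """Parse cgroup-v1 style 'key value' lines into a dict of ints."""
    stats = {}
    for line in text.strip().splitlines():
        key, value = line.split()
        stats[key] = int(value)
    return stats

SAMPLE_MEMORY_STAT = """\
cache 1048576
rss 8388608
mapped_file 262144
"""

def working_set_bytes(stats):
    # Illustrative approximation: resident set plus file pages
    # actively mapped by the container's processes.
    return stats["rss"] + stats["mapped_file"]

stats = parse_memory_stat(SAMPLE_MEMORY_STAT)
print(working_set_bytes(stats))  # 8650752
```

Because each container has its own cgroup, the same parse applied per container directory yields a per-task view of memory pressure that an application scheduler can consume.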
In this thesis, we focus on efficient and scalable techniques for datacenter scheduling that leverage lightweight virtualization. Our contributions are twofold. First, we profile and optimize the performance of Big Data applications: we built a tool to trace scheduling delay for low-latency online data analytics workloads, and a map execution engine that addresses performance heterogeneity in MapReduce. Second, we leverage OS containers to build advanced cluster scheduling mechanisms: a preemptive cluster scheduler, an elastic memory manager, and an OOM killer for Big Data applications. We also conducted supplementary research on tracing the performance of Big Data training on TensorFlow.
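The appeal of container-based preemption can be illustrated with a toy admission routine: rather than killing best-effort tasks (losing their progress), the scheduler suspends their containers until the latency-critical job fits. This is a hypothetical sketch with a mock container class standing in for real runtime calls such as `docker pause`; the names and policy are illustrative, not the dissertation's actual scheduler:

```python
# Sketch: suspend-based preemption instead of kill-based preemption.
# Pausing a container (e.g., via the cgroup freezer) keeps its state
# in memory, so a preempted best-effort task later resumes without
# losing work, avoiding the killing overhead of restart-from-scratch.

class Container:
    def __init__(self, name, cpus):
        self.name, self.cpus, self.state = name, cpus, "running"

    def pause(self):
        self.state = "paused"    # stands in for `docker pause`

    def resume(self):
        self.state = "running"   # stands in for `docker unpause`

def admit_latency_critical(lc_cpus, best_effort, capacity, used):
    """Pause best-effort containers until lc_cpus fit under capacity."""
    paused = []
    for c in best_effort:
        if used + lc_cpus <= capacity:
            break
        c.pause()
        used -= c.cpus
        paused.append(c)
    return paused, used

be = [Container("batch-0", 4), Container("batch-1", 4)]
paused, used = admit_latency_critical(lc_cpus=6, best_effort=be,
                                      capacity=12, used=8)
print([c.name for c in paused], used)  # ['batch-0'] 4
```

Once the latency-critical job finishes, calling `resume()` on the paused containers lets the best-effort work continue from exactly where it stopped.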
We conducted extensive evaluations of the proposed projects in a real-world cluster. The experimental results demonstrate the effectiveness of the proposed approaches in improving the performance and utilization of Big Data clusters.
|Committee:||Boult, Terry; Chang, Sang-Yoon; Chow, Edward; Rao, Jia|
|School:||University of Colorado Colorado Springs|
|Department:||Engineering and Applied Science-Computer Science|
|School Location:||United States -- Colorado|
|Source:||DAI-B 80/09(E), Dissertation Abstracts International|
|Keywords:||Big Data, Cluster scheduling, Hadoop, OS container, Resource management, Spark|