With PQDT Open, you can read the full text of open access dissertations and theses free of charge.
About PQDT Open
Search
COMING SOON! PQDT Open is getting a new home!
ProQuest Open Access Dissertations & Theses will remain freely available as part of a new and enhanced search experience at www.proquest.com.
Questions? Please refer to this FAQ.
In the era of big data, many cluster platforms and resource management schemes are created to satisfy the increasing demands on processing a large volume of data. A general setting of big data processing jobs consists of multiple stages, and each stage represents generally defined data operation such as filtering and sorting. To parallelize the job execution in a cluster, each stage includes a number of identical tasks that can be concurrently launched at multiple servers. Practical clusters often involve hundreds or thousands of servers processing a large batch of jobs. Resource management, that manages cluster resource allocation and job execution, is extremely critical for the system performance.
Generally speaking, there are three main challenges in resource management of the new big data processing systems. First, while there are various pending tasks from different jobs and stages, it is difficult to determine which ones deserve the priority to obtain the resources for execution, considering the tasks' different characteristics such as resource demand and execution time. Second, there exists dependency among the tasks that can be concurrently running. For any two consecutive stages of a job, the output data of the former stage is the input data of the later one. The resource management has to comply with such dependency. The third challenge is the inconsistent performance of the cluster nodes. In practice, run-time performance of every server is varying. The resource management needs to dynamically adjust the resource allocation according to the performance change of each server.
The resource management in the existing platforms and prior work often rely on fixed user-specific configurations, and assumes consistent performance in each node. The performance, however, is not satisfactory under various workloads. This dissertation aims to explore new approaches to improving the efficiency of large-scale big data processing platforms. In particular, the run-time dynamic factors are carefully considered when the system allocates the resources. New algorithms are developed to collect run-time data and predict the characteristics of jobs and the cluster. We further develop resource management schemes that dynamically tune the resource allocation for each stage of every running job in the cluster. New findings and techniques in this dissertation will certainly provide valuable and inspiring insights to other similar problems in the research community.
Advisor: | Sheng, Bo |
Commitee: | Ding, Wei, Sheng, Bo, Simovici, Dan, Zhang, Honggang |
School: | University of Massachusetts Boston |
Department: | Computer Science |
School Location: | United States -- Massachusetts |
Source: | DAI-B 78/10(E), Dissertation Abstracts International |
Source Type: | DISSERTATION |
Subjects: | Computer science |
Keywords: | Big data, Performance, Scheduling, System |
Publication Number: | 10262281 |
ISBN: | 978-1-369-81702-7 |