In a world of data deluge, considerable computational power is needed to derive knowledge from the mountains of raw data that surround us. This trend mandates the use of various parallelization techniques and runtimes to perform such analyses in a reasonable amount of time. The information retrieval community has introduced a programming model and associated runtime architecture under the name MapReduce and has demonstrated its applicability to several major operations performed within that community. Our initial research demonstrated that, although MapReduce is limited to applications with fairly simple parallel topologies, a careful set of extensions allows the programming model to support more classes of parallel applications, in particular the class of Composable Applications.
This thesis presents our experiences in identifying a set of extensions to the MapReduce programming model that expand its applicability to more classes of applications, including iterative MapReduce computations; we have also developed an efficient runtime architecture, named Twister, that supports this new programming model. The thesis also includes a detailed discussion of mapping applications and their algorithms to MapReduce and its extensions, as well as performance analyses of those applications that compare different MapReduce runtimes. The discussions of applications demonstrate the applicability of the Twister runtime to large-scale data analyses, while the empirical evaluations show the scalability and the performance advantages one can gain from using Twister.
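As a rough illustration of what an iterative MapReduce computation looks like, the sketch below repeats a map step and a reduce step until the result stops changing. It is a single-process Python toy and does not use Twister or its API. The choice of one-dimensional k-means as the example, and the names `map_phase`, `reduce_phase`, and `iterative_mapreduce`, are assumptions made here for illustration; the abstract does not name a specific algorithm.

```python
from collections import defaultdict


def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, point) for every point."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        yield nearest, p


def reduce_phase(pairs, centroids):
    """Reduce: average the points that were assigned to each centroid."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # A centroid that received no points keeps its previous value.
    return [sum(groups[i]) / len(groups[i]) if groups[i] else c
            for i, c in enumerate(centroids)]


def iterative_mapreduce(points, centroids, tol=1e-9, max_iter=100):
    """Repeat map + reduce until the centroids move by less than tol."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        if max(abs(a - b) for a, b in zip(updated, centroids)) < tol:
            return updated
        centroids = updated
    return centroids


# Hypothetical input: six points forming two obvious clusters.
result = iterative_mapreduce([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [0.0, 5.0])
print(result)
```

The input points stay fixed across iterations while only the small set of centroids changes, which is the pattern that makes a plain single-pass MapReduce job an awkward fit for this class of computation.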
Advisor: | Fox, Geoffrey |
Committee: | Gannon, Dennis; Leake, David; Lumsdaine, Andrew |
School: | Indiana University |
Department: | Computer Sciences |
School Location: | United States -- Indiana |
Source: | DAI-B 72/03, Dissertation Abstracts International |
Source Type: | DISSERTATION |
Subjects: | Computer science |
Keywords: | Composable, Data intensive, Distributed computing, MapReduce, Parallel computing, Programming models, Scalable computing |
Publication Number: | 3439561 |
ISBN: | 978-1-124-44771-1 |