In this work, a model of computation for shared memory parallelism is presented. To address fundamental constraints of modern memory systems, the presented model constrains how parallelism interacts with memory access patterns and in doing so provides a method for design and analysis of algorithms that estimates reliable execution time based on a few architectural parameters. This model is presented as an alternative to modern thread based models that focus on computational concurrency but rely on reactive hardware policies to hide and amortize memory latency. Since modern processors use reactive mechanisms and heuristics to deduce the data access requirement of computations, the memory access costs of these threaded programs may be difficult to predict reliably. This research presents the Queue Streaming Model (QSM) that aims to address these shortcomings by providing a prescriptive mechanism to achieve latency-amortized and predictable-cost data access. Further, the work presents application of the QSM to algorithms commonly used in a number of applications. These algorithms include structured regular computations represented by merge sort, unstructured irregular computations represented by sparse matrix dense vector multiplication, and dynamic computations represented by MapReduce. The analysis of these algorithms reveal architectural tradeoffs between memory system bottlenecks and algorithm design. The techniques described in this dissertation reveal a general software approach that could be used to construct more general irregular applications, provided they can be transformed into a relational query form. It demonstrates that the QSM can be used to design algorithms that enhance utilization of memory system resources by structuring concurrency and memory accesses such that system bandwidths are balanced and latency is amortized. Finally, the benefit of applying the QSM algorithm to the Euler inviscid flow solver is demonstrated through experiments on the Intel® Xeon® E5-2680 v2 processor using ten cores. The transformation produced a speed-up of 25% over an optimized OpenMP implementation having identical computational structure.
|Advisor:||Luke, Edward A.|
|Commitee:||Bhushan, Shanti, Janus, J. Mark, Kim, Seongjai, Young, Maxwell|
|School:||Mississippi State University|
|School Location:||United States -- Mississippi|
|Source:||DAI-B 80/09(E), Dissertation Abstracts International|
|Keywords:||Bridging model of computation, Performance portability, Performance predictability, Queue streaming model, Queue streaming processor, Streaming access|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved
The supplemental file or files you are about to download were provided to ProQuest by the author as part of a
dissertation or thesis. The supplemental files are provided "AS IS" without warranty. ProQuest is not responsible for the
content, format or impact on the supplemental file(s) on our system. in some cases, the file type may be unknown or
may be a .exe file. We recommend caution as you open such files.
Copyright of the original materials contained in the supplemental file is retained by the author and your access to the
supplemental files is subject to the ProQuest Terms and Conditions of use.
Depending on the size of the file(s) you are downloading, the system may take some time to download them. Please be