Graphics Processing Units (GPUs) are a fast evolving architecture. Over the last decade their programmability has been harnessed to solve non-graphics tasks—in many cases at a huge performance advantage to CPUs. Unlike CPUs, GPUs have always been a highly parallel architecture—thousands of lightweight execution contexts are necessary to keep the chip busy. CPUs too have become parallel over the same time period. But having started from the opposite end of the spectrum they expose parallelism of a different nature, where only tens of heavyweight execution contexts are active at any point in time. As parallel architectures become increasingly common, GPUs provide a research vehicle for developing parallel algorithms, programming models, and languages. This dissertation describes a library, CUDA Data Parallel Primitives Library (CUDPP), of building blocks called primitives for efficiently solving a broad range of problems on the GPU. CUDPP has become one of the most used libraries for data-parallel programming on GPUs thus showing the applicability of these building blocks. The most basic of the primitives is scan and its segmented variant, both of which have a rich history in the literature of data-parallel programming. I describe multiple efficient implementations for both variants. Subsequently I look at a class of problems exhibiting nested parallelism that have traditionally proved challenging for many-core processors. For each instance, segmented scan based primitives efficiently solve such problems for the first time on the GPU. In terms of complexity, sort and hash, fundamental algorithms in computer science, prove to be quite challenging to efficiently implement on the GPU. I take a detailed look at both radix sort and a class of comparison sorts. I compare between two different merge strategies that were used on parallel processors in the past and discuss their efficiency on GPUs. I show how sort is used to develop the first efficient algorithm for building spatial hierarchies on the GPU, thus showing how building spatial hierarchies is, in essence, sorting in three dimensions. The penultimate chapter describes the first efficient hash algorithm for GPUs based on cuckoo hashing. I close by looking at two implementations of the set data structure.
|Commitee:||Hamann, Bernd, Joy, Ken|
|School:||University of California, Davis|
|School Location:||United States -- California|
|Source:||DAI-B 72/05, Dissertation Abstracts International|
|Keywords:||Algorithms, Data-parallel primitives, Gpgpu, Gpu, Many-core architectures|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved
The supplemental file or files you are about to download were provided to ProQuest by the author as part of a
dissertation or thesis. The supplemental files are provided "AS IS" without warranty. ProQuest is not responsible for the
content, format or impact on the supplemental file(s) on our system. in some cases, the file type may be unknown or
may be a .exe file. We recommend caution as you open such files.
Copyright of the original materials contained in the supplemental file is retained by the author and your access to the
supplemental files is subject to the ProQuest Terms and Conditions of use.
Depending on the size of the file(s) you are downloading, the system may take some time to download them. Please be