While modern GPU graph analytics libraries provide usable programming models and good single-node performance, the memory capacity and computation power of a single GPU are still too limited for analyzing large graphs. Scaling graph analytics is challenging, however, because of the characteristics of graph applications: irregular computation, low computation-to-communication ratios, and limited communication bandwidth on multi-GPU platforms. Addressing these challenges while maintaining programmability is yet another difficulty.
In this work, I target the scalability of graph analytics to multiple GPUs, beginning with multiple GPUs within a single node. Compared to GPU clusters, single-node multi-GPU platforms are easier to manage and program, yet still serve as good development environments for multi-GPU graph processing. My work addresses several aspects of multi-GPU graph analytics: the inputs that graph application programmers provide to the multi-GPU framework; how the graph should be distributed across GPUs; the interaction between local computation and remote communication; what and when to communicate; how to combine received and local data; and when the application should stop. I answer these questions by extending the Gunrock graph analytics framework from a single GPU to multiple GPUs, showing that most graph applications scale well in my system. I also show that direction-optimizing breadth-first search (DOBFS) is the most difficult scaling challenge because of its extremely low compute-to-communication ratio.
To address the DOBFS scaling challenge, I demonstrate a DOBFS implementation with an efficient graph representation, local computation, and remote communication, based on the idea of separating high- and low-degree vertices. I particularly target communication costs, using global reduction with bitmasks for high-degree vertices and point-to-point communication for low-degree vertices. This greatly reduces the overall communication cost and results in good DOBFS scaling on large-scale graphs across more than a hundred GPUs on the Sierra early access system (the test bed for the Sierra supercomputer).
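The intuition behind the split can be illustrated with a back-of-envelope communication model. The sketch below is hypothetical (not Gunrock code, and the function names are invented for illustration): it compares the per-iteration bytes exchanged when a frontier is shared as a bitmask (one bit per vertex, suitable for OR-reduction across GPUs) versus as an explicit list of 4-byte vertex IDs sent point-to-point.

```python
# Hypothetical cost model for frontier exchange in a multi-GPU BFS.
# A bitmask costs a fixed |V|/8 bytes per exchange no matter how dense
# the frontier is, while an explicit ID list costs 4 bytes per frontier
# vertex -- so dense frontiers favor bitmask reduction and sparse
# frontiers favor point-to-point ID lists.

def bitmask_bytes(num_vertices: int) -> int:
    # One bit per vertex, rounded up to whole bytes.
    return (num_vertices + 7) // 8

def id_list_bytes(frontier_size: int, id_bytes: int = 4) -> int:
    # One fixed-width ID per frontier vertex.
    return frontier_size * id_bytes

V = 1_000_000
# Dense frontier (typical of high-degree vertices mid-traversal):
# the fixed-size bitmask is far cheaper than half a million IDs.
assert bitmask_bytes(V) < id_list_bytes(V // 2)
# Sparse frontier (early/late iterations, low-degree vertices):
# sending a handful of IDs beats broadcasting the whole bitmask.
assert id_list_bytes(100) < bitmask_bytes(V)
```

The crossover point, frontier size ≈ |V|/32 with 4-byte IDs, is why routing high-degree (dense) traffic through a bitmask reduction and low-degree (sparse) traffic through point-to-point messages cuts total communication volume.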
Next, I revisit the design choices I made for the single-node multi-GPU framework in light of recent hardware and software developments, such as better peer GPU access and unified virtual memory. I analyze nine newly developed complex graph applications for the DARPA HIVE program, implemented in the Gunrock framework, and show that they exhibit a wide range of potential scalability. More importantly, the questions of when and how to communicate are more diverse than in the single-node framework. From this analysis, I conclude that future multi-GPU frameworks, whether single- or multi-node, need to be more flexible: instead of communicating only at iteration boundaries, they should support a more flexible, general communication model. I also propose other research directions for future graph processing, including asynchronous computation and communication, specialized graph representations, and heterogeneous processing.
|Advisor:||Owens, John D.|
|Committee:||Amenta, Nina, Akella, Venkatesh|
|School:||University of California, Davis|
|Department:||Electrical and Computer Engineering|
|School Location:||United States -- California|
|Source:||DAI-B 81/3(E), Dissertation Abstracts International|
|Subjects:||Computer Engineering, Computer science|
|Keywords:||BFS, Breadth-first search, Distributed graph processing, GPU, Multi GPU, Parallel graph processing|