Bayesian networks (BNs) are an effective tool in a diverse range of applications that require representing and reasoning with uncertain knowledge and data. Mathematically, a BN is a type of statistical model that can compactly represent complex probability distributions over random variables. Inference over BNs can be either exact or approximate. The junction tree algorithm is perhaps the most popular exact inference algorithm. When a new piece of evidence comes in, it propagates beliefs (or posteriors) over a junction tree so that the factors over the junction tree stay consistent with each other.
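To make the consistency requirement concrete, here is a minimal sketch of one collect/distribute round between two cliques that share a separator. It assumes a Hugin-style update (marginalize the source clique onto the separator, then rescale the destination clique by the ratio of new to old separator values); the abstract does not say which message passing variant the thesis uses, and the tables, variable names, and the `pass_message` helper are hypothetical.

```python
import numpy as np

# Hypothetical two-clique junction tree over binary variables:
# clique C1 = {A, B}, clique C2 = {B, C}, separator S = {B}.
phi_c1 = np.array([[0.3, 0.2],   # axes: (A, B), here P(A, B)
                   [0.1, 0.4]])
phi_c2 = np.array([[0.5, 0.5],   # axes: (B, C), here P(C | B)
                   [0.9, 0.1]])
phi_s = np.ones(2)               # separator potential over B


def pass_message(src, src_sep_axis, sep, dst, dst_sep_axis):
    """Hugin-style update (an assumption, see lead-in).

    Marginalizes `src` onto the separator variable, then rescales `dst`
    by new_separator / old_separator along its separator axis.
    """
    # Sum out every source axis except the separator variable.
    other_axes = tuple(i for i in range(src.ndim) if i != src_sep_axis)
    new_sep = src.sum(axis=other_axes)
    # Ratio of new to old separator values; 0/0 is defined as 0.
    ratio = np.divide(new_sep, sep, out=np.zeros_like(new_sep), where=sep != 0)
    # Broadcast the ratio along the separator axis of the destination.
    shape = [1] * dst.ndim
    shape[dst_sep_axis] = -1
    return new_sep, dst * ratio.reshape(shape)


# Collect (C1 -> C2), then distribute (C2 -> C1).
phi_s, phi_c2 = pass_message(phi_c1, 1, phi_s, phi_c2, 0)
phi_s, phi_c1 = pass_message(phi_c2, 0, phi_s, phi_c1, 1)

# After both passes the two cliques agree on the marginal of B.
marg_b_from_c1 = phi_c1.sum(axis=0)
marg_b_from_c2 = phi_c2.sum(axis=1)
```

After the two passes, both cliques yield the same marginal over the shared variable B, which is the sense in which the factors are consistent.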
However, belief propagation over junction trees is known to be computationally hard. This computational complexity hinders the application of BNs in cases where real-time inference is required. This thesis accelerates the junction tree belief propagation algorithm by parallelizing it on a state-of-the-art manycore computing platform. Recent manycore computing platforms, such as NVIDIA's Graphics Processing Units (GPUs) and Intel's Knights Ferry, employ a Single Instruction Multiple Data (SIMD) architecture. They can provide massive computational power to address various computational problems. However, mapping a problem instance to a manycore platform is not trivial: the problem has to be carefully parallelized and the implementation carefully optimized in order to make full use of the computational resources on a GPU.
In this thesis, we thoroughly investigate the junction tree algorithm and identify its opportunities for parallelism. Our work mainly focuses on node-level parallelization, i.e., we parallelize each message passing step of junction tree belief propagation. We first identify two kinds of parallelism in message passing, namely, element-wise parallelism and arithmetic parallelism. Then, based on these two kinds of parallelism, we propose a two-dimensional parallel message passing algorithm. For message passings that do not contain enough parallelism, we also propose a junction tree pruning technique called clique merging. Clique merging eliminates extremely small nodes in a junction tree so that the junction tree is better suited to the GPU platform.
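The two dimensions can be illustrated on the marginalization half of a message passing. The sketch below is an interpretation, not the thesis's algorithm: it assumes that element-wise parallelism means each separator entry is computed independently, and that arithmetic parallelism means the sum behind each entry is split across several workers and then reduced. NumPy stands in for the GPU, and the function name, table layout, and `arith_workers` parameter are hypothetical.

```python
import numpy as np


def marginalize_2d(clique, n_sep, arith_workers):
    """Marginalize a flat clique table onto a separator with n_sep entries.

    Assumed (hypothetical) layout: the clique entries that map to
    separator entry i are stored contiguously, so the table reshapes
    to (n_sep, entries_per_separator_element).
    """
    per_sep = clique.size // n_sep
    table = clique.reshape(n_sep, per_sep)

    # Dimension 1 (element-wise): every row is an independent unit of
    # work, one per separator entry.
    # Dimension 2 (arithmetic): each row's sum is split into chunks,
    # one chunk per worker, producing partial sums.
    chunks = np.array_split(np.arange(per_sep), arith_workers)
    partial = np.stack([table[:, idx].sum(axis=1) for idx in chunks], axis=1)

    # Final reduction over the (n_sep, arith_workers) partial sums.
    return partial.sum(axis=1)


clique_table = np.arange(24, dtype=float)
separator = marginalize_2d(clique_table, n_sep=4, arith_workers=3)
```

With a very small clique, both `n_sep` and the per-entry sum are short, so neither dimension offers much work to distribute; this is the situation clique merging is meant to avoid.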
Performance tuning for the parallel junction tree algorithm is necessary. Yet the diversity in size typically seen in junction tree cliques makes the GPU input workload vary, and the two dimensions of parallelism further add to the complexity of this tuning problem. Therefore, it is hard to set the GPU configuration parameters for optimal performance. In this thesis, we parameterize the GPU tuning process and propose a machine learning framework to automatically optimize the performance. This framework essentially trains a machine learning model to capture causal relationships between a GPU's workload, its configuration, and the resulting performance characteristics. The machine learning model is then used to optimize the GPU configuration. Experiments show that this auto-tuning framework can improve the performance significantly more than extensive manual tuning.
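The general shape of such an auto-tuner can be sketched as follows: fit a model that predicts runtime from workload and configuration, then pick the configuration with the lowest predicted runtime for a given workload. Everything specific here is an assumption for illustration: the abstract does not name the model or the tuning parameters, so ordinary least squares on quadratic features stands in for the model, thread-block size stands in for the configuration, and the timings come from a made-up cost function rather than from any measurement.

```python
import numpy as np


def features(workload, block):
    """Quadratic features in log2(workload) and log2(block size)."""
    w = np.log2(workload)
    b = np.log2(block)
    return np.stack([np.ones_like(b), b, b * b, w, w * b, w * w], axis=-1)


def synthetic_runtime(workload, block):
    """Hypothetical cost function, NOT measured GPU data."""
    w = np.log2(workload)
    b = np.log2(block)
    return (b - 0.5 * w) ** 2 + 1.0


# Hypothetical configuration space and training workloads.
candidate_blocks = np.array([32.0, 64.0, 128.0, 256.0, 512.0])
train_workloads = np.array([2.0**10, 2.0**12, 2.0**14, 2.0**16, 2.0**18])

# "Profile" every (workload, block) pair and fit the performance model.
W, B = np.meshgrid(train_workloads, candidate_blocks, indexing="ij")
X = features(W.ravel(), B.ravel())
y = synthetic_runtime(W.ravel(), B.ravel())
coef, *_ = np.linalg.lstsq(X, y, rcond=None)


def best_block(workload):
    """Return the candidate block size with the lowest predicted runtime."""
    workloads = np.full_like(candidate_blocks, workload)
    predicted = features(workloads, candidate_blocks) @ coef
    return int(candidate_blocks[np.argmin(predicted)])
```

In a real setting the training data would come from timing the kernel under each configuration, and the model would be chosen to fit those measurements.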
We implemented the parallel junction tree algorithm on an NVIDIA GTX 460 GPU card, and employed the clique merging and auto-tuning techniques. Compared with a benchmark sequential code that runs on an Intel Core 2 Quad CPU, the fully tuned code shows up to a 19.90x speedup. On average over all data sets, we get a speedup of 10.70x (arithmetic average) or 8.68x (geometric average).
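For reference, the two averages are defined as follows for per-data-set speedups s_1, ..., s_n (the symbols are introduced here and are not from the thesis). The geometric average never exceeds the arithmetic average and is less sensitive to a few very large speedups, which is consistent with 8.68x being below 10.70x.

```latex
\bar{s}_{\text{arith}} = \frac{1}{n} \sum_{i=1}^{n} s_i ,
\qquad
\bar{s}_{\text{geom}} = \Bigl( \prod_{i=1}^{n} s_i \Bigr)^{1/n} ,
\qquad
\bar{s}_{\text{geom}} \le \bar{s}_{\text{arith}} .
```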
|Committee:||Franchetti, Franz; Yan, Rong; Zhang, Joy|
|School:||Carnegie Mellon University|
|Department:||Electrical and Computer Engineering|
|School Location:||United States -- Pennsylvania|
|Source:||DAI-B 75/02(E), Dissertation Abstracts International|
|Keywords:||GPUs, Junction trees, Machine learning, Parallel junctions|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved