This dissertation details contributions made by the author to the field of computer science while working to improve the performance of a state-of-the-art linear algebra library intended for use on commodity platforms. These platforms and libraries have become a de facto standard for most scientific researchers. There are three significant contributions to the field. The superblock family of summation algorithms generalizes the dot product and its error analysis, and allows floating point error to be controlled with a desired target bound and performance loss and with workspace determined at compile time. It demonstrates empirically that theoretical error bounds are good predictors of actual errors. The opposite had been asserted by leading experts in the field. The Master Last algorithm identifies and corrects a heretofore unknown performance drain in multi-core parallelism, Parallel Management Overhead (PMO). The author reduced PMO by as much as two orders of magnitude, and sped up QR and LU matrix factorizations (heavily used by researchers) as much as 65%, and asymptotically as much as 45%. Finally, in the Parallel Cache Assignment algorithm, the author realized the goal of exploiting the “collective cache” of all the cores in a multi-core system for a tightly coupled algorithm with high data dependencies requiring extreme synchronization. This allowed parallelization of some heavily used (and heavily researched) operations in linear algebra, panel factorizations, without changing the flop counts or arithmetic, a feat that had defied parallel researchers for many years. The result was superlinear speedup (as much as 19 times faster for an eight core system) for the QR panel factorization over the previous state of the art serial implementation.
|Advisor:||Whaley, Richard C.|
|Commitee:||Chronopolous, Anthony, Maynard, Hugh, Tavernini, Lucio, Wenk, Carola|
|School:||The University of Texas at San Antonio|
|School Location:||United States -- Texas|
|Source:||DAI-B 72/02, Dissertation Abstracts International|
|Keywords:||Cache, Error reduction, Master last, Multi-core, Parallelism, Superblocks|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved
The supplemental file or files you are about to download were provided to ProQuest by the author as part of a
dissertation or thesis. The supplemental files are provided "AS IS" without warranty. ProQuest is not responsible for the
content, format or impact on the supplemental file(s) on our system. in some cases, the file type may be unknown or
may be a .exe file. We recommend caution as you open such files.
Copyright of the original materials contained in the supplemental file is retained by the author and your access to the
supplemental files is subject to the ProQuest Terms and Conditions of use.
Depending on the size of the file(s) you are downloading, the system may take some time to download them. Please be