COMING SOON! PQDT Open is getting a new home!

ProQuest Open Access Dissertations & Theses will remain freely available as part of a new and enhanced search experience at

Questions? Please refer to this FAQ.

Dissertation/Thesis Abstract

Parallelism and error reduction in a high performance environment
by Castaldo, Anthony M., Ph.D., The University of Texas at San Antonio, 2010, 117; 3433204
Abstract (Summary)

This dissertation details contributions made by the author to the field of computer science while working to improve the performance of a state-of-the-art linear algebra library intended for use on commodity platforms. These platforms and libraries have become a de facto standard for most scientific researchers. There are three significant contributions to the field. The superblock family of summation algorithms generalizes the dot product and its error analysis, and allows floating point error to be controlled with a desired target bound and performance loss and with workspace determined at compile time. It demonstrates empirically that theoretical error bounds are good predictors of actual errors. The opposite had been asserted by leading experts in the field. The Master Last algorithm identifies and corrects a heretofore unknown performance drain in multi-core parallelism, Parallel Management Overhead (PMO). The author reduced PMO by as much as two orders of magnitude, and sped up QR and LU matrix factorizations (heavily used by researchers) as much as 65%, and asymptotically as much as 45%. Finally, in the Parallel Cache Assignment algorithm, the author realized the goal of exploiting the “collective cache” of all the cores in a multi-core system for a tightly coupled algorithm with high data dependencies requiring extreme synchronization. This allowed parallelization of some heavily used (and heavily researched) operations in linear algebra, panel factorizations, without changing the flop counts or arithmetic, a feat that had defied parallel researchers for many years. The result was superlinear speedup (as much as 19 times faster for an eight core system) for the QR panel factorization over the previous state of the art serial implementation.

Indexing (document details)
Advisor: Whaley, Richard C.
Commitee: Chronopolous, Anthony, Maynard, Hugh, Tavernini, Lucio, Wenk, Carola
School: The University of Texas at San Antonio
Department: Computer Science
School Location: United States -- Texas
Source: DAI-B 72/02, Dissertation Abstracts International
Subjects: Computer science
Keywords: Cache, Error reduction, Master last, Multi-core, Parallelism, Superblocks
Publication Number: 3433204
ISBN: 978-1-124-38509-9
Copyright © 2021 ProQuest LLC. All rights reserved. Terms and Conditions Privacy Policy Cookie Policy