Robustness is a fundamental and timeless issue, and it remains vital to all aspects of computation systems, regardless of specific computation platforms, architectures, and algorithm design. The issue is also timely: modern computing systems are increasingly built on unreliable substrates. This thesis designs reliable computing techniques for distributed systems, circuits and networks.
We primarily study techniques inspired from coding theory to address the robustness issues such as system elasticity, stragglers (slow workers), machine failures and soft errors, by carefully weaving redundancy into the data and the design of the algorithm. We primarily focus on three aspects of coding-based computation techniques. The first aspect is to design adaptive computing techniques in largescale systems that are elastic, i.e., the number of machines can change with time. The second is to make the coding-based computation schemes cater to the specific needs of iterative and multi-stage algorithms, such as power iterations and gradient descent that are essential for today’s data analytics. The third aspect is to study the fundamental limits of information propagation in computation networks, provide information-theoretic outer bounds on error accumulation, and design in-network computing schemes to compensate for this error accumulation, e.g., in a circuit network where each computation component can be unreliable.
This thesis presents theoretical results that are fundamental to the understanding of computation systems, and opportunities for radical improvements, e.g. in computation load, communication overhead, and storage cost. We also present real-system implementations and real-data experiments and show our progress on taking these theoretical results closer to practice. The academic results presented in this thesis advance on classical results in the historical field of reliable distributed computing, while simultaneously addressing timely issues arising in today’s computing systems.
|Advisor:||Grover, Pulkit, Kar, Soummya|
|Commitee:||Grover, Pulkit, Hassibi, Babak, Interlandi, Matteo, Kar, Soummya, Kovacevic, Jelena, Moura, Jose|
|School:||Carnegie Mellon University|
|Department:||Electrical and Computer Engineering|
|School Location:||United States -- Pennsylvania|
|Source:||DAI-B 80/11(E), Dissertation Abstracts International|
|Keywords:||Coded computing, Distributed system, Elastic computing, Iterative algorithm|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved
The supplemental file or files you are about to download were provided to ProQuest by the author as part of a
dissertation or thesis. The supplemental files are provided "AS IS" without warranty. ProQuest is not responsible for the
content, format or impact on the supplemental file(s) on our system. in some cases, the file type may be unknown or
may be a .exe file. We recommend caution as you open such files.
Copyright of the original materials contained in the supplemental file is retained by the author and your access to the
supplemental files is subject to the ProQuest Terms and Conditions of use.
Depending on the size of the file(s) you are downloading, the system may take some time to download them. Please be