Dissertation/Thesis Abstract

Inter-job Optimization in High Performance Computing
by Savoie, Lee Hilton, Ph.D., The University of Arizona, 2019, 111 pages; 22617929
Abstract (Summary)

Future high-performance computing (HPC) systems will face unique problems, including high power consumption and severe network contention. Both power and the network are shared resources; while individual jobs can optimize their use of these resources, we will realize greater benefits if we optimize them across all running jobs. Accordingly, this dissertation presents inter-job optimization strategies to limit power consumption and to mitigate network contention.

One way to reduce HPC power consumption is to enforce a fixed power limit for running jobs. However, HPC applications do not consume constant power over their lifetimes. Thus, applications that are assigned a fixed power bound may be forced to slow down during high-power computation phases, yet may not consume their full power allocation during low-power I/O phases. This dissertation explores algorithms that leverage application characteristics (phase frequency, duration, and power needs) to shift unused power from applications in I/O phases to applications in computation phases, thus improving system-wide performance. We design novel techniques that include explicit staggering of applications to improve power shifting. Compared with execution without power shifting, our algorithms can improve average performance by up to 8% or improve the performance of a single high-priority application by up to 32%.
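
The abstract does not spell out the dissertation's algorithms, so the following is only a minimal, hypothetical sketch of the basic power-shifting idea. It assumes a single global power bound, an equal fixed share per job, and a known power demand for each job's current phase. The names Job, shift_power, demand_watts, and priority are illustrative and do not come from the dissertation, and the sketch does not model phase staggering.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    phase: str            # hypothetical phase label: "compute" or "io"
    demand_watts: float   # power the job would draw in its current phase if uncapped
    priority: int = 0     # larger value = receives shifted power first


def shift_power(jobs: list[Job], system_bound_watts: float) -> dict[str, float]:
    """Return per-job power caps whose sum never exceeds system_bound_watts.

    Every job starts from an equal fixed share of the bound. A job whose current
    demand is below that share (typically one in an I/O phase) is capped at its
    demand; the unused remainder is pooled and handed to compute-phase jobs,
    highest priority first, never exceeding any job's demand.
    """
    share = system_bound_watts / len(jobs)
    caps = {j.name: min(share, j.demand_watts) for j in jobs}
    pool = max(0.0, system_bound_watts - sum(caps.values()))

    # Compute-phase jobs that are still throttled below their demand.
    receivers = sorted(
        (j for j in jobs if j.phase == "compute" and j.demand_watts > caps[j.name]),
        key=lambda j: -j.priority,
    )
    for j in receivers:
        if pool <= 0:
            break
        extra = min(pool, j.demand_watts - caps[j.name])
        caps[j.name] += extra
        pool -= extra
    return caps


if __name__ == "__main__":
    jobs = [
        Job("A", "compute", 150.0, priority=1),  # high-priority job in a compute phase
        Job("B", "io", 40.0),                    # I/O-phase job, leaves 60 W of its share unused
        Job("C", "compute", 130.0),
    ]
    print(shift_power(jobs, 300.0))  # → {'A': 150.0, 'B': 40.0, 'C': 110.0}
```

With a fixed, unshifted bound each job would be held to 100 W; in this toy example the 60 W that job B leaves idle lets job A run uncapped and gives job C an extra 10 W, while the total stays at the 300 W bound.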

We also investigate the use of Quality of Service (QoS) mechanisms to reduce the negative impact of network contention. QoS allows users to manage resource sharing between network flows and to provide bandwidth guarantees to specific flows. Our results show that applying QoS at the job level significantly reduces the impact of contention on high-priority jobs, but it degrades the performance of other jobs and reduces overall throughput. In contrast, applying QoS at the process level improves performance for specific jobs by up to 40%, and in some cases it completely eliminates the impact of contention. It achieves these improvements with limited negative impact on other jobs: any job that experiences a performance loss typically degrades by less than 5%, often much less.
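
To illustrate what a bandwidth guarantee does on a shared link, here is a generic, hypothetical sketch: each flow first receives its reserved minimum, and the remaining capacity is split by weighted max-min fairness (water-filling). This is a textbook sharing model assumed for illustration, not the QoS mechanism of any particular interconnect or the configuration used in the dissertation; Flow, allocate, guarantee, and weight are illustrative names, and the sketch does not distinguish job-level from process-level QoS.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    demand: float           # bandwidth the flow would use on an idle link
    guarantee: float = 0.0  # reserved minimum (hypothetical QoS guarantee)
    weight: float = 1.0     # relative share of leftover bandwidth


def allocate(flows: dict[str, Flow], capacity: float) -> dict[str, float]:
    """Split one link's capacity among flows.

    Step 1 gives each flow min(demand, guarantee). Step 2 divides what is left
    by weighted max-min fairness: flows whose residual demand fits within their
    weighted share are satisfied and drop out, and the freed capacity is
    re-split among the rest. Assumes guarantees sum to at most capacity and
    all weights are positive.
    """
    rates = {n: min(f.demand, f.guarantee) for n, f in flows.items()}
    remaining = capacity - sum(rates.values())
    active = {n for n, f in flows.items() if f.demand > rates[n]}

    while remaining > 1e-9 and active:
        total_weight = sum(flows[n].weight for n in active)
        share = {n: remaining * flows[n].weight / total_weight for n in active}
        satisfied = {n for n in active if flows[n].demand - rates[n] <= share[n]}
        if not satisfied:
            # Every active flow wants more than its share: hand out the shares.
            for n in active:
                rates[n] += share[n]
            break
        for n in satisfied:
            remaining -= flows[n].demand - rates[n]
            rates[n] = flows[n].demand
        active -= satisfied
    return rates


if __name__ == "__main__":
    link = 100.0
    best_effort = {"hi": Flow(80.0), "a": Flow(60.0), "b": Flow(10.0)}
    with_qos = {"hi": Flow(80.0, guarantee=50.0), "a": Flow(60.0), "b": Flow(10.0)}
    print(allocate(best_effort, link))  # → {'hi': 45.0, 'a': 45.0, 'b': 10.0}
    print(allocate(with_qos, link))     # → {'hi': 70.0, 'a': 20.0, 'b': 10.0}
```

In this toy model, reserving 50 units for flow "hi" raises its rate from 45 to 70 while flow "a" drops from 45 to 20. That mirrors, in miniature, the kind of trade-off described above when a guarantee is applied broadly; it makes no claim about the dissertation's measured numbers.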

The inter-job optimizations presented in this dissertation improve power and network management on HPC systems. Current and future systems can employ these techniques to enhance their performance and efficiency.

Indexing (document details)
Advisor: Lowenthal, David K.
Committee: Strout, Michelle; Zhang, Beichuan; de Supinski, Bronis R.; Mohror, Kathryn
School: The University of Arizona
Department: Computer Science
School Location: United States -- Arizona
Source: DAI-B 81/2(E), Dissertation Abstracts International
Subjects: Computer science
Keywords: High-performance computing, Network, Performance, Power, Quality-of-service
Publication Number: 22617929
ISBN: 9781085673150
Copyright © 2020 ProQuest LLC. All rights reserved.