Future high performance computing (HPC) systems will face unique problems, including high power consumption and severe network contention. Both power and the network are shared resources; while individual jobs can optimize their use of these resources, we will realize greater benefits if we optimize them across all running jobs. Accordingly, this dissertation presents inter-job optimization strategies to limit power consumption and to mitigate network contention.
One way to reduce HPC power consumption is to enforce a fixed power limit for running jobs. However, HPC applications do not consume constant power over their lifetimes. Thus, applications that are assigned a fixed power bound may be forced to slow down during high-power computation phases, but may not consume their full power allocation during low-power I/O phases. This dissertation explores algorithms that leverage application characteristics—phase frequency, duration and power needs—to shift unused power from applications in I/O phases to applications in computation phases, thus improving system-wide performance. We design novel techniques that include explicit staggering of applications to improve power shifting. Compared to executing without power shifting, our algorithms can improve average performance by up to 8% or improve performance of a single, high-priority application by up to 32%.
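The power-shifting idea can be illustrated with a minimal sketch. It is not the dissertation's algorithm: the `Job` fields, the `shift_power` helper, and the proportional redistribution rule are all assumptions chosen for illustration, and it ignores phase frequency, phase duration, and staggering. Each job holds a fixed power bound; a job in a low-power I/O phase leaves part of that bound unused, and the unused power is shared among jobs whose current phase could use more than their bound.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    bound: float   # fixed power allocation in watts (hypothetical field)
    demand: float  # power the job's current phase could use, in watts


def shift_power(jobs):
    """Return {job name: power cap} after moving unused power to jobs that need it.

    Assumption: surplus is split in proportion to each job's deficit.
    The total of all caps never exceeds the total of all bounds.
    """
    # Start every job at the smaller of its bound and its current demand.
    caps = {j.name: min(j.bound, j.demand) for j in jobs}
    # Power left unused by jobs in low-power phases.
    surplus = sum(j.bound - caps[j.name] for j in jobs)
    # Extra power wanted by jobs in high-power phases.
    deficits = {j.name: j.demand - j.bound for j in jobs if j.demand > j.bound}
    total_deficit = sum(deficits.values())
    if total_deficit > 0:
        # Satisfy deficits fully if possible, otherwise scale them down.
        scale = min(1.0, surplus / total_deficit)
        for name, deficit in deficits.items():
            caps[name] += scale * deficit
    return caps


jobs = [
    Job("io_job", bound=100.0, demand=60.0),      # in an I/O phase
    Job("compute_a", bound=100.0, demand=130.0),  # in a computation phase
    Job("compute_b", bound=100.0, demand=110.0),  # in a computation phase
]
caps = shift_power(jobs)
```

In this example the I/O-phase job leaves 40 W unused, which covers the combined 40 W deficit of the two computation-phase jobs while the system-wide total stays at 300 W.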
We also investigate the use of Quality of Service (QoS) mechanisms to reduce the negative impact of network contention. QoS allows users to manage resource sharing between network flows and to provide bandwidth guarantees to specific flows. Our results show that applying QoS at the job level significantly reduces the impact of contention on high-priority jobs, but it degrades the performance of other jobs and reduces overall throughput. Applying QoS at the process level, by contrast, improves performance for specific jobs by up to 40%, and in some cases it completely eliminates the impact of contention. It achieves these improvements with limited negative impact on other jobs; any job that experiences performance loss typically degrades by less than 5%, often much less.
The inter-job optimizations presented in this dissertation improve power and network management on HPC systems. Current and future systems can employ these techniques to enhance their performance and efficiency.
|Advisor:||Lowenthal, David K.|
|Committee:||Strout, Michelle; Zhang, Beichuan; de Supinski, Bronis R.; Mohror, Kathryn|
|School:||The University of Arizona|
|School Location:||United States -- Arizona|
|Source:||DAI-B 81/2(E), Dissertation Abstracts International|
|Keywords:||High-performance computing, Network, Performance, Power, Quality-of-service|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved