For many years, improvements to CMOS process technologies fueled rapid growth in processor performance and throughput. Each process generation brought exponentially more transistors and exponentially reduced the per-transistor switching power. However, concerns over leakage currents have moved us out of the classical CMOS scaling regime. Although the number of available transistors continues to rise, their switching power no longer declines. Meanwhile, power budgets remain fixed, constrained by cooling capacity or battery life. Thus, with each new process generation, an exponentially decreasing fraction of the available transistors can be simultaneously switched. The growing divide between available transistors and utilizable transistors leads to a utilization wall.
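The scaling argument above can be made concrete with a short back-of-envelope calculation. This is an illustrative sketch using generic textbook scaling figures (a linear shrink of S = √2 per generation), not numbers taken from the dissertation itself:

```python
# Illustrative sketch of the utilization-wall arithmetic. Each process
# generation shrinks feature size by S = sqrt(2), doubling transistor count.
S = 2 ** 0.5  # assumed per-generation linear scaling factor

def utilization(generations, dennard_scaling):
    """Fraction of on-chip transistors switchable within a fixed power budget."""
    util = 1.0
    for _ in range(generations):
        transistor_growth = S ** 2  # 2x more transistors per generation
        if dennard_scaling:
            # Classical regime: per-device switching power C*V^2*f scales as
            # (1/S)*(1/S^2)*S = 1/S^2, so total chip power stays flat.
            power_per_device = 1 / S ** 2
        else:
            # Leakage-limited regime: V no longer scales, so per-device
            # power goes as (1/S)*1*S = 1 and total power doubles.
            power_per_device = 1.0
        util /= transistor_growth * power_per_device
    return util

print(utilization(4, dennard_scaling=True))   # ~1.0: the full chip is usable
print(utilization(4, dennard_scaling=False))  # ~0.0625: only 1/16 of the chip
```

After just four generations without Dennard scaling, a fixed power budget covers roughly one-sixteenth of the available transistors, which is the exponentially widening gap the abstract calls the utilization wall.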
This dissertation characterizes the utilization wall and proposes conservation cores as a means of surmounting its most pressing challenges. Conservation cores, or C-Cores, are application-specific hardware circuits created to reduce energy consumption in computationally intensive applications with complex control logic and irregular memory access patterns. C-Cores are drop-in replacements for existing source code and use limited reconfigurability to adapt to software changes over time. The design and implementation of these specialized execution engines pose challenges in code selection, automatic synthesis, choice of programming model, longevity and robustness, and system integration.
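The "drop-in replacement" model can be sketched in software as a dispatch decision: run the hardware version of a kernel when a matching C-Core exists, otherwise fall back to the original code. The names below (`register_ccore`, `run_kernel`) are hypothetical illustrations, not the dissertation's actual interface; on real hardware the decision would consult the chip's C-Core configuration rather than a Python dict:

```python
# Hypothetical registry standing in for the set of C-Cores on a chip.
_ccores = {}  # kernel name -> hardware invoker

def register_ccore(name, invoke):
    """Record that a C-Core exists for the named kernel."""
    _ccores[name] = invoke

def run_kernel(name, software_version, *args):
    """Dispatch to the C-Core when one matches this kernel; otherwise run
    the original software, e.g. when a code change has outgrown the
    circuit's limited reconfigurability."""
    hardware = _ccores.get(name)
    return hardware(*args) if hardware else software_version(*args)
```

Because both paths compute the same function, the software remains the source of truth and the C-Core acts purely as an energy optimization.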
This dissertation addresses many of these challenges through the development of an automated conservation core toolchain. The toolchain automatically extracts the key kernels from a target workload and uses a custom C-to-silicon infrastructure to generate 45 nm implementations of the C-Cores. C-Cores employ a new pipeline design technique called pipeline splitting, or pipesplitting, which reduces clock power, increases memory parallelism, and further exploits operation-level parallelism. C-Cores also incorporate specialized energy-efficient per-instruction data caches called cachelets into the datapath, which enable sub-cycle, cache-coherent memory accesses.
An evaluation of C-Cores against an efficient in-order processor shows that, on average, C-Cores speed up the code they target by 1.5×, improve its energy-delay product (EDP) by 6.9×, and accelerate the whole application by 1.33×, while reducing whole-application EDP by 57%.
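The reported figures can be cross-checked with a little arithmetic. The speedups and EDP gains below come from the abstract; the split into energy versus delay is derived here and is not stated explicitly in the source:

```python
# Consistency check on the reported results. EDP = energy * delay, so the
# implied energy reduction is the EDP gain divided by the speedup.
targeted_speedup = 1.5    # delay improvement on targeted code
targeted_edp_gain = 6.9   # EDP improvement on targeted code

implied_energy_gain = targeted_edp_gain / targeted_speedup
print(f"implied energy reduction on targeted code: {implied_energy_gain:.1f}x")

app_edp_reduction = 0.57  # whole-application EDP falls by 57%,
app_edp_gain = 1 / (1 - app_edp_reduction)
print(f"equivalent whole-application EDP improvement: {app_edp_gain:.2f}x")
```

That is, a 6.9× EDP gain at a 1.5× speedup implies roughly a 4.6× energy reduction on the targeted code, and the 57% whole-application EDP reduction corresponds to about a 2.3× improvement.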
Advisors: Swanson, Steven; Taylor, Michael
Committee: Buckwalter, James; Larson, Lawrence; Tullsen, Dean
School: University of California, San Diego
Department: Computer Science and Engineering
School Location: United States -- California
Source: DAI-B 71/11, Dissertation Abstracts International
Subjects: Computer Engineering; Computer Science
Keywords: Conservation cores, Energy consumption, Utilization wall