The goal of this thesis is to provide strategies and perspectives for generating layouts of wide superscalar processors easily and quickly, relying almost entirely on automated synthesis and place-and-route, such that the generated layouts are competitive with manual physical design.
Our approach is to focus on physical design aspects at the early microarchitecture design stage so that the subsequent automated synthesis and place-and-route flow can take advantage of it. That is, we design microarchitectures for which the quality gap between automated and manual physical design flows is lessened so that automation actually delivers a high-quality physical layout. This thesis introduces this new design paradigm, called Design for Competitive Automated Layout (DCAL) of Superscalar Processors. In this work, we have been exploring design strategies at multiple levels where automated synthesis and place-and-route flows traditionally perform poorly.
At present, DCAL targets two key design levels: the circuit level and the microarchitecture level. In circuit-level DCAL, (1) we improve the automated layout quality of highly-ported memories in superscalar processors. These memories are pervasive in a superscalar microarchitecture and account for many of the cycle-critical and energy-critical paths, as well as much of the core area. Our research makes a case for standard-cell based SRAMs (flip-flops, muxes, clock buffers) as the solution to the problem of highly-ported deep-submicron memories. We explore the costs of adding ports in two different multi-ported SRAM implementation styles: full-custom (6T SRAM, scaled up for multiple ports) versus fully synthesized (D flip-flop based).
In microarchitecture-level DCAL, (2) we explore a clustered microarchitecture that can improve power, performance, and area (PPA) metrics of layouts generated by automated synthesis and place-and-route. We implemented an RTL design of a clustered microarchitecture and took that design through physical layout to accurately model wire delay and power. We propose a modular design approach to clustered microarchitectures where modules can be reused to build a large number of clusters. Our focus is on improving the efficiency of the physical implementation of clustered architectures.
Additionally, (3) we perform a design space evaluation of the Trace Processor, an execution paradigm whose goal is to address the poor performance, power, and frequency scalability of superscalar processors. We adapted the modern RISC-V ISA for the implementation and used SPEC2006 benchmarks for exploration. We optimize the Trace Processor microarchitecture to reduce the size of critical structures such as the Trace Cache, the Outstanding Trace Buffer, and the write ports of the GRFs, thereby demonstrating the DCAL characteristic.
|School:||North Carolina State University|
|School Location:||United States -- North Carolina|
|Source:||DAI-B 80/01(E), Dissertation Abstracts International|
|Keywords:||Automated, Competitive, DCAL, Layout, Processors, Superscalar|
Copyright in each Dissertation and Thesis is retained by the author. All Rights Reserved