Dissertation/Thesis Abstract

SRAM reliability improvement using ECC and circuit techniques
by McCartney, Mark P., Ph.D., Carnegie Mellon University, 2014, 131; 3733956
Abstract (Summary)

Reliability is of the utmost importance for the safety of electronic systems built for the automotive, industrial, and medical sectors. In these systems, the embedded memory is especially sensitive due to the large number of minimum-sized devices in the cell arrays. Memory failures that occur after the manufacture-time burn-in testing phase are particularly difficult to address, for two reasons: redundancy allocation is no longer available, and the fault detection schemes currently used in industry generally focus on the cell array while leaving the peripheral logic vulnerable to faults. Even in the cell array, conventional error control coding (ECC) has been limited in its ability to detect and correct failures of more than a few bits, due to the high latency or area overhead of correction [43]. Consequently, improvements to conventional memory resilience techniques are of great importance both for continued reliable operation and for countering the raw bit error rate of the memory arrays in future technologies at economically feasible design points [11, 36, 37, 53, 56, 70].

In this thesis, we examine the landscape of design techniques for reliability and introduce two novel contributions for improving reliability with low overhead.

To address failures occurring in the cell array, we have implemented an erasure-based ECC scheme (EB-ECC) that can extend conventional ECC already used in memory to correct and detect multiple erroneous bits with low overhead. An important component of this scheme is the method for detecting erasures at runtime; we propose a novel ternary-output sense amplifier design which can reduce the risk of undetected read latency failures in small-swing bitline designs.
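To illustrate why knowing the location of an unreliable bit helps, the following is a minimal sketch and not the EB-ECC implementation from the thesis. It assumes an extended Hamming (8,4) SECDED code, a brute-force nearest-codeword decoder, and a hypothetical `sense` function (with an arbitrary 0.05 V margin) that models a ternary read as 1, 0, or "erasure". It relies on the standard coding-theory result that a code with minimum distance d can correct e errors and f erasures whenever 2e + f < d; for d = 4 this allows up to three erasures, or one erasure plus one error.

```python
from itertools import product

# Generator matrix of the extended Hamming (8,4) SECDED code (minimum distance 4).
G = [
    (1, 0, 0, 0, 0, 1, 1, 1),
    (0, 1, 0, 0, 1, 0, 1, 1),
    (0, 0, 1, 0, 1, 1, 0, 1),
    (0, 0, 0, 1, 1, 1, 1, 0),
]


def encode(data):
    """Map 4 data bits to an 8-bit codeword."""
    return tuple(sum(d * row[i] for d, row in zip(data, G)) % 2 for i in range(8))


def sense(delta_v, margin=0.05):
    """Hypothetical ternary read: 1, 0, or None (erasure) when the bitline
    differential (in volts) is too small to resolve reliably."""
    if delta_v > margin:
        return 1
    if delta_v < -margin:
        return 0
    return None


def decode(word):
    """Nearest-codeword decoding that ignores erased (None) positions.

    Returns the 4 data bits, or None when the closest codeword is not
    unique (a detected but uncorrectable pattern)."""
    scored = []
    for data in product((0, 1), repeat=4):
        codeword = encode(data)
        dist = sum(1 for r, c in zip(word, codeword) if r is not None and r != c)
        scored.append((dist, data))
    scored.sort()
    if scored[0][0] == scored[1][0]:
        return None
    return scored[0][1]
```

With this code, a word with three erased positions still decodes to the original data, whereas two bit flips at unknown positions can only be detected, not corrected.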

While most studies have focused on the static random access memory (SRAM) cell array, high-reliability products also require examining the effects of failures in the peripheral logic. We have designed a wordline assertion comparator (WLAC) that detects address decoder failures with lower area overhead in large cache designs than competing techniques in the literature.
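The abstract does not describe the WLAC circuit itself, so the following is only a behavioral sketch of the kind of check an address-decoder fault detector performs. It assumes a one-hot row decoder, and the function names and the stuck-at fault injection parameters are hypothetical.

```python
def decode_address(addr, n_rows, stuck_low=(), stuck_high=()):
    """Ideal one-hot row decoder with optional injected stuck-at faults."""
    wordlines = [int(i == addr) for i in range(n_rows)]
    for i in stuck_low:
        wordlines[i] = 0   # this wordline can never assert
    for i in stuck_high:
        wordlines[i] = 1   # this wordline is always asserted
    return wordlines


def wordline_check(addr, wordlines):
    """Return True only when exactly the addressed wordline is asserted."""
    asserted = [i for i, w in enumerate(wordlines) if w]
    return asserted == [addr]
```

In this model, the check fails when no wordline asserts, when more than one asserts, or when the wrong one asserts, which are the observable outcomes of a decoder fault.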

[11] B. H. Calhoun, Yu Cao, Xin Li, Ken Mai, L. T. Pileggi, R. A. Rutenbar, and K. L. Shepard. Digital Circuit Design Challenges and Opportunities in the Era of Nanoscale CMOS. Proc. IEEE, 96(2):343–365, 2008. doi: 10.1109/JPROC.2007.911072.

[36] Jangwoo Kim, Mark McCartney, Ken Mai, and Babak Falsafi. Modeling SRAM Failure Rates to Enable Fast, Dense, Low-Power Caches. IEEE Workshop on Silicon Errors in Logic - System Effects, 24–25 March 2009.

[37] Jangwoo Kim, Hyunggyun Yang, Mark P. McCartney, Mudit Bhargava, Ken Mai, and Babak Falsafi. Building fast, dense, low-power caches using erasure-based inline multibit ECC. In Dependable Computing (PRDC), 2013 IEEE 19th Pacific Rim International Symposium on, pages 98–107, 2013. doi: 10.1109/PRDC.2013.19. URL http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=6820845.

[43] S. Lin and D. J. Costello. Error Control Coding: Fundamentals and Applications. Prentice Hall, 1983.

[53] R. Naseer and J. Draper. Parallel double error correcting code design to mitigate multibit upsets in SRAMs. In Proceedings of 34th European Solid-State Circuits Conference ESSCIRC 2008, pages 222–225, 2008. doi: 10.1109/ESSCIRC.2008.4681832.

[56] S. Ozdemir, D. Sinha, G. Memik, J. Adams, and H. Zhou. Yield-aware cache architectures. In IEEE/ACM International Symposium on Microarchitecture, pages 15–25, December 2006. doi: 10.1109/MICRO.2006.52.

[70] M. Spica and T. M. Mak. Do we need anything more than single bit error correction (ECC)? In Proceedings of Records of the 2004 International Workshop on Memory Technology, Design and Testing, pages 111–116, 9–10 August 2004. doi: 10.1109/MTDT.2004.1327993.

Indexing (document details)
Advisor: Mai, Ken
Committee: Hoe, James; Hoefler, Alexander; Strojwas, Andrzej
School: Carnegie Mellon University
Department: Electrical and Computer Engineering
School Location: United States -- Pennsylvania
Source: DAI-B 77/04(E), Dissertation Abstracts International
Subjects: Computer Engineering, Electrical Engineering
Keywords: Cyber physical systems, ECC, Erasure, Memory, Reliability, SRAM
Publication Number: 3733956
ISBN: 978-1-339-22380-3
Copyright © 2020 ProQuest LLC. All rights reserved.