### Ultra-Low-Power SRAM Design In High Variability Advanced CMOS

Prof. Pinaki Mazumder **University of Michigan** mazum@eecs.umich.edu Ann Arbor, MI 48109

[Figure: SRAM in processors for mobile computing and embedded/handheld applications: Intel Core 2 (Penryn) with 6MB SRAM L2 and 64kB RF L1; ARM1176JZ with 16kB SRAM cache; custom MSP430 with 16kB SRAM cache. Annotated percentages include 20% and 33%.]

## Taxonomy of Semiconductor Memories



Courtesy: Harris & Weste

## Comparison of Semiconductor Memories

|  |      |          |   |          |     | Speed     |      |
|--|------|----------|---|----------|-----|-----------|------|
|  | Good | Volatile | 3 | Low      | Low | Very fast | SRAM |
|  | 320  | Volatile |   |          |     |           |      |
|  |      |          |   | Very low |     | Very slow |      |

**Courtesy: Harris & Weste** 

## Architecture of Static RAM



#### Advantages:

1. Shorter wires within blocks
2. Block address activates only 1 block => power savings

Courtesy: Harris & Weste
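The block-select idea can be sketched in a few lines of Python. This is only an illustration: the field widths (2 block bits, 8 row bits, 4 column bits) and the function names are assumptions, not values from the slides.

```python
def split_address(addr, block_bits=2, row_bits=8, col_bits=4):
    """Split a flat address into (block, row, column) fields.

    The field widths are hypothetical defaults chosen for illustration.
    """
    col = addr & ((1 << col_bits) - 1)
    row = (addr >> col_bits) & ((1 << row_bits) - 1)
    block = (addr >> (col_bits + row_bits)) & ((1 << block_bits) - 1)
    return block, row, col


def block_enables(block, block_bits=2):
    """One-hot enable vector: only the addressed block is activated."""
    return [1 if b == block else 0 for b in range(1 << block_bits)]


# Example address built from block 3, row 0xAB, column 5.
addr = (3 << 12) | (0xAB << 4) | 0x5
block, row, col = split_address(addr)
enables = block_enables(block)
print(block, row, col, enables)
```

Because exactly one entry of the enable vector is set, the word-lines and bit-lines of the other blocks stay idle, which is where the power saving comes from.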

## READ and WRITE Operations in SRAM

- t<sub>AA</sub> (access time from address): time for stable output after a change in address.
- t<sub>ACS</sub> (access time for chip select): time for stable output after CS is asserted.
- t<sub>OE</sub> (output-enable time): time to reach the low-impedance state once OE and CS are both asserted.
- t<sub>OZ</sub> (output-disable time): time to reach the high-impedance state once OE or CS is negated.
- t<sub>OH</sub> (output-hold time): time the output data remains valid after a change to the address inputs.



#### Read Memory Cycle



#### Write Memory Cycle



Courtesy: Harris & Weste

### **SRAM Trade-offs**



Naveen Verma and Anantha P. Chandrakasan, "A High-Density 45 nm SRAM Using Small-Signal Non-Strobed Regenerative Sensing," IEEE Journal of Solid-State Circuits, vol. 44, no. 1, p. 163, January 2009.

Trends: • High-V<sub>t</sub> devices, low V<sub>MIN</sub>

Medium bit-cells, short bit-lines

Small bit-cells, long bit-lines

## Key existing and emerging applications for biomedical devices

| Application                            | Power                | Processor        | Energy source            |
|----------------------------------------|----------------------|------------------|--------------------------|
| Pacemaker & cardioverter-defibrillator | $<10~\mu \mathrm{W}$ | 1kHz DSP         | 10-year lifetime battery |
| Hearing aid                            | $100-2000~\mu \mathrm{W}$ | 32kHz-1MHz DSP | 1-week lifetime        |
| Neural recording                       | 1-10 mW              | n/a              | Inductive power          |
| Body-area monitoring                   | $140~\mu \mathrm{W}$ | <10MHz DSP       | Battery                  |

Implantable devices (pacemakers/defibrillators, cochlear implants, neural sensors/stimulators) are energy-constrained since battery replacement requires surgical intervention.

Wearable devices (hearing aids, body-area sensors) have less stringent energy constraints, which are set by battery weight limitations.

Courtesy: Verma & Chandrakasan
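To make the energy constraint concrete, the power and lifetime figures in the table can be turned into a total energy budget. The sketch below is only back-of-the-envelope arithmetic: it assumes constant average power and a 365.25-day year, and the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY  # assumed year length


def energy_budget_joules(avg_power_w, lifetime_s):
    """Energy a source must deliver at constant average power."""
    return avg_power_w * lifetime_s


# Pacemaker: upper bound of 10 uW sustained for a 10-year battery lifetime.
pacemaker_j = energy_budget_joules(10e-6, 10 * SECONDS_PER_YEAR)

# Hearing aid: 100 uW to 2000 uW sustained for a 1-week lifetime.
hearing_aid_low_j = energy_budget_joules(100e-6, 7 * SECONDS_PER_DAY)
hearing_aid_high_j = energy_budget_joules(2000e-6, 7 * SECONDS_PER_DAY)

print(f"pacemaker: {pacemaker_j:.0f} J ({pacemaker_j / 3600:.2f} Wh)")
print(f"hearing aid: {hearing_aid_low_j:.1f} J to {hearing_aid_high_j:.1f} J")
```

Under these assumptions the ten-year implant budget is on the order of a few kilojoules, which is why every microwatt of standby leakage matters.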

### Wireless Sensor Networks

Micro/nano-scale devices providing sensing, processing, and communications capabilities can form networks, broadly referred to as wireless sensor networks. The applications for such devices include industrial and automotive sensing, environment monitoring, structural monitoring, and military surveillance/detection. Battery lifetime constraints are critical, and the battery must be physically small to facilitate in-situ sensing in a broad range of uses. To extend the lifetime of the sensor nodes, energy harvesting from the ambient environment can be leveraged as long as occasional degradation in performance quality, depending on the ambient factors, can be tolerated.

Ultra-low-power low-voltage MSP430 microcontroller



### Energy Collecting and Harvesting Options

| Energy Source                        | Performance                                              |
|--------------------------------------|----------------------------------------------------------|
| Thermoelectric                       | $60~\mu W/cm^3$                                          |
| Ambient light                        | $100~\mu W/cm^2$ (office), $100~mW/cm^2$ (direct light)  |
| Vibration                            | $4~\mu W/cm^3$ (human motion)                            |
| Heel strike                          | 10-700 mW (walking)                                      |
| Near-field inductive energy transfer | 20 mW at 5 cm [33]                                       |
| Far-field inductive energy transfer  | $2~\mu W$ at 10 m [34]                                   |

## Structure of Modern SRAM



Courtesy: Verma & Chandrakasan

### SRAM Leakage Energy

SRAM leakage-energy increases due to three factors: (1) High ratio of leakage-paths to actively-switched-nodes; (2) Total leakage set by an aggregation of intentionally minimum sized devices; and (3) Critical path set by a single MOSFET pull-down stack with extreme variation.



The simulated total aggregate leakage-current (at $1.1 \mathrm{V}$), normalized to the nominal aggregate leakage-current, for a 1Mb array composed of 0.25 $\mu m^2$ bit cells in an LP 45 nm technology.

Courtesy: Verma & Chandrakasan

## Circuit Delay Aggravation due to Performance Degradation



(a) Energy profiles representative of generic logic (90nm 32b carry-lookahead adder).

(b) Relative leakage-energy shift expected in SRAMs due to increased ratio of leakage-currents to active-switching-current.

(c) Relative leakage-energy shift expected in SRAMs due to severe performance degradation from bit-cell variation.

The severe performance degradation due to the critical path's dependence on a single bit-cell experiencing extreme variation causes the leakage-energy curve to shift rightward.

This can be understood by observing that the point at which the leakage-energy begins increasing exponentially occurs at a higher supply-voltage (0.8 V) than before (0.6 V). Effectively, the variation raises the limiting bit-cell's threshold voltage, and, as a result, supply-voltage reduction quickly leads to sub-threshold operation, which imposes an exponential increase in circuit delay.

Courtesy: Verma & Chandrakasan



Figure 2-5: Circuitry to enforce idle-mode biasing using (a) programmable sleep switches [63] and (b) an operational-amplifier [64].

Figure 2-6: Waveforms corresponding to idle-to-active and active-to-idle mode transitions.



- E<sub>ACC</sub>: Switching energy for reads/writes
- E<sub>LKG</sub>: Leakage energy to meet read/write margin
- E<sub>IDL</sub>: Leakage energy to meet hold margin (i.e. at V<sub>DRV</sub>)
- E<sub>OH</sub>: Overhead energy to switch between active/idle-mode biasing

Figure 2-7: Summary of SRAM energy components

$$E_{TOT} = E_{ACC} + E_{LKG} + E_{IDL} + E_{OH} \tag{2.1}$$

The active-access-energy $(E_{ACC})$ and the leakage-access-energy $(E_{LKG})$ pertain to the active mode. $E_{ACC}$ corresponds to the switching energy required to perform reads and writes, and $E_{LKG}$ corresponds to the leakage-energy imposed by applying a supply-voltage across the array that must be large enough to ensure reliable reads and writes. The idle-data-retention energy $(E_{IDL})$ corresponds to data storage during the idle-mode, and it will also be referred to as the idle-mode energy. Finally, the overhead-energy $(E_{OH})$ corresponds to the overhead incurred due to altering the sub-array's biasing in accordance with idle-mode power reduction. These components are summarized in Figure 2-7.

## **Breakdown of Energy Sources**

The total active-access-energy for reads of an i × j (i.e. i-column, j-row) sub-array is given by

$$E_{ACC,RD} = C_{WL}V_{DD}^2 + C_{cSEL}V_{DD}^2 + \frac{i}{m}C_{SA}V_{DD}^2 + i C_{BL}V_{DD}V_{SNS}$$

Full-swing signals typically include the one-hot enabled word-line, WL, for row selection, and the one-hot enabled column-select, cSEL, for multiplexed column selection in a column-interleaved array. In total, the number of sense-amplifiers is equal to the number of columns in the sub-array divided by the column-multiplexing ratio, m. The most significant source of active-access-energy consumption, however, is the bit-lines, BL, which are used to convey the stored read-data to the sense amplifiers and to drive new write-data into the bit-cells. Strictly speaking, to resolve the read-data, the BLs need only discharge to the required sense-amplifier input margin, V<sub>SNS</sub>, which can be less than 100mV. Nonetheless, in practice, the BLs are often discharged beyond the sensing margin to reduce the probability of data-disruption caused by sustained pulling of the bit-cell storage nodes towards the BL voltage near VDD. During read accesses, for instance, the design in [68] actively amplifies the signal on all BLs to full logic levels in order to avoid data-disruption.

The total active-access-energy for writes is approximately given by:

$$E_{ACC,WR} = C_{WL}V_{DD}^2 + C_{cSEL}V_{DD}^2 + \frac{i}{m}C_{BL}V_{DD}^2 + i\frac{m-1}{m}C_{BL}V_{DD}V_{SNS}$$
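As a rough numerical illustration, the two access-energy expressions can be evaluated directly. Every capacitance, voltage, and array dimension below is a made-up placeholder (not a value from the slides or the cited work); only the formulas are taken from the text.

```python
def read_access_energy(i, m, c_wl, c_csel, c_sa, c_bl, vdd, vsns):
    """E_ACC,RD for an i-column sub-array with column-mux ratio m."""
    return (c_wl * vdd**2 + c_csel * vdd**2
            + (i / m) * c_sa * vdd**2
            + i * c_bl * vdd * vsns)


def write_access_energy(i, m, c_wl, c_csel, c_bl, vdd, vsns):
    """E_ACC,WR: i/m columns swing fully, the other columns see a read-like swing."""
    return (c_wl * vdd**2 + c_csel * vdd**2
            + (i / m) * c_bl * vdd**2
            + i * (m - 1) / m * c_bl * vdd * vsns)


# Hypothetical parameters: 128 columns, 4:1 column multiplexing.
params = dict(i=128, m=4, c_wl=50e-15, c_csel=20e-15, c_bl=100e-15,
              vdd=1.0, vsns=0.1)
e_rd = read_access_energy(c_sa=5e-15, **params)
e_wr = write_access_energy(**params)

# Share of the read energy spent on the bit-lines.
bl_share_rd = params["i"] * params["c_bl"] * params["vdd"] * params["vsns"] / e_rd
print(f"E_ACC,RD = {e_rd * 1e12:.2f} pJ, E_ACC,WR = {e_wr * 1e12:.2f} pJ")
print(f"bit-line share of read energy = {bl_share_rd:.0%}")
```

With these placeholder numbers the bit-line term dominates the read energy even at a 100 mV swing, which matches the qualitative statement in the text; the actual split depends on the real capacitances.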

#### Transistor Sizing

- High bitlines must not overpower inverters during reads
- But low bitlines must write new value into cell



Courtesy: Harris & Weste

### READ Operation:

- Precharge both bitlines high
- Then turn on wordline
- One of the two bitlines will be pulled down by the cell
- Ex: A = 0, A\_b = 1
  - bit discharges, bit\_b stays high
  - But A bumps up slightly
- Read stability
  - A must not flip
  - N1 >> N2



### WRITE Operation

- Drive one bitline high, the other low
- Then turn on wordline
- Bitlines overpower cell
- Ex: A = 0, A\_b = 1, bit = 1, bit\_b = 0
  - Force A\_b low, then A rises high
- Must overpower feedback
  - P2 << N4 to force A\_b low
  - N1 turns off, P1 turns on, raising A high as desired



## **SRAM Noise margins for Hold and Read Operations**



6T bit-cell butterfly curves showing bi-stable behavior during (a) hold, where access devices are "off", and during (b) read, where access devices are "on" and bit-lines are clamped to VDD.

## Measurement of Noise Margin

- Measure method
- Increase VR and measure VL
- Increase VL and measure VR
- Make voltage transfer curve in VR and VL axes → Butterfly
- Measure  $I_{in} \rightarrow N$ -curve
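Once the two voltage transfer curves are available, the static noise margin can be read off the butterfly plot as the side of the largest square that fits inside one lobe. The sketch below illustrates that extraction under stated assumptions: the tanh-shaped `inverter_vtc` is a hypothetical stand-in for measured VR/VL data, the cell is assumed symmetric (so one lobe suffices), and the grid and bisection settings are arbitrary.

```python
import math


def inverter_vtc(vin, vdd=1.0, gain=10.0):
    """Hypothetical smooth inverter transfer curve (not a device model)."""
    return 0.5 * vdd * (1.0 - math.tanh(gain * (vin - 0.5 * vdd)))


def static_noise_margin(vtc, vdd=1.0, steps=400, iters=50):
    """Side of the largest square between y = vtc(x) and its mirror x = vtc(y).

    The lower-left corner (x0, y0) sits on the mirrored curve; the side s is
    grown until the upper-right corner (x0 + s, y0 + s) touches y = vtc(x).
    """
    best = 0.0
    for k in range(steps + 1):
        y0 = vdd * k / steps
        x0 = vtc(y0)                      # corner on the mirrored curve
        if vtc(x0) - y0 <= 0.0:           # no room for a square here
            continue
        lo, hi = 0.0, vdd
        for _ in range(iters):            # bisection on the square side
            mid = 0.5 * (lo + hi)
            if vtc(x0 + mid) - (y0 + mid) >= 0.0:
                lo = mid
            else:
                hi = mid
        best = max(best, lo)
    return best


snm_high_gain = static_noise_margin(lambda v: inverter_vtc(v, gain=20.0))
snm_low_gain = static_noise_margin(lambda v: inverter_vtc(v, gain=5.0))
print(f"SNM (gain 20) = {snm_high_gain:.3f} V, SNM (gain 5) = {snm_low_gain:.3f} V")
```

The lower-gain curve is used here only to show that a flatter transfer characteristic shrinks the lobe; with measured curves, `vtc` would be replaced by an interpolation of the swept data.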







[Figures: **Butterfly Curve** and **N-Curve**.]

**Courtesy: LT Microelectronics**

## Measurement of Iread, ILeakage and VDDHold

#### I<sub>read</sub>

Measure bitline current when WL switches to high

#### I<sub>LEAKAGE</sub>

Measure VDD (or VSS) current when WL=0

#### VDD<sub>HOLD</sub>

- Decrease the VDD voltage while WL=0
- Measure the minimum VDD voltage at which |V(nl) - V(nr)| = 'sensing margin'



| Cell                        | I<sub>read</sub> | I<sub>LEAKAGE</sub> | VDD<sub>HOLD</sub> |
|-----------------------------|------------------|---------------------|--------------------|
| Reference                   |                  | 85.4 nA             | 110 mV             |
| 32 nm (for 30x12 and 25x12) | 66.7 uA          | 142.7 nA            | 78 mV              |

Courtesy: LT Microelectronics



[Handwritten derivation of the data-retention voltage (DRV) from the sub-threshold leakage currents of the bit-cell transistors. Legible steps: S denotes the sub-threshold swing at room temperature, DIBL effects are ignored, and the final estimate is evaluated for n = 1, S = 60 mV/decade, giving 36 mV.]

## Why DRV should be Minimized

SRAM power is the dominant power consumption factor in applications that are primarily in the standby mode.

In CMOS technology, standby power consists of leakage-power which increases with each silicon-technology generation.

For ultra low-power devices, standby leakage power reduction is crucial for device-operation within the scavenging power limit.

### Spatial Distribution of DRV for a 130 nm 32 Kb SRAM



**DRV** in SRAM cells varies between 50 mV and 240 mV.

DRV varies due to local parameters like threshold voltage and channel length of transistors in SRAM cells.

EECS 598-6, W2012

### Empirical DRV distribution



Another Example with 90 nm

[Figure: histogram of cell DRV values (axis: relative frequency); leakage current of 256 cells.]

Test-chip DRV distribution: The experimental intra-chip DRV varies from 70 to 190mV in the 90nm CMOS technology. The worst-case solution for data-retention is a supply voltage of 200mV. If we add a 100 mV guard band, the standby supply voltage will be about 300 mV, which is about 200 mV more than the majority of the cell DRV values as shown in the histogram.

In the range 100-400mV, the leakage-current is approximately piecewise linear.
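The guard-band reasoning for the 90 nm test chip is simple arithmetic, sketched here in Python. The numbers are the ones quoted in the text; the function and variable names are illustrative, and the "typical" DRV is merely what the quoted 200 mV headroom implies.

```python
def standby_supply_mv(worst_case_drv_mv, guard_band_mv):
    """Array-wide standby supply: worst-case retention voltage plus margin."""
    return worst_case_drv_mv + guard_band_mv


worst_case_mv = 200       # worst-case data-retention supply quoted above
guard_band_mv = 100       # guard band quoted above
v_standby_mv = standby_supply_mv(worst_case_mv, guard_band_mv)

# The text says this is about 200 mV above most cells' DRV,
# which implies a typical cell DRV of roughly:
headroom_mv = 200
implied_typical_drv_mv = v_standby_mv - headroom_mv

print(v_standby_mv, implied_typical_drv_mv)
```

In other words, most cells are held far above what they individually need, because the array supply is dictated by the few worst cells plus the margin.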




### Technology Node v. DRV



DRV distribution from a 5k-point Monte-Carlo simulation of within-die variation for 90nm and 45nm nodes. The tail sets the array-wide VStandby.
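The statement that the tail sets the array-wide standby voltage can be illustrated with a toy Monte-Carlo run. This is a sketch only: the log-normal shape, the 100 mV median, and the two spread values are assumptions chosen for illustration and are not fitted to the 90 nm or 45 nm data.

```python
import math
import random
import statistics


def sample_drv_mv(n_points, median_mv, sigma_log, seed=1):
    """Draw hypothetical per-cell DRV values from a log-normal distribution."""
    rng = random.Random(seed)
    mu = math.log(median_mv)
    return [rng.lognormvariate(mu, sigma_log) for _ in range(n_points)]


def array_standby_mv(drv_samples_mv):
    """Every cell must retain data, so the worst cell sets the array supply."""
    return max(drv_samples_mv)


# Two hypothetical nodes with the same median but different spread.
node_a = sample_drv_mv(5000, median_mv=100.0, sigma_log=0.20)
node_b = sample_drv_mv(5000, median_mv=100.0, sigma_log=0.35)

for name, samples in (("node A", node_a), ("node B", node_b)):
    print(name,
          "median:", round(statistics.median(samples)),
          "array standby:", round(array_standby_mv(samples)))
```

Even though both hypothetical nodes share the same median DRV, the one with the wider spread needs a noticeably higher array-wide standby voltage, because the maximum over thousands of cells tracks the tail rather than the bulk.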

## DRV varies slightly with temperature, but widely with process variation



Simulated worst bitcell SNM (a) and 1kb SRAM leakage power (b) vs. VDD under PVT variations (best-case, typical and worst-case) and 3σ local mismatch.

## Techniques to Minimize DRV

### Fault-Tolerant Memory with ECC

### ECC Reduces the DRV

[Figure: histograms of DRV occurrences (per word) versus original DRV (mV), optimized DRV (mV), and optimized DRV with error correction (mV).]

Table 1. Worst-case DRV range measured on 24 chips

| DRV (mV) | Original | Optimized | Optimized w/ error correction |
|----------|----------|-----------|-------------------------------|
| Min.     | 320      | 170       |                               |
| Max.     | 570      | 220       |                               |


## Power Savings when SRAM is in Stand-by Mode



[Figure (b): SRAM leakage power minimization. Normalized SRAM leakage: 25% (B), 2.8% (C), 1.8% (D), annotated with 75%, 90% and 35% reductions.]

Compared to the leakage power consumption at 1V standard VDD (A), lowering the standby VDD to 650mV (B) reduces the leakage power of an un-optimized SRAM design by 75%. The DRV-aware SRAM cell optimization brings the standby VDD down to 320mV (C), leading to another 90% leakage power reduction. The error-tolerant design further lowers the standby VDD to 255mV (D), and reduces the leakage power by an extra 35%. This final design (D) consumes only 1.8% of the original memory leakage power (A).
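The quoted step-by-step reductions can be cross-checked against the normalized leakage values given for this figure (25%, 2.8%, 1.8%). The short sketch below does only that arithmetic; the dictionary layout and function name are illustrative.

```python
# Normalized SRAM leakage (percent of design A) for each design point.
normalized_leakage = {"A": 100.0, "B": 25.0, "C": 2.8, "D": 1.8}


def stepwise_reductions(levels):
    """Fractional leakage reduction between consecutive design points."""
    names = list(levels)
    return {f"{a}->{b}": 1.0 - levels[b] / levels[a]
            for a, b in zip(names, names[1:])}


reductions = stepwise_reductions(normalized_leakage)
for step, value in reductions.items():
    print(f"{step}: {value:.1%} reduction")
```

The implied steps are 75%, just under 89%, and just under 36%, which agrees with the rounded 75%, 90%, and 35% figures in the text.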

### Optimum and implemented schemes





The leakage power for the worst-case method, the [31,26,3] Hamming code based implementation, and the theoretical optimum are compared. The power reduction for the [31,26,3] implementation tracks the optimum within a close margin of 6-11%.
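A [31,26,3] Hamming code protects 26 data bits with 5 parity bits and has minimum distance 3, so it can correct any single bit error in a 31-bit word. That is what lets a word survive one cell losing its data at a reduced standby voltage. The sketch below uses the textbook layout (parity bits at positions 1, 2, 4, 8, 16); the slides name only the code parameters, so this bit ordering is an assumption and not necessarily the implemented one.

```python
def hamming_encode(data_bits, r=5):
    """Encode len(data_bits) = 2**r - 1 - r bits into a (2**r - 1)-bit codeword."""
    n = (1 << r) - 1
    assert len(data_bits) == n - r
    code = [0] * (n + 1)                 # index 0 unused; positions are 1-based
    bits = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):              # not a power of two: data position
            code[pos] = next(bits)
    for p in range(r):
        mask = 1 << p
        parity = 0
        for pos in range(1, n + 1):
            if pos & mask and pos != mask:
                parity ^= code[pos]
        code[mask] = parity              # even parity over the covered group
    return code[1:]


def hamming_correct(word):
    """Return (corrected word, syndrome); a non-zero syndrome is the error position."""
    syndrome = 0
    for pos, bit in enumerate(word, start=1):
        if bit:
            syndrome ^= pos
    fixed = list(word)
    if syndrome:
        fixed[syndrome - 1] ^= 1
    return fixed, syndrome


data = [(k * 7 + 3) % 2 for k in range(26)]   # arbitrary 26-bit pattern
codeword = hamming_encode(data)

corrupted = list(codeword)
corrupted[9] ^= 1                              # one cell flips (position 10)
recovered, syndrome = hamming_correct(corrupted)
print(len(codeword), syndrome, recovered == codeword)
```

Two or more failing cells in the same word are beyond what this code can correct, so the standby voltage can only be lowered as far as single-error words remain the worst case.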