# Straintronics-Based Random Access Memory as Universal Data Storage Devices

Mahmood Barangi and Pinaki Mazumder, Fellow, IEEE

Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109 USA

Nanomagnetic and spin-based memories are distinguished for their high data endurance in comparison with their charge-based peers. However, they have drawbacks, such as high write energy and poor scalability due to high write current. In this paper, we apply the straintronics principle that seeks the combination of piezoelectricity and inverse magnetostriction (Villari effect), to design a proof-of-principle 2 Kb nonvolatile magnetic memory in 65 nm CMOS technology. Our simulation results show read-access and write-cycle energies as low as 49 and 143 fJ/b, respectively. At a nominal supply level of 1 V, reading can be performed as fast as 562 MHz. Write error rates  $<10^{-7}$  and  $10^{-15}$  can be obtained at 10 and 5 MHz, respectively. In addition to nonvolatility, ultralow energy per operation, and high performance, our STRs memory has a high storage density with a cell size as small as 0.2  $\mu$ m<sup>2</sup>.

Index Terms—Active power, CMOS, energy barrier, leakage power, magnetic tunneling junction (MTJ), magnetization, magnetostriction, nonvolatility, piezoelectricity, strain, straintronics (STRs), stress, universal memory, Villari effect.

#### I. INTRODUCTION

INCREASING power density and static leakage currents due to aggressive scaling of CMOS technology pose major obstacles to power-aware very large scale integration chip designs. On-chip memory arrays occupy nearly 70% of the overall chip real estate in microprocessors, digital signal processors, and other applications, thereby becoming a major source of power consumption in modern integrated circuits. While various types of semiconductor memories, such as static, pseudostatic, dynamic, resistive, phase change, and read only, offer varying advantages in terms of speed, energy, and volatility, the search for a nonvolatile memory with high endurance and ultralow power consumption still continues as demands for memory and storage in cloud, high-performance, and wearable computing increase exponentially. The Cinderella of memory technology must have a small cell footprint, very low energy consumption for both read and write operations, a good ON/OFF switching ratio, and the ability to enter a sleep or hibernation mode in the inactive state. Volatile CMOS memories, such as static random access memories (SRAMs) or dynamic RAMs (DRAMs), operate at fairly high speeds, but at the same time, dissipate a significant amount of power. Their high static power is mainly due to the fact that they need to be continuously connected to a power supply to retain their data. The SRAM presented in [1] takes advantage of subthreshold operation to reduce energy consumption. However, it still dissipates 2  $\mu$W leakage power and 1.4 pJ/read-access/bit at 0.4 V supply voltage, and operates relatively slowly at 475 kHz. Flash memories, as nonvolatile CMOS memories, demonstrate relatively high data density and read throughput, but they have low write and erase speeds,

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TMAG.2014.2374556



Fig. 1. Different memory types in terms of energy efficiency, speed, cell size, data endurance, and data retention. Dashed green lines: ideal regions.

and they require high voltages for programming and erasing. The design by Seo *et al.* [2] draws 7 mA at 66 MHz read throughput and has a 20  $\mu$s write delay.

Magnetic-based memories have been proposed as a potential replacement for charge-based memories due to their high data endurance and data retention, and their theoretically lower energy dissipation. The energy required for switching the state of a charge-based logic has a fundamental limit of  $NkT \ln p^{-1}$ , where k is the Boltzmann constant, N is the number of charge carriers, T is the operating temperature, and p is the bit error probability [3]. For magnetic-based logic, this limit is  $kT \ln p^{-1}$ , since magnetic domains align themselves to the adjacent domains' magnetization. Although magnetic logic has lower fundamental switching energy limits than charge-based logic, conventional magnetic memories usually demonstrate higher energy dissipation than their charge-based peers. This is mainly due to their use of current flow to perform read and write operations. The magnetic RAM (MRAM) demonstrated in [4] consumes 1.4 nJ energy for write operation, which is much higher than the energy limit discussed earlier.
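As a rough numerical illustration of these limits, the following sketch evaluates $kT \ln p^{-1}$ and $NkT \ln p^{-1}$ and compares the former with the 1.4 nJ write energy quoted for [4]. The temperature (300 K), the error probability ($10^{-7}$), and the carrier count ($N = 10^4$) are assumed values chosen only for illustration; they are not taken from [3].

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def switching_energy_limit(p_error, temperature=300.0, n_carriers=1):
    """Return N*k*T*ln(1/p) in joules; n_carriers=1 gives the magnetic limit."""
    return n_carriers * K_B * temperature * math.log(1.0 / p_error)


# Assumed illustration values: p = 1e-7, T = 300 K, N = 10,000 carriers.
e_magnetic = switching_energy_limit(1e-7)
e_charge = switching_energy_limit(1e-7, n_carriers=10_000)
mram_write = 1.4e-9  # write energy quoted for the MRAM in [4], J

print(f"magnetic limit:        {e_magnetic:.3e} J")
print(f"charge limit (N=1e4):  {e_charge:.3e} J")
print(f"MRAM write / magnetic limit: {mram_write / e_magnetic:.2e}")
```

Under these assumptions, the magnetic limit is below $10^{-19}$ J, many orders of magnitude under the write energy of current-driven magnetic memories.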

Different memory types [5]–[9] in terms of energy per cell, speed of operation, cell size, data endurance, and data retention are shown in Fig. 1. The dashed regions in the diagrams demonstrate the ideal regions in which a memory can operate. The term universal memory identifies a memory that lies within the dashed regions of Fig. 1, implying a good energy efficiency, speed, data density, data endurance, and data

Manuscript received May 10, 2014; revised October 13, 2014; accepted November 10, 2014. Date of publication November 26, 2014; date of current version May 22, 2015. Corresponding author: P. Mazumder (e-mail: pinakimazum@gmail.com).

<sup>0018-9464 © 2014</sup> IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications\_standards/publications/rights/index.html for more information.



Fig. 2. Using piezoelectricity and Villari effect, high energies of FIMS and STT approaches are avoided. V: voltage. I: current. H: magnetic field. S: strain. M: magnetization.

retention. While SRAM lies in the ideal region of energy per cell and speed, it lacks the high density and data retention properties. DRAM shows acceptable data endurance and cell size, but it is not energy efficient and fails to demonstrate data retention due to volatility. Spin-transfer torque RAM (STT-RAM) fails to fulfill all five requirements due to the energy efficiency and write error rate obstacles. This is because high static currents are required for reliably switching the binary state of the magnetic cell. The STT-RAM in [10] requires >100  $\mu$ A to assure magnetic tunneling junction (MTJ) switching within 4 ns for <10<sup>-5</sup> error rate. Therefore, an approach that can switch the state of the magnetic cell without requiring high static currents can help in taking a step forward toward creating the universal memory.

Straintronics is defined as the combination of a magnetostrictive material with a piezoelectric layer, here lead zirconate titanate (PZT), and has recently been demonstrated to assist magnetization switching. An applied stress helps with the rotation of the magnetization vector in a nanomagnet (NM). Roy *et al.* [11] proposed flipping of the magnetization vector in a single magnet using this approach. Here, we demonstrate that the combination of an MTJ with a piezoelectric material can significantly reduce the write energy in magnetic memories. Since the high static current flows of STT and field-induced magnetization switching (FIMS) are avoided, a dramatic energy reduction is achieved. Fig. 2 shows the alternate route we take through piezoelectricity and the Villari effect to avoid static current flow. The write energy levels of FIMS, STT, and our straintronics approach are also shown in Fig. 2.

In this paper, we present an STR-RAM that demonstrates ultralow read and write energies, a high operating frequency, and a high data storage density. The remainder of this paper is organized as follows. Section II describes the principle of STRs. Section III presents our dynamic modeling of the STRs device and the choice of magnetostrictive materials for memory design. Section IV describes the memory bitcell design and the proposed read and write approaches. Section V presents the memory architecture. Simulation results are highlighted in Section VI. Section VII concludes this paper.

## **II. PRINCIPLE OF STRAINTRONICS**

Fig. 3 shows the straintronics device (STR) with its PZT-MTJ interface and its equivalent circuit model. The device is an elliptical cylinder with its minor and major axes lying in the yz plane, as shown in Fig. 3(a). In the absence of any external



Fig. 3. (a) STRs device with reference coordinates specified. (b) Equivalent electrical model of the device.

force, the magnetization vector aligns itself along the z-axis (major axis) since this alignment minimizes the magnetic energy of the device. An applied voltage across the PZT generates an electric field that leads to a strain, S, which appears as a change of length, L, where  $S = \Delta L/L$ . This physical length change of the PZT layer transfers a mechanical energy to the free magnet. Depending on the polarity of the applied voltage, the Villari effect can create an energy minimum along the y-axis (minor axis), allowing the magnetization to rotate freely toward this axis. We will now explain these steps in detail.

#### A. E-Field Generation

Given the equivalent *RC* model of the device in Fig. 3(b), a voltage applied across the device generates an electric field, E = V/d, where V is the supply voltage, and d is the thickness of the PZT. The MTJ can be modeled as a variable resistance, and the PZT can be modeled as a parallel-plate capacitance. The MTJ resistance is defined as [12]

$$R_{\rm MTJ} = R_m + \frac{1}{2} \left( R_M - R_m \right) \left( 1 - \cos \theta \right) \tag{1}$$

where  $R_M$  is the high resistance state, in which free and pinned layers have antiparallel (AP) magnetization orientation,  $R_m$ is the low resistance state, in which they have parallel (P) orientation, and  $\theta$  is the angle of the magnetization vector of the free layer with respect to the major axis.
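Equation (1) can be evaluated directly, as in the minimal sketch below. The resistance values are hypothetical placeholders, not parameters of the simulated device; the point is only that the resistance sweeps from $R_m$ at $\theta = 0$ through the midpoint at $\theta = \pi/2$ to $R_M$ at $\theta = \pi$.

```python
import math


def mtj_resistance(theta, r_low, r_high):
    """Equation (1): MTJ resistance for free-layer angle theta (radians)."""
    return r_low + 0.5 * (r_high - r_low) * (1.0 - math.cos(theta))


# Hypothetical resistance values, for illustration only (ohms).
R_LOW, R_HIGH = 1.0e3, 2.0e3

r_parallel = mtj_resistance(0.0, R_LOW, R_HIGH)            # P state
r_minor_axis = mtj_resistance(math.pi / 2, R_LOW, R_HIGH)  # along the minor axis
r_antiparallel = mtj_resistance(math.pi, R_LOW, R_HIGH)    # AP state

print(r_parallel, r_minor_axis, r_antiparallel)
```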

#### B. Strain Generation Due to Piezoelectricity

The relationship between the E-field and its resulting strain is demonstrated by the modified Hooke's law for piezoelectricity

$$\{S\} = s\{\sigma\} + d^{t}\{E\} \tag{2}$$

where s is the compliance matrix,  $\sigma$  is the stress, and d is the 3 × 3 tensor describing the piezoelectric effect. We use PZT as the piezoelectric layer, in which the  $d_{31}$  coefficient converts the electric field along the x-axis to a strain in the yz plane.

The PZT is chosen to be four times thicker than the free NM, while keeping a large plane interface between the two layers. This assures that the strain can almost completely transfer to the NM.



Fig. 4. Energy barrier decreases and eventually vanishes as mechanical stress on the magnet increases.

### C. Stress Anisotropy in the NM Due to Magnetostriction

In the absence of stress, the magnetic moment tends to align itself along the major axis, called intrinsic easy axis. This is due to shape anisotropy and uniaxial crystalline anisotropy. Shape anisotropy energy density,  $E_{sh}$ , is defined as

$$E_{\rm sh} = \frac{\mu_0}{2} M_s^2 N_{\rm sh} \tag{3}$$

where  $\mu_0$  is the permeability of vacuum,  $M_s$  is the saturation magnetization of the NM, and  $N_{\rm sh}$  is the demagnetization factor. The uniaxial crystalline anisotropy has the energy density,  $E_u$ , defined as

$$E_u = K_u \sin^2 \theta \tag{4}$$

where  $K_u$  is the uniaxial anisotropy coefficient.

The two energies mentioned above create an energy barrier between major and minor axes, where minimum energy occurs along the major axis ( $\theta = 0, \pi$ ), whereas a maximum energy occurs along the minor axis ( $\theta = \pi/2$ ).

When a stress is applied to the magnetostrictive material, the stress anisotropy energy density,  $E_{\sigma}$ , due to the Villari effect, is given by

$$E_{\sigma} = \frac{3}{2} \lambda_s \sigma \, \sin^2 \theta_{\sigma} \tag{5}$$

where  $\lambda_s$  is the magnetostriction expansion at saturation, and  $\theta_{\sigma}$  is the angle between the magnetization vector and the minor axis. As mentioned previously, when  $\sigma = 0$ , the magnetization vector tends to retain its orientation along the major axis (P orientation state or AP orientation state) due to the energy barrier. As we apply stress, the energy barrier reduces, as shown in Fig. 4. At some stress value, called the critical stress, the energy barrier vanishes. For cobalt as the NM with our selected device geometries, this value is  $\sigma_{\text{critical}} = 54.5$  MPa. Any stress higher than the critical stress forces the magnetization vector to rotate and then align itself along the minor axis. If the duration of the applied stress is within the successful pulsewidth (analyzed in detail in Section IV), the magnetization vector will continue to rotate and settle at the opposite orientation of the starting state. This is the principle of flipping the magnetization vector using STRs.
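The vanishing of the barrier can be illustrated with a deliberately simplified model. The sketch below assumes purely in-plane rotation, lumps the shape and uniaxial contributions of (3) and (4) into a single term `barrier0 * sin^2(theta)`, and uses $\sin^2\theta_\sigma = \cos^2\theta$ because $\theta_\sigma$ is measured from the minor axis. The magnitude of $\lambda_s$ is an assumed placeholder, and `barrier0` is back-computed from the quoted 54.5 MPa rather than taken from the device geometry, so only the trend is meaningful.

```python
import math


def energy_density(theta, barrier0, lam_s, sigma):
    """Simplified in-plane energy density (J/m^3) at angle theta from the major axis."""
    # Stress term of (5) with sin^2(theta_sigma) = cos^2(theta).
    return (barrier0 * math.sin(theta) ** 2
            + 1.5 * lam_s * sigma * math.cos(theta) ** 2)


def barrier(barrier0, lam_s, sigma):
    """Energy difference between the minor-axis and major-axis orientations."""
    return (energy_density(math.pi / 2, barrier0, lam_s, sigma)
            - energy_density(0.0, barrier0, lam_s, sigma))


def critical_stress(barrier0, lam_s):
    """Stress at which the simplified barrier vanishes."""
    return 2.0 * barrier0 / (3.0 * lam_s)


LAM_S = 2.0e-5                         # assumed magnitude of lambda_s (placeholder)
SIGMA_CRIT = 54.5e6                    # Pa, critical stress quoted in the text
BARRIER0 = 1.5 * LAM_S * SIGMA_CRIT    # back-computed, not a measured value

for sigma in (0.0, 20e6, 40e6, SIGMA_CRIT):
    print(f"sigma = {sigma / 1e6:5.1f} MPa -> barrier = "
          f"{barrier(BARRIER0, LAM_S, sigma):8.1f} J/m^3")
```

In this model the barrier falls linearly with stress and reaches zero at the critical stress, which mirrors the trend of Fig. 4.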



Fig. 5. Magnetoresistance value when a 0.2 V pulse is applied at t = 5 ns and removed abruptly at t = 15 ns.

## III. DYNAMIC MODELING OF THE STRAINTRONICS DEVICE

To build a memory, we first developed a 3-D model of the STR. The model accurately follows the dynamic behavior of the device based on the Landau–Lifshitz–Gilbert equation in Gilbert form [13]

$$\frac{d\vec{M}}{dt} = -\frac{\gamma_0}{1+\alpha^2} (\vec{M} \times \vec{H}) - \frac{\gamma_0}{M_s \left(\alpha + \frac{1}{\alpha}\right)} \vec{M} \times (\vec{M} \times \vec{H}) \tag{6}$$

where  $\alpha$  is the Gilbert damping factor,  $\gamma_0$  is the gyromagnetic ratio,  $\vec{M}$  is the magnetization vector, and  $\vec{H}$  is the net effective magnetic field. The net effective magnetic field is mainly due to shape anisotropy, uniaxial anisotropy, and stress anisotropy. By expressing the net effective magnetic field in terms of the  $(r, \theta, \varphi)$  components and performing vector and algebraic operations, (6) can be turned into the following coupled equations for  $\theta$  and  $\varphi$  angles of the magnetization vector:

$$\frac{d\theta}{dt} = \frac{\gamma_0}{1+\alpha^2} (H_{\varphi} + \alpha H_{\theta}) \tag{7}$$

$$\frac{d\varphi}{dt} = \frac{\gamma_0}{1+\alpha^2} \frac{1}{\sin\theta} (\alpha H_{\varphi} - H_{\theta}) \tag{8}$$

where  $H_{\theta}$  and  $H_{\varphi}$  can be obtained from the shape anisotropy, uniaxial anisotropy, and stress anisotropy, and are given as follows:

$$H_{\theta} = -\frac{1}{\mu_0 V M_s} \left( \frac{\mu_0}{2} M_s^2 V (N_x - N_y) + \frac{3}{2} \lambda_s \sigma V \right) \sin \theta \sin 2\varphi \tag{9}$$

$$H_{\varphi} = -\frac{1}{\mu_0 V M_s} \left\{ \frac{\mu_0}{2} M_s^2 V \left( N_y \sin^2 \varphi + N_x \cos^2 \varphi - N_z \right) -\frac{3}{2} \lambda_s \sigma V \sin^2 \varphi + K_u V \sin 2\theta \right\} \tag{10}$$

where  $N_x$ ,  $N_y$ , and  $N_z$  are the shape anisotropy coefficients along the Cartesian axes. Equations (7) and (8) are used to obtain the instantaneous angles ( $\theta$ ,  $\varphi$ ) of the magnetization vector at any time with any given voltage across the STR. With the instantaneous value of  $\theta$ , the MTJ resistance (also called magnetoresistance) in our electrical model can be calculated using (1). Fig. 5 shows the dynamic waveform of the magnetoresistance of a cobalt device as we apply a 200 mV pulse at t = 5 ns. Before the pulse is applied, the magnetization vector

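TABLE I

MATERIAL PROPERTIES AND CRITICAL VOLTAGE, CRITICAL RELAXATION DELAY

|                                        | Terfenol-D | Nickel | Cobalt | Metglas |
|----------------------------------------|------------|--------|--------|---------|
| Gilbert damping factor                 | 0.1        | 0.045  | 0.01   | 0.2     |
| Magnetostriction expansion at saturation | 60       | 2      | 2      | 1.2     |
| Critical flipping voltage              | 12 mV      | 16 mV  | 65 mV  | 165 mV  |
| Alignment delay                        | 240 ps     | 435 ps | 286 ps | 2.89 ns |
| Relaxation delay                       | 2.64 ns    | 2.76 ns | 2.06 ns | 4.66 ns |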



Fig. 6. Alignment delay versus applied voltage for cobalt and metglas.

is relaxed along the major axis parallel to the fixed layer's magnetization orientation; as a result, magnetoresistance is low. When a voltage higher than the critical voltage (associated with the critical stress) is applied, the magnetization vector aligns along the minor axis, and the resistance value settles at the mid value between high and low states. When the pulse is removed abruptly at t = 15 ns, the magnetization vector will settle to either +z-axis or -z-axis, due to the energy barrier, leading to a low or high resistance value.
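The angular form of the dynamics in (7) and (8) lends itself to a simple time-stepping loop. The sketch below transcribes only the right-hand sides of (7) and (8) and a single explicit Euler step; the field components $H_\theta$ and $H_\varphi$ are passed in as inputs rather than computed from (9) and (10), and the integrator, function names, and numerical values are illustrative assumptions, not the solver used for the results in this paper.

```python
import math


def llg_angle_rates(theta, h_theta, h_phi, alpha, gamma0):
    """Right-hand sides of (7) and (8): returns (dtheta/dt, dphi/dt)."""
    prefactor = gamma0 / (1.0 + alpha ** 2)
    dtheta_dt = prefactor * (h_phi + alpha * h_theta)
    # Note: this form is singular at theta = 0 or pi, where sin(theta) = 0.
    dphi_dt = prefactor * (alpha * h_phi - h_theta) / math.sin(theta)
    return dtheta_dt, dphi_dt


def euler_step(theta, phi, h_theta, h_phi, alpha, gamma0, dt):
    """Advance (theta, phi) by one explicit Euler step of size dt."""
    dtheta_dt, dphi_dt = llg_angle_rates(theta, h_theta, h_phi, alpha, gamma0)
    return theta + dt * dtheta_dt, phi + dt * dphi_dt


# Illustrative call with assumed values (fields in A/m, time in s).
GAMMA0 = 2.211e5  # gyromagnetic ratio in m/(A*s)
theta1, phi1 = euler_step(theta=0.1, phi=math.pi / 2, h_theta=1.0e3,
                          h_phi=-2.0e3, alpha=0.01, gamma0=GAMMA0, dt=1e-13)
print(theta1, phi1)
```

A production model would recompute $H_\theta$ and $H_\varphi$ from (9) and (10) at every step and would likely use a higher-order integrator.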

We simulated four different NMs for their critical flipping voltage (associated with critical stress) and their dynamic delays. Thermal fluctuations are included in the model to have more realistic results and assist with the magnetization switching. The results are demonstrated in Table I. Alignment delay is measured from the time that a pulse voltage with 0.5 V amplitude is applied until the magnetization vector moves toward the minor axis. The relaxation delay is measured when the pulse is removed until  $\theta$  settles within  $\pi/10$  of the major axis. Terfenol-D and nickel flip at very low voltages and, therefore, will be vulnerable to CMOS parasitic noises and leakages. However, cobalt and metglas provide reasonable noise margins to stay immune against CMOS parasitic noises. Therefore, we further simulated these two materials for flipping delay as a function of voltage across the device. Fig. 6 shows the results, where we applied pulses and let the magnetization vector align itself along the minor axis. At 1 V applied voltage, cobalt aligns five times faster than metglas. Furthermore, after removing the stress, cobalt relaxes to the major axis within 2.06 ns, whereas metglas takes 4.66 ns to relax. As a result of the above discussion, due to its noise



Fig. 7. (a) Proposed bitcell architecture. (b) Topology of reference cell and connection of RBL and reference line to SA.

immunity and fast response, cobalt is the primary choice for our memory bitcell design.

#### **IV. MEMORY BITCELL DESIGN**

Fig. 7(a) shows the proposed bitcell architecture of the STR-RAM. The read port of the STR cell on the right side is connected to the free layer of the MTJ already shown in Fig. 3(a). An NMOS is used to access the read bit line (RBL), as the RBL's voltage level is low. A transmission gate is used to access the write word line (WWL), since both high and low voltages are applied to the cell through this line. The read operation is performed by sending a current through the RBL and comparing the resulting voltage to the reference voltage  $(V_{ref})$  using a sense amplifier (SA). The reference cell, shown in Fig. 7(b), is made with MTJs that are pinned at high/low states, leading to a reference resistance of  $R_{\rm ref} = (R_H + R_L)/2$ . A dummy capacitance is used to relax the clock feedthrough from the SA. The current through the RBL is generated using voltage-controlled current sources (VCCSs) and kept limited to a few microamperes. This leads to higher energy efficiency and avoids the STT effect. The SA has a dynamic latched topology [14] to avoid static power dissipation. The differential pair transistors in the SA are oversized to alleviate offset. At a 1 V supply level, the SA has a delay of 106 ps and an energy per operation of 24 fJ. This assures that the SA will be neither a speed nor an energy bottleneck for the entire system.
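The read decision can be sketched behaviorally as a comparison between the RBL voltage and a reference generated from a resistance midway between the two MTJ states. The resistance values, the 5 $\mu$A read current, and the mapping of the high-resistance (AP) state to logic 1 are assumptions made for illustration; the SA is idealized as an offset-free comparator.

```python
def reference_resistance(r_high, r_low):
    """Reference resistance assumed midway between the two MTJ states."""
    return 0.5 * (r_high + r_low)


def read_bit(r_cell, r_high, r_low, i_read):
    """Idealized sense: return 1 if the RBL voltage exceeds V_ref, else 0."""
    v_rbl = i_read * r_cell
    v_ref = i_read * reference_resistance(r_high, r_low)
    return 1 if v_rbl > v_ref else 0


# Hypothetical values: 1 kOhm / 2 kOhm states and a 5 uA read current.
R_LOW, R_HIGH, I_READ = 1.0e3, 2.0e3, 5.0e-6

print(read_bit(R_HIGH, R_HIGH, R_LOW, I_READ))  # cell in the AP state
print(read_bit(R_LOW, R_HIGH, R_LOW, I_READ))   # cell in the P state
```

With these placeholder numbers, the sensing margin on each side of the reference is 2.5 mV, which is why SA offset matters in practice.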

Write operation is not as straightforward as the read access. Switching of a single magnet under an applied stress is theoretically analyzed in [15]. Here, we will discuss the switching of the STR and the effect of circuit variation on it to find a solution for writing into the straintronics memory cell. If the applied pulse through WWL is maintained on the STR for a long time, the magnetization vector of the device will settle along the minor axis. When the stress is removed, the magnetization vector can relax to either the P or the AP state, as shown in Fig. 5. Therefore, retaining the voltage on the STR for a long time will push the cell into a metastable state, where reaching the desired state upon removing the pulse is not guaranteed. However, the dynamics in (6) guarantee that a certain range of pulse durations, called the successful pulsewidth, assures flipping to the opposite state. Our simulations on cobalt with a 75 mV applied pulse are shown in Fig. 8(a). A pulsewidth between 1.7 and 2.7 ns can assure flipping from  $P \rightarrow AP$ . Shorter or longer pulses can



Fig. 8. (a) Successful pulsewidth required for flipping the magnetization vector from  $\theta = 0$  to  $\theta = \pi$  for cobalt with 75 mV pulse amplitude. (b) As the pulse amplitude increases, the success margin decreases due to lower general damping factor. (c) Success margin demonstrates gaps at higher voltages due to lower general damping factor.

cause failure. As we increase the applied voltage across the device, two phenomena are observed as follows.

1) The success margin, as shown in Fig. 8(a), narrows. This is mainly due to the fact that the effective damping factor of the magnetization reduces as we increase the voltage across the device. By simplifying (7) and (8) for  $\theta$  and using Taylor series approximations, the effective damping factor,  $\zeta$ , can be given as

$$\zeta = \frac{\alpha (M_1 + M_2)}{\sqrt{4(1 + \alpha^2)M_1M_2 - \alpha^2(M_1 + M_2)^2}} \tag{11}$$

where  $M_1$  and  $M_2$  are coefficients that depend on material properties and the applied voltage across the device. The success margin of cobalt and metglas as a function of applied pulse amplitude across the device is shown in Fig. 8(b). Pulse amplitudes >0.3 V are not shown in the plot, as they lead to many failure gaps of the kind shown in Fig. 8(c). Metglas shows a lower margin due to its higher value of  $\zeta$ , resulting from its high Gilbert damping factor.

2) A higher voltage and, therefore, a lower effective damping factor lead to failure gaps in the success margin. This is shown in Fig. 8(c) for cobalt when a 200 mV pulse is applied across the device. As a result, the higher voltages lead to uncertain success margins. However, they provide a much faster alignment of the magnetization vector along the minor axis.

The above discussion and simulations on successful pulsewidths solely consider the dynamics of the STR device on its own. In practical systems, however, due to circuit variations, success is not always guaranteed. To show this, we simulated P to AP switching in a memory bitcell for different pulsewidths. The results are shown in Fig. 9. In the best case, pulsewidths between 1.9 and 2.8 ns have  $\sim 65\%$  success. Therefore, a read operation should always be performed after a write attempt to check for flipping success.

As a result of above discussion, two write approaches are possible as follows.

1) Apply 75 mV pulse for 2.2 ns, then let the magnetization vector relax and read. This approach has ~65% flipping success as discussed earlier.



Fig. 9. Successful flipping for different pulsewidths for a memory cell.

2) Apply 1 V pulse for 200 ps and go to the metastable point (where the magnetization vector settles along the minor axis), then let magnetization relax and read. This approach has a 50% flipping success.

Approach 1) takes almost 6 ns, whereas approach 2) takes almost 4 ns. While, in the long run, the two approaches provide almost the same write error probability (WEP)  $(0.35^{t_{write}/6ns} \approx 0.5^{t_{write}/4ns})$ , approach 2) leads to a simpler design. Therefore, we adopted this approach. An attempt to write is called a write cycle. Multiple write cycles might be required to achieve successful writing. This establishes a tradeoff between the total write time (i.e., the number of write cycles) and the WEP, which is analyzed later in Section VI.
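The near equivalence of the two approaches can be checked numerically. The sketch below evaluates both WEP expressions for a few total write times, treating write attempts as independent and using the approximate cycle durations of 6 and 4 ns quoted above; the chosen write times are arbitrary sample points.

```python
def wep(t_write_ns, fail_prob, cycle_ns):
    """Write error probability after t_write_ns of independent write attempts."""
    return fail_prob ** (t_write_ns / cycle_ns)


for t_ns in (24.0, 48.0, 96.0):
    wep_1 = wep(t_ns, 0.35, 6.0)  # approach 1): ~65% success, ~6 ns per cycle
    wep_2 = wep(t_ns, 0.50, 4.0)  # approach 2): 50% success, ~4 ns per cycle
    print(f"t = {t_ns:5.1f} ns: approach 1 = {wep_1:.3e}, approach 2 = {wep_2:.3e}")
```

The per-nanosecond decay rates are $\ln(0.35)/6 \approx -0.175$ and $\ln(0.5)/4 \approx -0.173$, so the two curves stay close over the write times of interest.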

Fig. 10 shows the write operation for logic 1 and 0. Upon receiving the command to write logic 1, the memory performs a read to see if the bitcell data is different from the write data. Since it is the case, memory performs a write attempt, which is successful, and therefore, no more write cycles occur. Writing logic 0 follows the same algorithm, however, this time the first write cycle fails to write the data, and therefore, memory performs a second write attempt, which successfully writes the data into the cell.
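The read-before-write and write-verify sequence described above can be sketched as a small behavioral model. The class `StrCell`, the function `write_with_verify`, and the cap on the number of cycles are hypothetical names and choices for illustration; the model only captures the assumption of approach 2) that each write pulse leaves the cell in either state with equal probability.

```python
import random


class StrCell:
    """Toy bitcell: a write pulse leaves the cell in a random state."""

    def __init__(self, state=0, rng=None):
        self.state = state
        self.rng = rng or random.Random()

    def read(self):
        return self.state

    def write_pulse(self):
        # After the pulse, the magnetization relaxes to P or AP with
        # equal probability (approach 2), so the flip succeeds half the time.
        self.state = self.rng.randint(0, 1)


def write_with_verify(cell, data, max_cycles=50):
    """Read first; pulse and re-read until the cell holds `data`.

    Returns (number of write cycles used, success flag).
    """
    cycles = 0
    while cell.read() != data and cycles < max_cycles:
        cell.write_pulse()
        cycles += 1
    return cycles, cell.read() == data


cell = StrCell(state=0, rng=random.Random(1))
print(write_with_verify(cell, 1))  # (cycles used, True on success)
print(write_with_verify(cell, 1))  # data already matches: no write cycle
```

Because the initial read skips the write when the stored bit already matches, only differing bits cost any write energy.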



Fig. 10. Dynamic waveforms for write operation of logic 1 and 0.



Fig. 11. 2 Kb STR-RAM architecture.

#### V. MEMORY ARCHITECTURE

A 2 Kb memory is designed using the STRs cells combined with the CMOS devices. The memory consists of 128 rows and 16 columns. Read and write operations are performed on 16 bit columns simultaneously. Fig. 11 shows the topology of the memory. The controller uses a ring oscillator to generate the required signals, which automatically clock-gates itself when the read or write commands are performed. When reading from a cell, the read word line is activated, and the MTJ's state is detected using the VCCSs and the reference cell. When writing, WWL is activated through the decoder. When not writing, the WBL is kept connected to ground to make sure that the top plates of the STRs devices will not reach the critical voltage due to leakage.

#### VI. SIMULATION RESULTS

The memory is designed and simulated in 65 nm CMOS process with a 1 V supply voltage. The PZT has a piezoelectric coefficient of  $d_{31} = 1.8 \times 10^{-10}$  m/V, and a dielectric constant of 1700. The axes of the device are chosen to be 205 and 195 nm. This provides an energy barrier of 125 kT, which promises a storage



Fig. 12. Read-access and write-cycle energies per bit versus  $V_{DD}$ .



Fig. 13. Read-access and write-cycle delays versus V<sub>DD</sub>.

class memory [16]. Given the cell architecture in Fig. 7(a), the cell size is limited by the CMOS devices and can be as small as  $0.2 \ \mu m^2$ , as the MTJ can be placed on top of the access transistors [17].

Fig. 12 shows the energy/read access/bit and the energy/write cycle/bit as a function of the power supply level. Multiple write cycles might be required to achieve successful flipping, as discussed earlier. The plots show their minimums at  $V_{\rm DD} = 0.55$  V. Values below this supply voltage lead to high leakage energy dissipation due to large delays and, therefore, are not energy efficient. The energy values reported here include the entire memory and are mostly due to the CMOS controllers. The STRs device, on its own, dissipates only a small portion (<10% for write operation and <2% for read access) of these energy values. Read and write delays increase significantly with the reduction of  $V_{\rm DD}$ , as shown in Fig. 13, mainly due to the slower ring oscillator in the controller block.

Since the probability of successful flipping is 1/2 for a single write cycle, the WEP decreases as we increase the total write time, as shown in Fig. 14. Therefore, the write frequency can be adjusted based on the error tolerance of the system. A WEP  $<10^{-6}$  can be achieved with an 80 ns write time. This is still much faster than flash memories, which demonstrate write delays on the order of microseconds [2].
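The relation between write time and WEP can be made concrete with a short calculation. The sketch below assumes independent write cycles of about 4 ns each (the duration quoted for approach 2) with a failure probability of 1/2 per cycle, and ignores any additional controller overhead; under those assumptions it reproduces the WEP levels quoted for 80 ns, 10 MHz, and 5 MHz operation.

```python
import math

CYCLE_NS = 4.0  # approximate duration of one write cycle, approach 2)
P_FAIL = 0.5    # probability that a single write cycle fails


def cycles_for_wep(target_wep, p_fail=P_FAIL):
    """Smallest number of write cycles giving a WEP at or below target_wep."""
    return math.ceil(math.log(target_wep) / math.log(p_fail))


def wep_after(write_time_ns, cycle_ns=CYCLE_NS, p_fail=P_FAIL):
    """WEP after as many whole write cycles as fit into write_time_ns."""
    cycles = int(write_time_ns // cycle_ns)
    return p_fail ** cycles


for t_ns in (80.0, 100.0, 200.0):  # 80 ns, 10 MHz, and 5 MHz write rates
    print(f"write time {t_ns:5.0f} ns -> WEP = {wep_after(t_ns):.2e}")
print("cycles needed for WEP <= 1e-6:", cycles_for_wep(1e-6))
```

For example, 80 ns accommodates 20 cycles, giving a WEP of $2^{-20} \approx 9.5 \times 10^{-7}$.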

 TABLE II

 Comparison of STR-RAM With Different Memories in Literature

|                   | Type   | Volatility | Tech<br>(nm) | VDD (V) | Cell Area ($\mu$m<sup>2</sup>) | <i>E<sub>read</sub></i> /bit (pJ) | Freq.<br>(Hz) <sup>‡</sup> |
|-------------------|--------|------------|--------------|---------|--------------------|-----------------------------------|----------------------------|
| [1]               | SRAM   | V          | 65           | 0.4     | *                  | 0.011                             | 475K                       |
| [2]               | Flash  | NV         | 130          | 0.9     | 0.276              | 2.38                              | 50M                        |
| [4]               | MRAM   | NV         | 90           | 1       | 1.25               | 28.1                              | 66M                        |
| [8]               | DRAM   | V          | 65           | 1       | 0.115              |                                   | 500M                       |
| [10] <sup>†</sup> | STTRAM | NV         | 90           | 1       |                    |                                   | 250M⊥                      |
| This work         | STRRAM | NV         | 65           | 1       | 0.2**              | 0.049                             | 562M                       |

\*A 6T SRAM cell for this technology typically takes 0.71 $\mu$m<sup>2</sup>

\*\*Approximation, since MTJ lands on top of CMOS

<sup>‡</sup> Read frequency; write time for Flash in [2] is 20  $\mu$s and for STR-RAM and STT-RAM is variable depending on the system tolerance on WEP.

 $\perp$  The speed can be adjusted with varying current and error tolerance. The values are for 4 ns delay with less than 10<sup>-6</sup> error probability.

† Only analyzes the switching current and delay for the MTJ on its own and does not include the energy of the entire memory system



Fig. 14. WEP versus write time.



Fig. 15. Read performance of STR-RAM compared with SRAM in [1].

Fig. 15 shows the STR-RAM's read performance as a function of the supply voltage. Even when operating in the near-threshold region, the memory can read as fast as 10 MHz.

We tabulated our results in comparison with state-of-the-art memories of different types in Table II. Various memories are designed for different applications. The SRAM in Table II shows a very low energy due to its subthreshold operation. However, it operates at very low frequencies and cannot be used for high-speed energy-limited applications. The flash memory has a moderate energy, but it should be noted that it has much lower data endurance than magnetic memories and has a large write time. The MRAM shows high energies and a large cell area. The DRAM, along with the SRAM, suffers from a lack of data retention in the absence of the supply voltage. The STT-RAM has a moderate energy level that can be further improved using straintronics. STR-RAM shows promise as a candidate for the future universal memory.

#### VII. CONCLUSION

In this paper, we have shown through device modeling and circuit simulation how nonvolatile STR-RAM memories can be built, using an example design of a 2 Kb memory block. The simulation results confirm that STR-RAM can operate at 562 MHz (read frequency), while dissipating only 49 fJ/read-access/bit and 143 fJ/write-cycle/bit. The nonvolatility of STR-RAM can be leveraged to turn ON the memory only when an access is warranted, thereby reducing the data retention voltage to 0 V in the sleep or hibernation mode. In traditional SRAM memory blocks used in data and instruction caches, the supply voltage is significantly reduced in the hold state to minimize the power dissipation. However, due to wide process variations in sub-50 nm CMOS technology, the theoretical limits of data retention voltage can rarely be achieved, and the actual value of the data retention voltage is set between 200 and 300 mV by allowing a safety guard band on top of the worst-case memory access time for cells in the array. Furthermore, the STR-RAM has a much smaller cell size (0.2  $\mu$m<sup>2</sup>) than its counterpart SRAM cells, which may use between 6 and 10 transistors (6T to 10T), depending on the supply voltage of operation. Finally, further research is necessary to fabricate STR-RAM chips and characterize their reliability as well as their performance metrics before straintronics can be crowned as the future universal memory technology.

#### ACKNOWLEDGMENT

This work was supported in part by the NSF NEB Grant ECCS-1124714 (PT106594-SC103006) and in part by the AFOSR Grant FA9550-12-1-0402.

#### References

- [1] B. H. Calhoun and A. P. Chandrakasan, "A 256-kb 65-nm sub-threshold SRAM design for ultra-low-voltage operation," *IEEE J. Solid-State Circuits*, vol. 42, no. 3, pp. 680–688, Mar. 2007.
- [2] M. K. Seo et al., "A 0.9 V 66 MHz access, 0.13 μm 8 M(256K×32) local SONOS embedded flash EEPROM," in Symp. VLSI Circuits, Dig. Tech. Paper, Jun. 2004, pp. 68–71.
- [3] S. Salahuddin and S. Datta, "Interacting systems for self-correcting low power switching," *Appl. Phys. Lett.*, vol. 90, no. 9, pp. 093503-1–093503-3, Feb. 2007.
- [4] R. Nebashi *et al.*, "A 90 nm 12 ns 32 Mb 2T1MTJ MRAM," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Paper (ISSCC)*, Feb. 2009, pp. 462–463.
- [5] D. Cai *et al.*, "An 8-Mb phase-change random access memory chip based on a resistor-on-via-stacked-plug storage cell," *IEEE Electron Device Lett.*, vol. 33, no. 9, pp. 1270–1272, Sep. 2012.
- [6] K. Tsuchida *et al.*, "A 64 Mb MRAM with clamped-reference and adequate-reference schemes," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Paper (ISSCC)*, Feb. 2010, pp. 258–259.
- [7] J. DeBrosse *et al.*, "A high-speed 128-kb MRAM core for future universal memory applications," *IEEE J. Solid-State Circuits*, vol. 39, no. 4, pp. 678–683, Apr. 2004.
- [8] S. Romanovsky *et al.*, "A 500 MHz random-access embedded 1 Mb DRAM macro in bulk CMOS," in *IEEE Int. Solid-State Circuits Conf. Dig. Tech. Paper (ISSCC)*, Feb. 2008, pp. 270–612.
- [9] J. Shah, M. Barangi, and P. Mazumder, "Memristor crossbar memory for hybrid ultra low power hearing aid speech processor," in *Proc. 13th IEEE Conf. Nanotechnol. (IEEE-NANO)*, Aug. 2013, pp. 83–86.
- [10] L. Thomas *et al.*, "Perpendicular spin transfer torque magnetic random access memories with high spin torque efficiency and thermal stability for embedded applications," *J. Appl. Phys.*, vol. 115, no. 17, pp. 172615-1–172615-6, May 2014.
- [11] K. Roy, S. Bandyopadhyay, and J. Atulasimha, "Hybrid spintronics and straintronics: A magnetic technology for ultra low energy computing and signal processing," *Appl. Phys. Lett.*, vol. 99, p. 063108, Jan. 2011.
- [12] L. Engelbrecht, "Modeling spintronics devices in Verilog-A for use with industry standard simulation tools," Ph.D. dissertation, School Elect. Eng. Comput. Sci., Oregon State Univ., Corvallis, OR, USA, Mar. 2011.
- [13] F. G. Sánchez, "Modeling of field and thermal magnetization reversal in nanostructured magnetic materials," Ph.D. dissertation, Dept. Condens. Matter, Auto. Univ. Madrid, Madrid, Spain, Nov. 2007.
- [14] T. Kobayashi, K. Nogami, T. Shirotori, and Y. Fujimoto, "A current-controlled latch sense amplifier and a static power-saving input buffer for low-power architecture," *IEEE J. Solid-State Circuits*, vol. 28, no. 4, pp. 523–527, Apr. 1993.
- [15] K. Roy, S. Bandyopadhyay, and J. Atulasimha, "Energy dissipation and switching delay in stress-induced switching of multiferroic devices in the presence of thermal fluctuations," *J. Appl. Phys.*, vol. 112, no. 2, p. 023914, 2012.

- [16] A. Driskill-Smith *et al.*, "Non-volatile spin-transfer torque RAM (STT-RAM): Data, analysis and design requirements for thermal stability," in *Proc. Symp. VLSI Technol. (VLSIT)*, Jun. 2010, pp. 51–52.
- [17] S. Matsunaga *et al.*, "Fabrication of a nonvolatile full adder based on logic-in-memory architecture using magnetic tunnel junctions," *Appl. Phys. Exp.*, vol. 1, no. 9, p. 091301, Aug. 2008.

**Mahmood Barangi** received the B.S. degree in electrical engineering from Sharif University of Technology, Tehran, Iran, in 2009, and the M.S. degree in electrical engineering from the University of Michigan (UM), Ann Arbor, MI, USA, in 2011, where he is currently pursuing the Ph.D. degree.

He is currently a Graduate Student Research Assistant with the Department of Electrical Engineering and Computer Science, UM. His current research interests include low-power digital and mixed-signal circuit design, SRAM memory design, and spin transfer torque-based logic and memory design.

**Pinaki Mazumder** (S'84–M'87–SM'95–F'99) received the Ph.D. degree from the University of Illinois at Urbana-Champaign, Champaign, IL, USA, in 1988.

He was with industrial research and development centers, including AT&T Bell Laboratories, Murray Hill, NJ, USA, where in 1985 he started the CONES Project, the first C modeling-based very large scale integration (VLSI) synthesis tool, and India's premier electronics company, Bharat Electronics, Ltd., Bangalore, India, where he developed several high-speed and high-voltage analog integrated circuits intended for consumer electronics products. He is currently a Professor with the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA, where he is on leave for one year to serve as the lead Program Director of the Emerging Models and Technologies Program with the U.S. National Science Foundation, Arlington, VA, USA. He has authored or co-authored over 200 technical papers and four books on various aspects of VLSI research. His current research interests include current problems in nanoscale CMOS VLSI design, computer-aided design tools, and circuit designs for emerging technologies, including quantum MOS and resonant tunneling devices, semiconductor memory systems, and physical synthesis of VLSI chips.

Dr. Mazumder is a Fellow of the American Association for the Advancement of Science. He was a recipient of Digital's Incentives for Excellence Award, the BF Goodrich National Collegiate Invention Award, and the Defense Advanced Research Projects Agency Research Excellence Award.