Thermal Analysis of 8-T SRAM for Nano-Scaled Technologies

Mesut Meterelliyoz, Jaydeep P. Kulkarni and Kaushik Roy School of Electrical and Computer Engineering

Purdue University, West Lafayette, IN 47906, USA

<mesut, jaydeep, kaushik> @ ecn.purdue.edu

ABSTRACT Different sections of a cache memory may experience different temperature profiles depending on their proximity to other active logic units such as the execution unit. In this paper, we perform thermal analysis of cache memories under the influence of hot-spots. In particular, the 8-T SRAM bitcell is chosen because of its robust functionality at nano-scaled technologies. A thermal map of the entire 8-T SRAM cache is generated using hierarchical compact thermal models while solving for leakage and temperature self-consistently. The impact of spatial temperature variations on 8-T SRAM parameters such as local bitline (LBL) sensing delay, noise robustness and bitcell stability is evaluated for 45nm/32nm/22nm bulk CMOS technology nodes. The effectiveness of variable keeper sizing on LBL sensing delay is analyzed. It is predicted that at the 22nm node, the leakage-induced temperature rise has severe effects on the 8-T SRAM characteristics.

Categories and Subject Descriptors B.3.1 [Memory Structures]: Semiconductor Memories – Static Memory (SRAM) B.3.2 [Memory Structures]: Design Styles – Cache Memories B.8.0 [Performance and Reliability]: General

General Terms Performance, Design, Reliability

Keywords 8T-SRAM, Compact thermal models, Leakage, Noise robustness, Thermal analysis, Variable keeper

1. INTRODUCTION Aggressive scaling of transistor dimensions with each technology generation has resulted in increased integration density and improved device performance. This has resulted in very fast and power-hungry computation modules such as floating point and arithmetic logic units. High-speed computation units increase the chip temperature by generating local hot-spots. Increased leakage can also cause self-heating, especially because of the strong temperature dependence of sub-threshold leakage current. Thermal non-uniformity induced by high-speed computation units and self-heating can affect the adjacent low-activity circuit blocks. Embedded cache memories, which are traditionally considered ‘cold’ sections of the chip, might also be influenced by hot-spots. Moreover, it is not practical to place cache memories away from computation units because of the resulting increase in latency. Further, as the area occupied by the cache memory is expected to grow (90% of the total die area [1]), temperature variations within the cache might become critical for nano-scaled technologies.

Cache memory, which is traditionally built using 6-T SRAM bitcells, employs minimum geometry transistors in a given technology node. Because of the minimum sized transistors, process variations such as random dopant fluctuations and line edge roughness have become a major concern in the design of robust SRAMs [2]. These variations result in asymmetric voltage transfer characteristics, which degrade the bitcell stability. Read, write, hold and access time failures are the main failure mechanisms in a 6-T SRAM bitcell [2]. These failures are aggravated as the supply voltage is reduced. Hence, increased process variations limit the supply voltage scaling in 6-T SRAM bitcells that is essential for low power operation.

In order to improve the read stability of the 6-T bitcell operating at lower supply voltages, the 8 Transistor (8-T) SRAM bitcell has been proposed [3-5]. As shown in Fig. 1, the 8-T SRAM bitcell is composed of a standard 6-T SRAM cell with two additional transistors for the read operation (RD and RDAX in Fig. 1). The fundamental stability problem in a 6-T bitcell is that, during a read, the access transistor (AXL) pulls up the internal node storing ‘0’ to a positive voltage (Vread). If Vread is higher than the switching threshold of the other inverter, the cell will flip, resulting in a read failure. In an 8-T SRAM bitcell, by contrast, the additional read transistors form a separate read port using the configuration shown in Fig. 1. Since the bitcell storage nodes are not disturbed during the read operation, the 8-T SRAM has improved bitcell stability. Moreover, separate word-lines for read and write operations allow simultaneous read/write access, which increases the performance of the 8-T SRAM. In addition, the 6-T portion of the 8-T bitcell can be optimized separately for reliable write operation.

Figure 1. 8-T SRAM bitcell schematics

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ISLPED’08, August 11–13, 2008, Bangalore, India. Copyright 2008 ACM 978-1-60558-109-5/08/08...$5.00.

On the other hand, 8-T SRAM bitcell might be more prone to temperature variations due to its dynamic sensing nature. In this paper, we analyze 8-T SRAM cache memory under spatial temperature variations. The main contributions of the paper can be listed as follows:

• Generate the compact thermal model for the 8-T SRAM for 45/32/22nm technology nodes using thin-cell layout

• Develop the thermal maps for 45nm/32nm/22nm node 8-T cache using hierarchical thermal models

• Analyze the impact of temperature variation on dynamic sensing parameters, such as bitline sensing delay, noise robustness and keeper sizing under technology scaling

• Evaluate the 8-T bitcell stability in terms of SNM for different technology nodes

The rest of the paper is organized as follows. Section 2 summarizes the 8-T bitcell architecture. Section 3 explains thermal map generation using the hierarchical thermal modeling methodology. Results and discussion are presented in Section 4. Finally, conclusions are drawn in Section 5.

2. 8T BITCELL ARCHITECTURE Data sensing in an 8-T SRAM is single-ended, with 8/16/32/64 bitcells connected to a local bitline (LBL) as a dynamic logic gate (Fig. 2). When the read word-line (RWL) is turned ON, the LBL node is evaluated according to the data stored in the cross-coupled inverter pair. The output of the local bitline sensing stage is applied to another dynamic gate stage driving the Global Bit Line (GBL), so bitline sensing is performed hierarchically using dynamic logic gates. A weak PMOS transistor called the ‘keeper’ pulls the dynamic node voltage (VLBL) up toward VDD. It supplies the leakage currents of the parallel NMOS devices in the read port. In addition, the keeper prevents the dynamic node from collapsing in response to spurious noise on the inputs of the read port transistors. Thus, a trade-off exists between the LBL sensing delay and the noise robustness: a stronger keeper holds the dynamic node at VDD more firmly, giving higher noise immunity at the cost of increased LBL delay. Hence, choosing the optimum keeper size is essential for robust as well as high-performance operation.

The impact of temperature on the LBL sensing circuit is twofold: (i) the sub-threshold leakage current of the parallel NMOS read port transistors increases exponentially; (ii) the ON current of the keeper transistor reduces due to mobility degradation. Both effects degrade the noise immunity of the LBL sensing circuit. In addition, as temperature increases, the LBL sensing delay might change significantly due to reduced read current. Thus, temperature variations inside the 8-T SRAM cache should be considered during the design phase. The following section explains the development of thermal maps for the entire cache memory.

3. THERMAL MAP GENERATION FOR CACHE Thermal maps for the entire cache are created using a hierarchical compact thermal model generation methodology which is summarized in Fig. 3. As shown in Fig. 3, hierarchical thermal modeling starts with the bitcell level compact thermal models. Cell level thermal models are generated using Fourier theory of heat transfer [6]. These models are used in generating first level thermal maps for a given memory floor-plan, power dissipation profile and thermal boundary conditions [6] [7]. Using the generated thermal maps and the given floor-plan, first level macro thermal models are created. This procedure is repeated hierarchically to develop higher level thermal compact models and generate the full cache level thermal model and temperature maps in a time-efficient way.

3.1 Cell Level Compact Thermal Model The schematic of an 8-T SRAM cell is shown in Fig. 1. The 3-D layout of the same cell (at the 45nm node), generated using the layout rules given in [8], is shown in Fig. 4. In this figure, Lmin is the minimum transistor length (45nm at the 45nm node) and Wmin is 2xLmin. The layout consists of a 150nm SiO2 layer on top of a 150nm silicon region. Transistor structures are placed inside the SiO2 layer. The gate material is assumed to be poly-Si and the metallic contacts are assumed to be aluminum. These structures are enclosed in a cuboidal domain of size 1.704um x 0.3um x 0.47um. In this orientation, the metallization layers would lie above the y=300nm boundary, while the Si-bulk layer and heat sink would lie below the y=0 boundary.

Figure 2. Local Bitline Sensing in 8-T SRAM

Figure 3. Hierarchical thermal modeling methodology

Figure 4. SRAM “thin-cell” structure at 45nm node

In order to generate the cell level thermal models at different technology nodes (45nm, 32nm, 22nm), the transistor feature sizes are scaled successively by a factor of 0.7 for each technology node. This reduces SRAM cell area by ~half at each technology node. Fig. 5 shows the cell layout dimensions for different technologies.
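The scaling arithmetic above can be checked with a short sketch. It assumes that the planar footprint of the 45nm cell is given by the x and z extents of the cuboidal domain in Section 3.1 (1.704um and 0.47um, with y as the vertical axis) and that both dimensions shrink by exactly 0.7 per node; the function and variable names are illustrative, and Fig. 5 remains the reference for the actual layout dimensions.

```python
def scale_footprint(length_um, width_um, generations, factor=0.7):
    """Scale planar cell dimensions by `factor` per technology generation."""
    s = factor ** generations
    return length_um * s, width_um * s


# Assumed 45nm footprint: x and z extents of the cuboidal cell domain.
base_length_um, base_width_um = 1.704, 0.47

footprints = {}
for generations, node in enumerate(["45nm", "32nm", "22nm"]):
    length, width = scale_footprint(base_length_um, base_width_um, generations)
    footprints[node] = (length, width, length * width)  # (um, um, um^2)

for node, (length, width, area) in footprints.items():
    print(f"{node}: {length:.3f} um x {width:.3f} um, area {area:.4f} um^2")
```

Each step multiplies the area by 0.7 x 0.7 = 0.49, which is the "~half" reduction quoted in the text.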

Inside the cell, the FLUENT CFD software is used to compute the temperature field using the unstructured mesh technique described in [9][10]. The generation of cell level compact model exploits the fact that the Fourier conduction equation with constant thermal properties is a linear elliptic boundary value problem. Thus, the average temperature of the cell level model can be uniquely written as:

$$T_{avg} = \sum_{f=1}^{6} a_f T_f + a_0\,\alpha q \tag{1}$$

where the coefficients a_f determine the influence of the six boundary temperatures of the cuboidal cell domain on T_avg. The coefficient a_0 quantifies the influence of the heat generation rate q on the cell average temperature for a given activity α. By the same token, the heat transfer rates q_bj entering the cell through each of its six boundaries may also be written as:

$$q_{bj} = \sum_{f=1}^{6} b_{fj} T_f + b_{0j}\,\alpha q, \qquad j = 1, 2, \ldots, 6 \tag{2}$$

For a given geometry and set of material properties, the coefficients a_f, a_0, b_fj and b_0j can be uniquely determined; together they constitute the compact model of an SRAM bitcell. Seven temperature calculations are carried out using seven different sets of boundary conditions to determine the seven coefficients (six a_f's and a_0) in Eq. 1. The same seven computations are also sufficient to determine the seven coefficients (b_fj and b_0j) for each of the six boundary face heat transfer rates (q_bj) in Eq. 2. Because the underlying conduction problem is linear, this compact model reproduces exactly the same average temperature and boundary heat transfer rates as the detailed FLUENT-based solution for any set of Dirichlet boundary conditions on the boundaries.
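The seven-run extraction described above can be sketched as follows. This is a minimal illustration, not the authors' tool: `solve_cell` is a hypothetical hook standing in for a detailed solver run (FLUENT in the paper), `toy_solver` is a made-up linear stand-in used only to exercise the code, and the choice of unit boundary conditions is one convenient set among many that would work.

```python
def fit_compact_model(solve_cell, heat=1.0):
    """Recover a_1..a_6 and a_0 of Eq. 1 from seven detailed solves.

    solve_cell(face_temps, heat) is a hypothetical hook returning the
    average cell temperature; `heat` stands for the product alpha*q.
    """
    a_f = []
    for f in range(6):
        face_temps = [0.0] * 6
        face_temps[f] = 1.0                      # unit temperature on face f only
        a_f.append(solve_cell(face_temps, 0.0))  # no heat generation
    a_0 = solve_cell([0.0] * 6, heat) / heat     # heat only, all faces at zero
    return a_f, a_0


def average_temperature(a_f, a_0, face_temps, heat):
    """Evaluate Eq. 1 with the fitted coefficients."""
    return sum(a * t for a, t in zip(a_f, face_temps)) + a_0 * heat


def toy_solver(face_temps, heat):
    """Made-up linear stand-in for a detailed solver (illustration only)."""
    weights = [0.30, 0.30, 0.15, 0.15, 0.05, 0.05]
    return sum(w * t for w, t in zip(weights, face_temps)) + 2.0e5 * heat


a_f, a_0 = fit_compact_model(toy_solver)
print(a_f, a_0)
```

Linearity is what makes seven runs sufficient: any other combination of face temperatures and heat generation is a superposition of these seven cases.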

3.2 First Level Thermal Maps The first task in thermal map generation is to assign power dissipation values to each cell in the floor plan. Power dissipation in SRAM array depends on its operating mode and can be given by:

$$P_{total} = P_{dynamic} + P_{static} \tag{3}$$

The dynamic power dissipation (Pdynamic) can be neglected in L2 and L3 caches due to low activity. The dominant power consumption in SRAM is the static power (Pstatic) and is given by:

$$P_{static} = V_{dd} \times I_{leak} \tag{4}$$

where, Ileak is the total leakage current in the SRAM cell.

In order to model leakage current in SRAM cells, HSPICE simulations are performed in hold mode with VDD=0.6V at different temperatures and technology nodes. The 45nm, 32nm and 22nm PTM [11] models are used for the transistors. As shown in Fig. 6, the leakage current for each technology node can be approximated with a fitting function of the form A × exp(B × T) to account for the temperature dependence. The model matches the predictive technology HSPICE results well, as shown in Fig. 6. These analytical models are used to solve for leakage current and temperature self-consistently during thermal map generation. Because they are temperature dependent, the models also account for the different temperatures across the cache.
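As an illustration of the fitting and self-consistency steps, the sketch below fits A × exp(B × T) by linear regression on ln(I) and then iterates a single lumped node to a fixed point. The data points, the lumped thermal resistance `r_th` and all function names are assumptions made for the example; they are not values from Fig. 6, and the full method solves the coupled problem over the whole cell mesh rather than for one node.

```python
import math


def fit_leakage(temps_k, currents_a):
    """Least-squares fit of I = A * exp(B * T) via ln(I) = ln(A) + B * T."""
    n = len(temps_k)
    logs = [math.log(i) for i in currents_a]
    t_mean = sum(temps_k) / n
    y_mean = sum(logs) / n
    b = (sum((t - t_mean) * (y - y_mean) for t, y in zip(temps_k, logs))
         / sum((t - t_mean) ** 2 for t in temps_k))
    a = math.exp(y_mean - b * t_mean)
    return a, b


def self_consistent_temperature(a, b, vdd, r_th, t_amb, tol=1e-9, max_iter=200):
    """Fixed-point solve of T = t_amb + r_th * vdd * a * exp(b * T)."""
    temp = t_amb
    for _ in range(max_iter):
        new_temp = t_amb + r_th * vdd * a * math.exp(b * temp)
        if abs(new_temp - temp) < tol:
            return new_temp
        temp = new_temp
    raise RuntimeError("no convergence; possible thermal runaway")


# Synthetic data for illustration only (not taken from Fig. 6).
temps_k = [300.0, 325.0, 350.0, 375.0]
currents_a = [1e-12 * math.exp(0.02 * t) for t in temps_k]

a, b = fit_leakage(temps_k, currents_a)
t_cell = self_consistent_temperature(a, b, vdd=0.6, r_th=1e8, t_amb=300.0)
print(a, b, t_cell)
```

The iteration converges only while the leakage feedback is weak; when it is not, the temperature grows without bound, which is the thermal runaway condition studied in [7].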

Figure 5. 8T thin-cell size for various technology nodes

Figure 6. Cell leakage vs. temperature for various nodes

Figure 7.a Floorplan of SRAM cells

Figure 7.b Heat balance at face i

The next step is to generate the cell-based representation of the floor plan. The SRAM cells are arranged in a planar mesh of cuboidal cells as shown in Fig. 7.a. Power values are assigned to each cell using the analytical models developed in Fig. 6. Eq. 1 relates the average cell temperature to the six ‘internal’ face temperatures of the cell as shown in Fig. 7.a. These temperatures are found by enforcing the continuity of heat transfer rate at each ‘internal’ cell face (Fig. 7.b) as given by Eq. 5. Eq. 5 relates the temperature of the face shared by cells I and I+1 to all the other face temperatures and the heat generation rates of the two cells.

$$(q_{bi})_{I} + (q_{bi})_{I+1} = 0$$

$$\left(\sum_{f=1}^{6} b_{fi} T_f + b_{0i}\,\alpha q\right)_{I} + \left(\sum_{f=1}^{6} b_{fi} T_f + b_{0i}\,\alpha q\right)_{I+1} = 0 \tag{5}$$

The final task in thermal map generation is to assign ‘external’ convective boundary conditions for the overall floor plan. Eq. 6 shows the boundary condition for left face (x=0) where h is the heat transfer coefficient and T∞ is ambient temperature.

$$hA\left(T_{\infty} - T_{boundary}\big|_{x=0}\right) = q_{bf}\big|_{x=0} \tag{6}$$

The h values for each face are selected to correctly model the thermal resistance due to the metallization layers (on top), the wafer properties (at the bottom), as well as lateral losses to other circuit blocks. Our thermal model incorporates both the buried oxide (BOX) layer and the SRAM cell layer. The interconnect layer, however, is not included in the thermal model. This should have minimal effect on the thermal maps since only ~5% of the heat is dissipated through interconnects [7]. Heat conduction through the Si-wafer, on the other hand, is taken into account during thermal map generation. The boundary conditions used in the thermal map simulations of circuit blocks are summarized in Fig. 8. The vertical boundary conditions used in this work are obtained using the same procedure as in [7]. For the lateral heat transfer, we assume adiabatic conditions since heat transfer through the side faces is negligible. Finally, under the given boundary conditions, Eqs. 1, 2, 5 and 6 are solved and the thermal maps are generated.
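To make the face-continuity and boundary-condition steps concrete, the sketch below solves a one-dimensional analogue: a single row of identical cells with a fixed-temperature face on the left and a convective face on the right. The 1-D compact model (heat entering through a face equals g × (T_face − T_opposite) − heat/2), the parameter names and the numeric values are assumptions for illustration; the actual method works on a 3-D mesh with six faces per cell.

```python
def solve_face_temperatures(n_cells, g, heat, t_hot, h_area, t_amb):
    """Face temperatures along a 1-D row of identical cells.

    Assumed 1-D compact model: the heat entering a cell through one face
    is g * (T_face - T_opposite) - heat / 2, where g is the cell's
    thermal conductance (W/K) and heat is its dissipation (W).
    Face 0 is held at t_hot; the last face is convective (h*A = h_area).
    """
    n = n_cells + 1                      # number of faces
    lower = [0.0] * n
    diag = [0.0] * n
    upper = [0.0] * n
    rhs = [0.0] * n

    diag[0], rhs[0] = 1.0, t_hot         # Dirichlet (hot-spot) face
    for i in range(1, n - 1):
        # Eq. 5 analogue at a shared face:
        # [g*(T_i - T_{i-1}) - heat/2] + [g*(T_i - T_{i+1}) - heat/2] = 0
        lower[i], diag[i], upper[i], rhs[i] = -g, 2.0 * g, -g, heat
    # Eq. 6 analogue: h_area*(t_amb - T_n) = g*(T_n - T_{n-1}) - heat/2
    lower[-1], diag[-1], rhs[-1] = -g, g + h_area, h_area * t_amb + heat / 2.0

    # Thomas algorithm for the tridiagonal system
    for i in range(1, n):
        m = lower[i] / diag[i - 1]
        diag[i] -= m * upper[i - 1]
        rhs[i] -= m * rhs[i - 1]
    temps = [0.0] * n
    temps[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        temps[i] = (rhs[i] - upper[i] * temps[i + 1]) / diag[i]
    return temps


# Illustrative values only: 10 cells, hot face at 90 C, adiabatic far end.
profile = solve_face_temperatures(10, g=1.0, heat=0.05, t_hot=90.0,
                                  h_area=0.0, t_amb=25.0)
print([round(t, 3) for t in profile])
```

Setting `h_area` to zero gives the adiabatic condition, while a very large value approaches a fixed-temperature (Dirichlet) face.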

3.3 Macro Level Thermal Models The generation of higher level macro thermal models for the SRAM cells is illustrated in Fig. 9. For the first level macro models, a different number of bitcells is used at each technology node in order to have the same block area for the higher level macro models. The mesh size and die area for each hierarchy level can be seen in Fig. 9. This results in different memory sizes and leakage power dissipation for the different nodes, as summarized in Table 1. As shown in Table 1, the total leakage power at 22nm increases drastically due to scaling and increasing density. The macro level compact model for each hierarchical level is represented by a new set of coefficients (a_f, a_0, b_fj and b_0j) as given in Eq. 1 and Eq. 2. The coefficients can be uniquely determined using the thermal map generation tool for seven Dirichlet boundary conditions (h=∞ in Eq. 6). Using this methodology, it is possible to generate the thermal maps for a cache containing millions of transistors in a time-efficient way.

4. RESULTS AND DISCUSSIONS

4.1 Thermal Maps Using the hierarchical thermal modeling methodology, thermal maps are generated for the various technology nodes. The entire cache is organized as 10 columns, each having dimensions of 1.17mm x 324µm (Fig. 9). In order to account for the hot-spot effect, a Dirichlet boundary condition of 90°C is applied at the left face (column = 0). It is important to note, however, that the hot-spot temperature will increase with increasing operating frequencies and integration as scaling continues. Moreover, as the devices are scaled from 45nm to 22nm, the leakage current per cell and the number of cells per unit area increase progressively. This leads to different temperature profiles, as shown in Fig. 10 (assuming the same hot-spot temperature in all technologies). The temperature of the 8-T SRAM cells closer to the hot-spot increases significantly because of the thermally conductive silicon substrate. For the 45nm and 32nm technology nodes, the effect of leakage on the temperature profile is not significant, which results in only a marginal difference between their thermal maps. For the 22nm case, however, the leakage per cell and the cells per unit area increase significantly (Fig. 10) and this affects the temperature profile considerably. At the 22nm node, the temperature at the far end of the 8-T SRAM array is 75°C, which is much higher than that of the corresponding 8-T bitcells in the 45nm and 32nm technology nodes. Note that the effect of dynamic power dissipation is not included in this analysis. Inclusion of dynamic power will further degrade the temperature profiles due to increased positive feedback. This would degrade the cell stability further and would require a larger keeper size to maintain the noise immunity, which in turn would affect the 8-T performance.

Table 1. Macro Level Model Details

Macro Level   Node   Memory Size (bytes)   Total Leakage Power at 300K, VDD=0.6V (W)
Level 1       45nm   66.1                  4.37x10^-6
              32nm   128                   1.22x10^-5
              22nm   253.1                 1.27x10^-4
Level 2       45nm   59.5K                 3.93x10^-3
              32nm   115.2K                1.09x10^-2
              22nm   227.8K                1.14x10^-1
Level 3       45nm   23.8M                 0.393
              32nm   46M                   1.09
              22nm   91.1M                 11.4

Figure 8. Boundary conditions for vertical heat dissipation

Figure 9. Macro level thermal models

4.2 Keeper Sizing Based on the thermal profiles obtained for the various technology nodes, the local bitline dynamic sensing circuit is analyzed for noise robustness. The noise immunity of a dynamic gate is quantified by the unity gain DC noise (UGDN). The UGDN is defined as the DC noise level on all inputs of the dynamic gate that generates an equal amount of noise at the output (output node in Fig. 2) of the static inverter immediately following the dynamic gate [12]. Any noise level higher than the UGDN would propagate to the following stages. The keeper size is determined such that, for each LBL configuration (i.e., 16/32/64 cells per LBL), the supported UGDN is 100mV (10% of the VDD level). Keeper sizing is done at the maximum temperature corresponding to each technology node, because the maximum temperature requires the strongest keeper due to the increased leakage of the parallel NMOS transistors. The worst-case noise condition occurs when all SRAM cells connected to that LBL store logic ‘1’ and the noise on all read word-lines is equal to the UGDN. For a larger number of cells/LBL, the keeper is upsized to provide the same UGDN at the maximum temperature.

4.3 LBL Sensing Delay With this keeper sizing methodology, the local bitline sensing delay is determined for the 45nm/32nm/22nm technology nodes as well as for various LBL configurations. As shown in Fig. 11, the LBL sensing delay increases with cells/LBL. This is due to the increase in both the dynamic node capacitance and the keeper size. For the bitcells which are farther from the hot-spot location, the LBL sensing delay is reduced because of the improved read current at lower temperatures. With technology scaling, a reduction in LBL sensing delay is expected due to increased ON current and lower dynamic node capacitance. This is observed in the transition from the 45nm to the 32nm technology node. However, as shown in Fig. 11, the 22nm node has the maximum LBL sensing delay. At the 22nm node, the leakage-induced temperature rise necessitates a much larger keeper to maintain DC noise robustness. As a result, the LBL sensing delay degrades in spite of the improved ON current with scaling.

4.4 Noise Robustness As mentioned earlier, keeper size is determined for the maximum temperature at each technology node. As temperature reduces towards the cold sections, increased ON current of the keeper and reduced leakage currents in parallel NMOS transistors improve noise robustness. Fig. 12 shows the normalized DC noise improvement for various technology nodes. Due to larger temperature gradients at 45nm and 32nm nodes compared to 22nm, larger improvement in DC robustness is obtained for 45nm and 32nm nodes (30% vs. 11%).

4.5 Temperature Dependent Keeper Sizing The spatial variation of temperature can be exploited to downsize the keeper while maintaining the same DC noise robustness across the memory array. For each technology node, the keeper is sized such that the same DC noise robustness is obtained at every temperature. Fig. 13 shows the normalized keeper sizes for the 45/32/22nm nodes at iso-Vnoise = 100mV. At the 45nm and 32nm nodes, the required keeper size can be reduced by 40% for the same DC noise robustness. This gives a further improvement in LBL sensing delay, as shown in Fig. 14. At the 22nm node, however, the lower temperature gradient across the memory array limits the keeper size reduction to 17%, as shown in Fig. 13.
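The iso-noise sizing idea can be sketched as a search for the smallest keeper that meets the 100mV target at each column temperature. In the sketch below, `toy_noise_margin` is a purely hypothetical stand-in for a circuit-level UGDN evaluation (it is not a device model), the column temperatures are illustrative, and bisection is simply one convenient search that works whenever the margin grows monotonically with keeper size.

```python
import math


def toy_noise_margin(keeper_size, temp_c):
    """Hypothetical DC noise margin in volts; NOT a circuit model.

    Assumes the margin grows with keeper size and shrinks as leakage
    rises exponentially with temperature.
    """
    leakage = math.exp(0.02 * (temp_c - 25.0))
    return 0.05 * keeper_size / leakage


def min_keeper_size(noise_margin, temp_c, target_v, lo=0.0, hi=100.0, tol=1e-6):
    """Smallest keeper size with noise_margin(size, temp_c) >= target_v.

    Bisection; requires noise_margin to be non-decreasing in size.
    """
    if noise_margin(hi, temp_c) < target_v:
        raise ValueError("target not reachable within the size bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if noise_margin(mid, temp_c) >= target_v:
            hi = mid
        else:
            lo = mid
    return hi


# Illustrative column temperatures, hot-spot side first (assumed values).
column_temps_c = [90.0, 85.0, 80.0, 75.0]
sizes = [min_keeper_size(toy_noise_margin, t, 0.1) for t in column_temps_c]
normalized = [s / sizes[0] for s in sizes]  # relative to the worst-case size
print([round(x, 3) for x in normalized])
```

In practice the margin function would be replaced by a simulator call that returns the UGDN for a given keeper size and temperature.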

Figure 10. Thermal maps for 45nm/32nm/22nm tech. node

Figure 11. Spatial variation of Local Bitline (LBL) sensing delay for various tech. node for 16/32/64 cells/LBL

Figure 12. Spatial variation of normalized DC noise robustness for constant keeper size

4.6 Impact on Cell Stability The effect of spatial temperature variation on the data retention ability of the 8-T bitcell is determined by Static Noise Margin (SNM) analysis. The static noise margin is estimated graphically as the side length of the largest square that can be embedded inside the lobes of the butterfly curve [13]. Monte Carlo simulations for hold SNM are performed for the different technology nodes at VDD=0.6V. A σ(VTH) of 30mV/42mV/60mV is used for the 45nm/32nm/22nm technology nodes, respectively. The average hold SNM for a given column decreases from the 45nm to the 22nm node, as shown in Fig. 15. Moreover, the 8-T bitcells closer to the hot-spot show lower hold SNM than those at the cold end. Finally, the standard deviation of the hold SNM increases with temperature for each technology generation.

5. CONCLUSIONS In this paper, we have investigated temperature effects on large caches at different technology nodes. Caches are usually considered to be “cold”. However, hot-spots created by nearby logic blocks such as the execution units can affect memory stability. The 8-T SRAM bitcell, which has been suggested as a possible replacement for the 6-T SRAM bitcell to improve robustness, is expected to be more prone to temperature variations due to its lower LBL capacitance [14]. This analysis shows that temperature variation within the cache memory should be considered for robust cache design.

6. ACKNOWLEDGEMENTS This work was sponsored in part by Focused Center Research Program (GSRC).

7. REFERENCES

[1] N. Yoshinobu et al., “Review and future prospects of low-voltage RAM circuits,” IBM Journal of Research and Development, vol. 47, no. 5/6, pp. 525-552, 2003.

[2] S. Mukhopadhyay, et al., “Modeling of Failure Probability and Statistical Design of SRAM Array for Yield Enhancement in Nanoscaled CMOS,” IEEE Transactions on Computer Aided Design, pp. 1859-1880, December 2005.

[3] L. Chang, et al., “A 5.3GHz 8T-SRAM with Operation Down to 0.41V in 65nm CMOS,” VLSI Circuit Symposium, pp. 252-253, June 2007.

[4] Y. Morita, et al., “An Area-Conscious Low-Voltage-Oriented 8T-SRAM Design under DVS Environment,” Proc. of VLSI Circuit Symposium, pp. 256-257, 2007.

[5] N. Verma, et al., “A 65nm 8T Sub-Vt SRAM Employing Sense-Amplifier Redundancy,” Proc. of International Solid State Circuit Conference, pp. 328-329, 2007.

[6] S. Singh et al., “Compact thermal models for thermally aware design of VLSI circuits,” ITHERM, pp. 671-677, June, 2006.

[7] J. H. Choi, et al., “Self-Consistent Approach to Leakage Power and Temperature Estimation to Predict Thermal Runaway in FinFET Circuits,” IEEE Transactions on Computer Aided Design, pp. 2059-2068, Nov. 2007.

[8] F. Arnaud et al., “A functional 0.69um2 embedded 6T-SRAM bit cell for 65nm CMOS platform,” Symposium on VLSI Technology, Digest of Technical Papers, pp. 65–66, 2003.

[9] Fluent User Manual, Fluent Inc., Lebanon, NH 03766, 2005.

[10] S.R. Mathur, et al., “Unstructured Finite Volume Methods for Multi-Mode Heat Transfer,” Advances in Numerical Heat Transfer, Taylor and Francis, vol.2, pp. 37-67, 2000.

[11] Predictive Technology Model (PTM). [Online]. Available: http://www.eas.asu.edu/~ptm/

[12] A. Alvandpour, et al., “A sub-130-nm conditional keeper technique,” IEEE Journal of Solid-State Circuits, vol. 37, pp. 633–638, May 2002.

[13] E. Seevinck, et al., “Static noise margin analysis of MOS SRAM cells,” IEEE J. Solid-State Circuits, vol. SC-22, no. 5, pp. 748–754, Oct. 1987.

[14] J. P. Kulkarni, et al., “Nano-scaled SRAM Thermal Stability Analysis Using Hierarchical Compact Thermal Models”, ITHERM, May, 2008.

Figure 13. Temp. dependent keeper size at constant DC noise

Figure 14. Spatial variation of LBL delay for variable keeper size

Figure 15. Hold SNM variation (Mean and Sigma) Monte Carlo results
