
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 61, NO. 12, DECEMBER 2014 3367

A Low Power Logic-Compatible Multi-Bit Memory Bit Cell Architecture With Differential Pair and Current Stop Constructs

John Lynch and Pedro P. Irazoqui, Senior Member, IEEE

Abstract—The architecture in this work uses a logic-compatible CMOS process particularly suitable for embedded applications. The differential pair construct makes the read and refresh power independent of any process parameter, including the within-die threshold voltage. The current stop feature keeps the read voltage transition small to further minimize read power. The bit cell operates in both single-bit BASE2 and multi-bit BASE4 modes. An expression for the read signal was verified with bit cell simulations. These simulations also compare the performance impact of threshold voltage variance in this architecture with that in a standard gain cell. A DRAM bit cell array was fabricated in the XFab 180 nm CMOS process. Measured waveforms closely match theoretical results obtained from a system simulation. The silicon retention time was measured at room temperature and is greater than 150 ms in BASE2 mode and greater than 75 ms in BASE4 mode. Analysis at 180 nm and 25 °C predicts a refresh power of 0.8 µW/Mbit at 630 MHz, the lowest in the literature. Furthermore, the memory bit cell architecture presented here has a refresh power-delay product several times lower than any other published architecture.

Index Terms—Current stop, differential pair, eDRAM, embeddable, logic-compatible, low power, MLDRAM, multi-bit, opamp, threshold voltage.

I. INTRODUCTION

THE memory industry is continually seeking to improve the attributes of power consumption, read access time, and memory capacity. The relationships within each attribute and between attributes are complex, and each attribute has multiple contributing factors. Power consumption consists of read, write, restore, and refresh power. These in turn are affected by noise sensitivity, retention time, leakage, and threshold voltage. Read access time is affected by the rate and amplitude of bit line change, delay, and the number of required clock cycles. Capacity is affected by technology node, architecture (1T, 2T, 3T, 6T), and bits per cell. In addition, improving one attribute often requires a tradeoff with one or more of the others. For example, when power consumption is decreased, read access time increases, capacity decreases, or both.

Recent 1T1C approaches have reduced read access time at the expense of capacity [1] and increased memory capacity at the expense of power consumption [2]. Both of these, along with all other 1T1C architectures, suffer from full-scale signal swings on high capacitance bit lines, a read implementation based on charge sharing, and a destructive read process. The first two ultimately cause higher power consumption, and the last lengthens read access time.

After falling away in the 1970s, research on logic-compatible gain cells is once again becoming prevalent. Although smaller in capacity, gain cells overcome the shortcomings of the 1T1C architecture. Recent gain cell improvements in the literature include reduced refresh power and read access time [3], reduced leakage power [4], increased capacity and reduced refresh power [5], and increased capacity [6]. In general, the gain cell literature describes operation over a wide range of frequencies in a variety of technology nodes. Yet none of these designs completely overcomes two drawbacks of the gain cell architecture: first, the read signal, defined as the change in read bit line voltage due to the bit cell current, is a strong function of the threshold voltage of the read transistor; second, the read access time increases rapidly as the stored voltage is lowered.

In this paper we present analysis, simulation, and silicon test results for a logic-compatible memory architecture [7] that improves on or overcomes the 1T1C and gain cell drawbacks and reduces read power requirements, in some cases by an order of magnitude. The new architecture also increases memory capacity, in some cases doubling it, while maintaining read access times comparable to both 1T1C and gain cell architectures.

Manuscript received November 30, 2013; revised April 11, 2014, May 30, 2014, and June 15, 2014; accepted June 16, 2014. Date of publication September 16, 2014; date of current version November 21, 2014. This paper was recommended by Associate Editor C. P. Ravikumar.

The authors are with the Weldon School of Biomedical Engineering, Purdue University, West Lafayette, IN 47907 USA (e-mail: lynch33@purdue.edu; [email protected]).

Digital Object Identifier 10.1109/TCSI.2014.2334791

II. DESIGN

Fig. 1(a) depicts a standard gain cell architecture. Read power is caused by voltage transitions on the read bit lines. Refresh power is caused by voltage transitions on both the read and write bit lines. The refresh rate is set by the retention time and therefore controls the refresh power. The retention time is determined by the rate at which the various leakage currents in the bit cell discharge the storage capacitor.

Compared to the 1T1C, gain cells have reduced active power due to smaller voltage transitions. Gain cells have shorter read access times because reads are nondestructive. Gain cells also have reduced noise sensitivity because the read signal is not derived from a charge sharing process. These advantages over the 1T1C come at the expense of reduced capacity. In addition, the gain cell read signal is a function of the threshold voltage of the read transistor in the bit cell. The threshold voltage variance necessitates larger voltage swings on the write bit line to accommodate the worst-case variance, thereby increasing write and refresh power.

1549-8328 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


Fig. 1. (a) Standard 3 T gain cell. (b) Modified differential pair bit cell.

The first goal, then, is to eliminate the unpredictable effect of threshold voltage variance. The second is to significantly reduce the voltage swing on the high capacitance read bit lines.

A. Modified Differential Pair Construct

In the standard gain cell architecture, die-to-die variance in the read bit line voltage is caused by the within-die threshold voltages and negatively affects the behavior of the bit line current. The proposed architecture [7] removes the effect of the within-die threshold voltage by adding a transistor, M3, which forms a differential pair construct with M2 in the gain cell, as shown in Fig. 1(b). Since the within-die threshold voltages of M2 and M3 are effectively equal within a specified tolerance,¹ they cancel each other out. Thus the effect of the within-die threshold voltage on the bit line current is removed. Consequently, the problem of die-to-die bit line voltage variance is also removed, and the predictability of the bit line current is greatly improved.

An op amp supplies the necessary current, allowing multiple bit cells to share the M3 reference transistor. The shared reference transistor, the op amp, and the M2 read transistor form a modified differential pair. Hereafter, we refer to this bit cell architecture as an MDP bit cell or MDP memory. To understand the relationship between the two transistors M2 and M3 in Fig. 1(b), consider Kirchhoff's Voltage Law. Applying the law, we have

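$$V_{SN} - V_{GS2} = V_{REF} - V_{GS3} \quad\Longleftrightarrow\quad V_{GS2} - V_{GS3} = V_{SN} - V_{REF} \tag{1}$$

where $V_{GS2}$ and $V_{GS3}$ are the gate-to-source voltages of M2 and M3, respectively, $V_{SN}$ is the voltage on the storage node, and $V_{REF}$ is the reference voltage. Using the equation for the saturation current of a transistor in weak inversion [8], the drain current $I_D$ is defined as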

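$$I_D = I_{D0}\,\frac{W}{L}\,e^{q V_{eff}/(n k T)}, \qquad V_{eff} = V_{GS} - V_{tn} \tag{2}$$

where $W$ and $L$ are the channel width and length, $V_{tn}$ is the threshold voltage, and $I_{D0}$ is defined as

$$I_{D0} = (n - 1)\,\mu_n C_{ox} \left(\frac{kT}{q}\right)^{2} \tag{3}$$

where $\mu_n$ is the electron mobility in the channel, $C_{ox}$ is the gate oxide capacitance per unit area, $n$ is the subthreshold slope factor, $k$ is Boltzmann's constant, $T$ is the absolute temperature, and $q$ is the electron charge. It is generally held that the within-die process parameters, the factors in (2) and (3), are equal within a specified tolerance for all transistors on any one die. Making the appropriate substitutions of (1) into (2) for matched M2 and M3, it is clear that

$$I_{D2} = I_{D3}\, e^{q (V_{SN} - V_{REF})/(n k T)}$$

and that the threshold voltage of M3 cancels the threshold voltage of M2.

¹See the Appendix for a method to determine a tolerable within-die mismatch.

To define the read signal, or change in bit line voltage, for the MDP memory we start with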

$$i_C = C\,\frac{dv_C}{dt} \tag{4}$$

Making the appropriate substitutions based on Fig. 1(b) and solving for $\Delta V_{BL}$, we have

$$\Delta V_{BL} = \frac{1}{C_{BL}} \int \left(I_{D2} - I_{D6}\right) dt \tag{5}$$

where $\Delta V_{BL}$ is the change in bit line voltage, $C_{BL}$ is the bit line capacitance, $I_{D2}$ is the drain current of M2, $I_{D6}$ is the drain current of M6, and the integral is taken over the read interval. We intentionally use a low bias current (ibias) to operate M6 as a current source in weak inversion, so that its saturation voltage, $V_{DS,sat}$, is approximately 100 mV [8]. During the read operation the read bit line is driven to an equilibrium condition where $I_{D2}$ equals $I_{D6}$ and the current discharging the bit line capacitance is zero. Hence, the read bit line voltage does not fall any further. Specifically, for the logic 0 condition, M6 strictly defines and limits the change in bit line voltage to its saturation voltage, approximately 100 mV. For a non-logic 0 value on the storage node, (5) simplifies to

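$$\Delta V_{BL} = V_{BL}(t_0) - V_{BL}(t_1) = \frac{\left(I_{D2} - I_{BIAS}\right)\Delta t}{C_{BL}} \tag{6}$$

where $V_{BL}(t)$ is the read bit line voltage at time $t$, $I_{D2}$ is the current of the storage node transistor M2, $I_{BIAS}$ is the value of the p-channel current source transistor M6, $\Delta t = t_1 - t_0$ is a portion of the time the read transistor M1 is held closed, $t_0$ is the point in time the read transistor M1 closes, $t_1$ (which occurs after $t_0$) is the point in time the read bit line is measured, and $C_{BL}$ is the read bit line parasitic capacitance.

By substituting (1) and (2) into (6), the equation defining the MDP change in read bit line voltage, or read signal, for non-logic 0 values is

$$\Delta V_{BL} = \frac{\Delta t}{C_{BL}} \left[ I_{D3}\, e^{q (V_{SN} - V_{REF})/(n k T)} - I_{BIAS} \right] \tag{7}$$

where $V_{SN}$ is the voltage on the storage node, $V_{REF}$ is the voltage on the reference node, $I_{D3}$ is the drain current of the shared reference transistor M3, $n$, $k$, $T$, and $q$ are as defined for (2) and (3), $I_{BIAS}$ is the value of the p-channel current source transistor, $\Delta t$ is the time the read transistor is held closed, and $C_{BL}$ is the read bit line parasitic capacitance. Threshold voltage is not a factor in this equation.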
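The threshold cancellation behind (1), (2), and (7) can be checked numerically. The Python sketch below is illustrative only: it assumes matched M2 and M3 obeying the weak-inversion law (2), assumes the op amp holds the common source node so that M3 carries a fixed current $I_{D3}$, and uses hypothetical parameter values that are not taken from Table I. It shows that the MDP read current, and hence the read signal of (6), does not move as $V_{tn}$ shifts, whereas a lone gain cell read transistor biased directly from $V_{SN}$ changes by more than a factor of 100 over a 200 mV threshold span.

```python
import math

K_BOLTZMANN = 1.380649e-23    # J/K
Q_ELECTRON = 1.602176634e-19  # C


def thermal_voltage(temp_k=300.0):
    """kT/q in volts."""
    return K_BOLTZMANN * temp_k / Q_ELECTRON


def weak_inversion_id(v_gs, v_tn, i_d0_wl, n=1.5, temp_k=300.0):
    """Eq. (2): I_D = I_D0 (W/L) exp(q (V_GS - V_tn) / (n k T)); i_d0_wl = I_D0 * W/L."""
    return i_d0_wl * math.exp((v_gs - v_tn) / (n * thermal_voltage(temp_k)))


def mdp_read_current(v_sn, v_ref, i_d3, v_tn, i_d0_wl, n=1.5, temp_k=300.0):
    """Current in M2 of the modified differential pair (matched M2/M3 assumed)."""
    n_vt = n * thermal_voltage(temp_k)
    # V_GS3 that makes the reference transistor M3 carry i_d3 (inverse of eq. (2))
    v_gs3 = v_tn + n_vt * math.log(i_d3 / i_d0_wl)
    # KVL, eq. (1): V_GS2 - V_GS3 = V_SN - V_REF
    v_gs2 = v_sn - v_ref + v_gs3
    return weak_inversion_id(v_gs2, v_tn, i_d0_wl, n, temp_k)


def read_signal(i_d2, i_bias, dt, c_bl):
    """Eq. (6): read bit line voltage change over dt (positive means the line falls)."""
    return (i_d2 - i_bias) * dt / c_bl


if __name__ == "__main__":
    # Hypothetical: V_SN = 0.8 V, V_REF = 0.65 V, I_D3 = 10 nA, I_D0*W/L = 100 nA,
    # I_BIAS = 100 nA, 100 ns read, 0.5 pF read bit line
    for v_tn in (0.40, 0.50, 0.60):
        i_mdp = mdp_read_current(0.80, 0.65, 10e-9, v_tn, 100e-9)
        i_gain = weak_inversion_id(0.80, v_tn, 100e-9)
        dv = read_signal(i_mdp, 100e-9, 100e-9, 0.5e-12)
        print(f"V_tn={v_tn:.2f} V  MDP I_D2={i_mdp:.3e} A  dV_BL={dv:.3f} V  "
              f"gain cell I_D2={i_gain:.3e} A")
```

In this model $V_{tn}$ enters `mdp_read_current` twice, once through $V_{GS3}$ and once through $V_{GS2}$, and the two contributions cancel, which is the numerical counterpart of the statement that threshold voltage is not a factor in (7).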

B. MDP Bit Cell Operation

At this point it is useful to review the MDP single-bit-per-cell, or BASE2, read operation [7]. In BASE2 operation the stored voltage is one of two values, typically 0.5 V or 0.8 V. The timing diagrams are depicted in Fig. 2. Before the read process starts, transistor M5 precharges the read bit line. At the start of the read process the precharge input is de-asserted, and shortly afterward the read input is asserted. The voltage on the read bit line is then controlled by a current that is a function of the difference between the stored voltage and the reference voltage, as seen in the term $V_{SN} - V_{REF}$ in (7). If the stored voltage is less than the reference voltage, the read bit line voltage changes by no more than the saturation voltage of M6. But if the stored voltage is greater than the reference voltage, the read bit line is pulled down by the current in the storage transistor M2 until the topological limit is reached. The bit line amplifier acts as a comparator and uses an appropriately low switching voltage to detect the change, and so discerns the value represented by the voltage on the storage node.

Fig. 2. Timing diagrams for MDP BASE2 read operation.

Fig. 3. Timing diagrams for MDP BASE4 read operation.

Compared to a standard gain cell, the MDP bit cell, with its three transistors and shared reference transistor, has approximately the same storage capacity. However, the capacity of an MDP bit cell is nearly doubled when it is used in multi-bit mode, storing multiple logical bits in one bit cell. Hereafter, we refer to an MDP implementation that holds one of four logical values in the bit cell, i.e., two bits per cell, as MDP BASE4. The modified differential pair eliminates the impact of the unpredictable threshold voltage variance on the required bit cell voltage, and consequently on the current controlling the read bit line [7]. The insensitivity of the design to threshold voltage variance enables smaller voltage intervals between logic values and allows the MDP bit cell to reliably accommodate four logical values. The read operation of MDP BASE4 is similar to the MDP BASE2 operation. In BASE4 mode, instead of comparing the storage node to a single reference, the cell compares it to three references, one at a time in sequential order, and the read bit line responds accordingly. The point in the sequence of comparisons at which the read bit line first drops beyond the switching voltage of the comparator indicates the value stored on the storage node. The bit line amplifier acts as the comparator and outputs a digital indicator to the logic decoder. The logic decoder uses the indicator, specifically the point in the sequence of comparisons at which the indicator is asserted, to discern the digital value represented by the voltage on the storage node. Fig. 3 illustrates the MDP BASE4 timing diagrams.
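The BASE4 sequential read can be sketched as a small decoding routine. The Python sketch below is a behavioral model under stated assumptions, not the authors' logic decoder: the four storage levels (0.5, 0.6, 0.7, and 0.8 V) are hypothetical, each reference is assumed to sit midway between adjacent levels, the references are assumed to be applied from highest to lowest, and each step applies the BASE2 rule (the comparator trips only when $V_{SN}$ exceeds the applied reference).

```python
def base4_references(levels):
    """Hypothetical reference placement: midway between adjacent logic levels."""
    ordered = sorted(levels)
    return [(lo + hi) / 2 for lo, hi in zip(ordered, ordered[1:])]


def base4_read(v_sn, references):
    """Sequential-comparison read of one BASE4 cell; returns a value 0..3.

    References are applied one at a time from highest to lowest (an assumed order).
    The first step at which the comparator trips identifies the stored value.
    """
    descending = sorted(references, reverse=True)
    for step, v_ref in enumerate(descending):
        comparator_tripped = v_sn > v_ref  # BASE2 rule applied at this step
        if comparator_tripped:
            return len(descending) - step
    return 0  # comparator never tripped: lowest logic level


def to_bits(value):
    """Two-bit digital value presented by the logic decoder."""
    return format(value, "02b")


if __name__ == "__main__":
    LEVELS = [0.5, 0.6, 0.7, 0.8]  # hypothetical BASE4 storage levels (V)
    refs = base4_references(LEVELS)
    for v in LEVELS:
        print(f"V_SN={v:.1f} V -> {to_bits(base4_read(v, refs))}")
```

Only the position of the first trip matters to the decoder, so the routine returns as soon as the comparator trips; with the opposite reference order the mapping from step index to value would simply be reversed.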

Fig. 4. MDP bit cell architecture with current stop.

C. MDP Logic and Reference Voltage Generation

The logic voltage levels for MDP memory are generated from a stable source, such as a bandgap reference, and are spaced as a function of the desired frequency of operation. The MDP architecture facilitates the accurate reference voltages required for multi-bit operation. The reference voltage sources see a high impedance at the gate of M3 in Fig. 1(b) and are tapped from matched components relative to the logic level voltages.

An analysis of MDP BASE4 voltage reference levels and threshold voltage mismatch is given in the Appendix.

D. Current Stop Construct

The important information content, the determination of the digital value represented by the voltage on the storage node, is contained in the movement of the read bit line away from its clamped value past the switching voltage level of the bit line amplifier. Any movement of the read bit line beyond the switching level of the bit line amplifier wastes power. Therefore, it is advantageous to stop the current at a point after the voltage change is deemed significant and before the inherent limit due to the circuit topology is reached [7]. Additionally, the lower the switching voltage level, the greater the power saved. In the MDP memory the switching voltage level is kept relatively small, approximately 150 mV.

Fig. 4 illustrates an MDP memory with current stop. The istop transistor acts as a switch that disconnects the opamp from M2, thereby terminating M3's control of the read bit line. In this way, the voltage transitions on the high capacitance read bit lines are drastically reduced, because the signal on the read bit line changes only enough to be sensed reliably. Power is correspondingly reduced for all reads of the memory system [7].

Current stop control is shared between all rows in one column, so only one current stop circuit is needed per column.

Fig. 5 displays timing diagrams in BASE4 mode using the current stop feature. These timing diagrams show that the magnitude of the voltage transition on the read bit line is substantially smaller than in the timing diagrams of Fig. 3 without current stop. The current stop feature not only reduces power but also reduces read access time.
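The effect of current stop on the read bit line swing can be illustrated with a simple time-stepped model. This Python sketch assumes a constant net discharge current (the $I_{D2} - I_{BIAS}$ term of (6)), a hard floor standing in for the topological limit, and hypothetical values for the bit line capacitance, currents, and voltages; only the approximately 150 mV switching voltage comes from the text.

```python
def simulate_read(c_bl, i_net, v_clamp, v_floor, v_switch, dt, t_read, current_stop):
    """Return (bit line swing, detection time) for one read.

    The bit line starts at v_clamp and is discharged by a constant net current i_net
    until it reaches v_floor (a stand-in for the topological limit). With current stop,
    the discharge ends as soon as the swing reaches v_switch.
    """
    v = v_clamp
    detected_at = None
    steps = int(round(t_read / dt))
    for k in range(1, steps + 1):
        v = max(v_floor, v - i_net * dt / c_bl)
        if detected_at is None and v_clamp - v >= v_switch:
            detected_at = k * dt  # bit line amplifier (comparator) trips
            if current_stop:
                break  # istop opens: read current stops, bit line holds its value
    return v_clamp - v, detected_at


if __name__ == "__main__":
    # Hypothetical: 0.5 pF bit line, 1 uA net current, 1.2 V clamp, 0.3 V floor,
    # 150 mV switching voltage, 1 ns time step, 1 us read window
    for stop in (False, True):
        swing, t_det = simulate_read(0.5e-12, 1e-6, 1.2, 0.3, 0.15, 1e-9, 1e-6, stop)
        print(f"current_stop={stop}: swing={swing:.3f} V, detected at {t_det * 1e9:.0f} ns")
```

This simple model captures only the swing reduction, which sets the charge that must later be restored; the read access time benefit described in the text is not modeled here.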

E. Read Power

Fig. 5. Timing diagrams for MDP BASE4 read operation using current stop.

Fig. 6. One Common Source Domain with n rows and k columns.

Fig. 7. One opamp shared between bit cells.

Electrical power is defined as

$$P = I\,V \tag{8}$$

where $I = C\,\Delta V f$, $V = V_{DD}$, $C$ is the bit line capacitance, $f$ is the frequency, and the bit line voltage swing, $\Delta V$, is assumed to equal $V_{DD}$, yielding the common equation

$$P = C\,V_{DD}^{2} f \tag{9}$$

In an MDP memory using current stop, as previously mentioned, a typical bit line swing $\Delta V$ is approximately 150 mV, well below $V_{DD}$, so $\Delta V$ does not equal $V_{DD}$ and we use the more precise

$$P = C\,\Delta V\,V_{DD}\,f \tag{10}$$

The power saved is determined by the ratio of the read bit line voltage swing to $V_{DD}$; this ratio is the source of the major read power savings.
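Equations (9) and (10) differ only in the swing term, so the read power saving reduces to the ratio $\Delta V / V_{DD}$. A minimal Python sketch follows; the 150 mV swing is from the text, while $V_{DD}$ = 1.8 V, the 0.5 pF bit line capacitance, and the 100 MHz read frequency are assumed values chosen for illustration.

```python
def read_power_full_swing(c_bl, v_dd, f):
    """Eq. (9): P = C V_DD^2 f, with the bit line swinging the full supply."""
    return c_bl * v_dd ** 2 * f


def read_power_partial_swing(c_bl, dv, v_dd, f):
    """Eq. (10): P = C dV V_DD f, swing dV restored from the V_DD supply."""
    return c_bl * dv * v_dd * f


if __name__ == "__main__":
    C_BL, V_DD, F = 0.5e-12, 1.8, 100e6  # assumed values
    DV = 0.150                            # ~150 mV switching voltage with current stop
    p_full = read_power_full_swing(C_BL, V_DD, F)
    p_mdp = read_power_partial_swing(C_BL, DV, V_DD, F)
    print(f"full swing:   {p_full * 1e6:.1f} uW per bit line")
    print(f"150 mV swing: {p_mdp * 1e6:.2f} uW per bit line (ratio {p_mdp / p_full:.3f})")
```

Because $C$ and $f$ cancel in the ratio, the relative saving depends only on the assumed $V_{DD}$ and the switching voltage.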

Assuming total power includes read, refresh, and DC bias power, we use

$$P_T = I_{RD} V_{CL} D_{RD} + I_{RF} V_{WBL} D_{RF} + I_{DC} V_A + P_P \tag{11}$$

where $I_{RD}$ is the read current, $V_{CL}$ is the read bit line clamp voltage, $I_{RF}$ is the refresh current, $V_{WBL}$ is the highest write bit line level, $D_{RD}$ is a constant representing the read duty cycle, $D_{RF}$ is the corresponding refresh duty cycle, $I_{DC}$ is the DC bias current for the MDP opamps and bit line amplifiers, $V_A$ is an analog voltage supply, and $P_P$ is the power in the row decoders, logic decoders, read channel, and write channel peripherals. The read current is further defined as

$$I_{RD} = C_{RBL}\,\Delta V_{RD}\,f_{RD}\,N_C\,N_M \tag{12}$$

and the refresh current is further defined as

$$I_{RF} = C_{RBL}\,(\Delta V_{RD} + \Delta V_{WR})\,f_{RF}\,N_C\,N_M \tag{13}$$

where $C_{RBL}$ is the read bit line capacitance, $\Delta V_{RD}$ is the bit line voltage swing for read, $\Delta V_{WR}$ is the bit line voltage swing for write, $f_{RD}$ is the read frequency, $f_{RF}$ is the refresh frequency, $N_C$ is the number of columns of bit cells, and $N_M$ is the number of memory modules.

The refresh frequency is a fixed value based on the retention time and the number of rows being refreshed. The total read power at lower frequencies is dominated by the refresh current; conversely, at higher frequencies it is dominated by the read current.
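The total power model of (11) to (13) can be wrapped in a few functions to see where the read term overtakes the refresh term. This Python sketch follows the forms of (11) to (13) as written in this section and additionally assumes that the refresh frequency equals the number of rows divided by the retention time; every parameter value is hypothetical and none is taken from Table II.

```python
from dataclasses import dataclass


@dataclass
class MemoryParams:
    c_rbl: float   # read bit line capacitance C_RBL (F)
    dv_rd: float   # read bit line swing dV_RD (V)
    dv_wr: float   # write bit line swing dV_WR (V)
    v_cl: float    # read bit line clamp voltage V_CL (V)
    v_wbl: float   # highest write bit line level V_WBL (V)
    d_rd: float    # read duty cycle D_RD
    d_rf: float    # refresh duty cycle D_RF
    i_dc: float    # DC bias current of opamps and bit line amplifiers I_DC (A)
    v_a: float     # analog supply V_A (V)
    p_p: float     # peripheral power P_P (W)
    n_c: int       # number of columns N_C
    n_m: int       # number of memory modules N_M
    n_rows: int    # rows refreshed per retention period (assumption)
    t_ret: float   # retention time (s)


def refresh_frequency(p):
    """Assumed: every row is refreshed once per retention time."""
    return p.n_rows / p.t_ret


def read_term(p, f_rd):
    """I_RD V_CL D_RD, with I_RD from eq. (12)."""
    i_rd = p.c_rbl * p.dv_rd * f_rd * p.n_c * p.n_m
    return i_rd * p.v_cl * p.d_rd


def refresh_term(p):
    """I_RF V_WBL D_RF, with I_RF from eq. (13)."""
    i_rf = p.c_rbl * (p.dv_rd + p.dv_wr) * refresh_frequency(p) * p.n_c * p.n_m
    return i_rf * p.v_wbl * p.d_rf


def total_power(p, f_rd):
    """Eq. (11)."""
    return read_term(p, f_rd) + refresh_term(p) + p.i_dc * p.v_a + p.p_p


def crossover_frequency(p):
    """Read frequency at which the read term equals the refresh term (read term is linear in f)."""
    return refresh_term(p) / read_term(p, 1.0)


if __name__ == "__main__":
    example = MemoryParams(c_rbl=0.5e-12, dv_rd=0.15, dv_wr=0.34, v_cl=1.2, v_wbl=1.0,
                           d_rd=1.0, d_rf=1.0, i_dc=0.0, v_a=1.8, p_p=0.0,
                           n_c=512, n_m=1, n_rows=32, t_ret=0.05)
    print(f"refresh frequency: {refresh_frequency(example):.0f} Hz")
    print(f"crossover read frequency: {crossover_frequency(example):.0f} Hz")
    for f in (1e2, 1e4, 1e6, 1e8):
        print(f"f_RD={f:.0e} Hz  P_T={total_power(example, f):.3e} W")
```

Below the crossover frequency the refresh term dominates and above it the read term dominates, matching the qualitative statement above; the crossover location itself depends entirely on the assumed parameters.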

F. System Benefits of MDP Architecture

Bit Line Amplifier: The MDP principle is also applicable to bit line amplifiers and greatly reduces their static power. It is therefore reasonable to keep the amplifiers active while they wait for a read bit line transition. The MDP principle in bit line amplifiers also improves noise performance by allowing global adjustment of the bit line amplifier switching voltage.

Bit Cell: The MDP bit cell architecture is more robust with respect to noise. It has a much larger read signal than 1T1C architectures because it is not subject to the attenuation caused by charge sharing between the bit cell and the bit line. The MDP architecture also has a much larger read signal than some gain cell designs [3], [6]. Instead of a brief, finite read signal with a small maximum, MDP memory has a read signal that continues to grow with read time, as (7) shows.

Opamp: The MDP op amp in Fig. 1(b) and Fig. 4 is multiplexed between groups of rows, with the hold node transistor sources in each column tied together. We refer to one group of rows as a Common Source Domain (Fig. 6). Fig. 7 illustrates an opamp shared among multiple Common Source Domains. The multiplexer limits the number of active bit cells driven by an opamp at any one time, which keeps the capacitive load on the opamp reasonable. For example, assuming 0.6 fF of parasitic capacitance for each bit cell and Common Source Domains made up of 32 rows and 512 columns of bit cells, the opamp drives a capacitive load of about 10 pF for the 16,384 bit cells in the active Common Source Domain. A reasonable number of active and inactive bit cells multiplexed with a single opamp is 250,000.

In BASE2 the output of the opamp is constant at any frequency. In BASE4 at low frequencies the output of the opamp steps between three levels. In BASE4 at higher frequencies, an opamp stepping quickly between three voltage levels needs a higher bias current than the power budget allows. In this case, three opamps with constant outputs are used instead of one opamp with a stepping output. This design allows much higher speed, because it is no longer limited by the finite step response of the opamp. Each of the three opamps working in BASE4 mode can operate with the same quiescent current as an opamp in BASE2 mode.

TABLE I
BIT CELL SIMULATION PARAMETERS
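The load figures in the Opamp paragraph of Section II-F follow from simple arithmetic, reproduced in this short Python sketch. Only the 0.6 fF per cell, the 32 × 512 domain size, and the 250,000-cell budget come from the text; the function name is illustrative.

```python
def common_source_domain_load(rows, cols, c_cell):
    """Number of bit cells in one Common Source Domain and the opamp's capacitive load."""
    cells = rows * cols
    return cells, cells * c_cell


cells, c_load = common_source_domain_load(32, 512, 0.6e-15)
# 16,384 cells x 0.6 fF is about 9.83 pF, consistent with the ~10 pF quoted in the text
domains_per_opamp = 250_000 / cells  # Common Source Domains multiplexed onto one opamp

print(f"{cells} cells per domain, load = {c_load * 1e12:.2f} pF")
print(f"about {domains_per_opamp:.1f} domains per opamp")
```

A 250,000-cell budget therefore corresponds to roughly 15 Common Source Domains of this size sharing one opamp.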

III. RESULTS

We simulated the MDP bit cell with Spectre to confirm (7). We compared the simulated performance of the MDP bit cell with the simulated performance of a traditional gain cell. We fabricated an MDP test structure, simulated the test structure, and compared the system simulation results to the measured silicon results. Thus we established that our equations are verified by the MDP bit cell simulation and that our system simulation accurately models the MDP test structure. Finally, we used our analysis as the basis for calculating power for larger 1T1C, 3T gain cell, MDP BASE2, and MDP BASE4 memory systems. Our results are divided into four sections: a comparison of analysis and bit cell simulation, a comparison of system simulation and silicon, estimated power usage for the four types of larger memory systems, and a comparison to other state-of-the-art eDRAM.

A. Analysis and Simulation

Our bit cell simulations were divided into two groups using the input parameters in Table I. In the first group of simulations we measured the read signal. We applied a read pulse of 100 ns, which corresponds to a frequency of 10 MHz, set the threshold voltage variance to 0 mV, varied the storage node voltage, and measured the read signal for the MDP bit cell of Fig. 1(b) in both BASE2 and BASE4 modes across the stated range of storage node voltages. The graphs in Fig. 8 plot the read signal of (7) against the MDP BASE2 and MDP BASE4 bit cell simulation read signals and show that the bit cell simulations closely match the equation.

Fig. 8. Simulation of read signal verifies equation for (a) MDP BASE2 and (b) MDP BASE4 at 100 ns read pulse.

Read signal results from the first group of bit cell simulations are also graphed in Fig. 9 for a standard gain cell and an MDP BASE2 bit cell across the range of threshold variances in Table I, illustrating the effect that threshold variance has on design and write power requirements. Using the values in Table I, we graphed the read signal with respect to the write bit line transition, or swing, necessary to change from logic 0 to logic 1. Fig. 9(a) illustrates the impact threshold variance has on the logic 1 minimum voltage in a standard gain cell. With 0 V threshold variance, the gain cell logic 1 minimum requires a write bit line transition of about 0.44 V to attain the desired read signal. But to accommodate a typical 100 mV die-to-die variance, the write bit line transition must be approximately 0.11 V greater, or about 0.55 V, to attain the required read signal. Thus in the gain cell the die-to-die threshold variance requires an additional 0.11 V of write bit line transition. In this way the threshold variance has a direct impact on the gain cell write and refresh power consumption.

In contrast to Fig. 9(a), Fig. 9(b) shows that threshold variance has no effect in an MDP bit cell. As expected from (1), the MDP graph shows no difference in the read signals over the range −100 mV to 100 mV of threshold variance; the three simulation curves lie directly on top of each other. The logic 1 minimum in the MDP simulation requires only about a 0.18 V write transition, regardless of the threshold variance. The write bit line transition necessary to change logic states is less than half that of the gain cell. In this way the MDP bit cell uses approximately 50% less average power to write the bit cell.

Fig. 9. Simulated impact of threshold voltage variance on minimum logic 1 and write voltage transition at 100 ns for (a) the gain cell and (b) the MDP BASE2 bit cell.

Fig. 10. Simulated impact of threshold voltage variance on write voltage transition at various frequencies for (a) the gain cell and (b) the MDP bit cell.

Fig. 11. Layout and dimensions of the bit cell.

In the second group of bit cell simulations we measured delay and compared the gain cell to the MDP BASE2 bit cell. We set values according to Table I, including the threshold voltage variance and storage node voltage, turned on the read pulse, measured the elapsed time it took to reach the desired read signal, and so determined the operating frequency for any particular logic 1 minimum voltage. The simulation data graphed in Fig. 10 for a standard gain cell and for an MDP BASE2 bit cell illustrate the effect threshold variance has on frequency and write power requirements. The change in bit line voltage necessary to write a logic 1 was graphed on the x axis to depict power requirements. Fig. 10 illustrates the tradeoff between frequency and write power: as frequency increases, the power necessary to write or refresh the bit cell increases. Fig. 10(a) for the gain cell shows that the read frequency falls off rapidly as the write voltage transition is lowered and varies by as much as two orders of magnitude due to die-to-die threshold variance. Fig. 10(b) shows that the frequency falls off only modestly for the MDP bit cell as the write transition voltage is lowered, and the threshold variation has virtually no effect. For example, operating at 100 MHz, the gain cell needs a transition of about 0.72 V on the write bit line, while the MDP bit cell needs only about 0.34 V and saves power accordingly.
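The write-swing values quoted from Fig. 9 and Fig. 10 can be compared directly. This Python sketch uses only the values stated in the text and assumes, in the spirit of (10), that write energy scales linearly with the write bit line swing; under a square-law assumption the MDP advantage would be larger.

```python
# Write bit line swings (V) as quoted in the text
GAIN_CELL_NOMINAL = 0.44  # Fig. 9(a), 0 V threshold variance
GAIN_CELL_WORST = 0.55    # Fig. 9(a), accommodating 100 mV die-to-die variance
MDP_BASE2 = 0.18          # Fig. 9(b), independent of threshold variance
GAIN_CELL_100MHZ = 0.72   # Fig. 10(a) at 100 MHz
MDP_100MHZ = 0.34         # Fig. 10(b) at 100 MHz


def relative_write_energy(swing, reference_swing):
    """Write energy ratio, assuming energy ~ C * swing * V_DD (linear in swing)."""
    return swing / reference_swing


comparisons = {
    "MDP vs gain cell (nominal, 100 ns)": relative_write_energy(MDP_BASE2, GAIN_CELL_NOMINAL),
    "MDP vs gain cell (with variance, 100 ns)": relative_write_energy(MDP_BASE2, GAIN_CELL_WORST),
    "MDP vs gain cell (100 MHz)": relative_write_energy(MDP_100MHZ, GAIN_CELL_100MHZ),
}
for label, ratio in comparisons.items():
    print(f"{label}: {ratio:.2f} ({(1 - ratio) * 100:.0f}% less)")
```

All three ratios fall below 0.5, consistent with the statement that the MDP write transition is less than half that of the gain cell.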

B. Simulation and Silicon

We fabricated an MDP memory test structure die in a 180 nm CMOS process to prove the modified differential pair construct with and without current stop, and we verify that the silicon waveforms closely resemble the system simulation waveforms. The layout and bit cell dimensions are shown in Fig. 11. The test structure die consists of one MDP module with two columns and eight rows, for a total of 16 bit cells. The die also contains a row decoder, a reference transistor, and, for each column, a transistor that implements the current stop switch. We wire-bond the die to a board that also carries three discrete opamps. One opamp acts as a source follower that, together with the bit cells, forms the modified differential pair construct; the other two opamps buffer the bit lines. The board also holds transistors that supply the bias current for the bit lines, and clamps that clamp the bit lines to a fixed voltage level when the precharge input is asserted LO. DACs on the board generate the write bit line and read reference voltages, and the comparators and logic that control the current stop feature are also on the board.

We create a system simulation schematic with appropriate models to emulate both the test structure die and the PCB functionality. We generate vectors and import them into the pattern generator of a logic analyzer. The vectors from the pattern generator stimulate the circuit under test on the bench, and the same vector values stimulate the system simulation.

Fig. 12 contains both simulation waveforms and oscilloscope waveforms from the silicon tests, grouped by logic value, with and without current stop. The logic 00 case is trivial and is not included. The images without current stop clearly show the bit line voltage stepping down into the ohmic region, and the images with current stop clearly show the effect of terminating the voltage change. In all images, the silicon behavior matches the system simulation.

Fig. 12. Silicon behavior matches system simulation.

We measure the worst case retention time at room temperature in both BASE2 and BASE4 modes. A common voltage is placed on the hold node of each bit cell by writing to every bit cell. The time between the write and the read of each cell is then gradually increased until the first read failure occurs in any one of the cells. In this way, the worst case retention time of the test structure is observed. The worst case retention time was measured to be 150 ms for BASE2 and 75 ms for BASE4 at room temperature.
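The retention measurement is a simple upward sweep that stops at the first read failure in any cell. The sketch below expresses that loop in Python. The function name `worst_case_retention`, the 5 ms step, and the `read_ok(cell, delay)` callback (standing in for "write a common hold-node voltage, wait, read back") are hypothetical; on the bench the callback would drive the pattern generator and compare the read result.

```python
from typing import Callable, Optional, Sequence


def worst_case_retention(
    cells: Sequence[int],
    read_ok: Callable[[int, float], bool],
    step_ms: float = 5.0,
    max_ms: float = 1000.0,
) -> Optional[float]:
    """Sweep the write-to-read delay upward and return the longest delay (ms)
    at which every cell still reads back correctly, or None if the first
    delay already fails."""
    last_pass: Optional[float] = None
    delay = step_ms
    while delay <= max_ms:
        # read_ok(cell, delay): hypothetical hook that writes the common
        # hold-node voltage, waits `delay` ms, reads back, and compares.
        if not all(read_ok(cell, delay) for cell in cells):
            break  # first read failure in any one cell ends the sweep
        last_pass = delay
        delay += step_ms
    return last_pass


# Hypothetical stand-in for the bench: each cell reads correctly only while
# the delay is within its (made-up) retention time.
retention_ms = {0: 210.0, 1: 150.0, 2: 185.0}
result = worst_case_retention(list(retention_ms), lambda c, d: d <= retention_ms[c])
print(result)  # 150.0
```

Because the sweep stops on the weakest cell, the result is a worst case for the whole array, and its resolution is limited by the step size.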


TABLE II
READ POWER PARAMETERS USED IN (11)

and are in Fig. 9(a); and are in Fig. 9(b)

Fig. 13. Pipelined total read power.

C. Read Power Comparison

Using (11) and the values of the equation variables given in Table II, we calculate the total power to read 16-bit words, in both random access and pipelined reads, for the 1T1C, 3T gain cell, MDP BASE2, and MDP BASE4 memory architectures. Fig. 13 and Fig. 14 summarize the results.

Table II uses a conservative retention time of one third of the measured value. The power due to the peripherals was verified to be small, much less than 5% of the bit line power in the system; consequently, the peripheral power term in Table II is assumed to be zero. The power in the opamps and the bit line amplifiers is accounted for by a dedicated term in Table II.

The insensitivity to threshold voltage and the reduced voltage swing on the high capacitance bit lines greatly reduce the total power of the MDP memory compared with the 1T1C and the 3T gain cell. For example, an MDP BASE4 pipelined read of 16-bit words at 630 MHz uses 90, whereas the 1T1C pipelined read uses 2276; at 630 MHz, the MDP BASE4 power is therefore 96% less than that of the 1T1C. Likewise, an MDP BASE2 random access read of 16-bit words at 630 MHz uses 419, whereas the 3T gain cell uses 2458; at 630 MHz, the MDP BASE2 power is 82% less than that of the 3T gain cell.

Fig. 14. Random access total read power.

TABLE III
EDRAM TOPOLOGY COMPARISON

result is from simulation; result is from analysis; @ 110°C; @ 85°C; adjusted from 85°C to 25°C by a factor of 0.5 per 11°C; estimated using implemented layout design rules.
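The percentage savings quoted above follow directly from the ratio of the two read powers in each comparison. The short check below recomputes them from the values in the text; the helper name `percent_savings` is illustrative only.

```python
def percent_savings(p_new: float, p_ref: float) -> float:
    """Percent reduction of p_new relative to p_ref (both in the same units)."""
    return 100.0 * (1.0 - p_new / p_ref)


# Read power values quoted in the text at 630 MHz, 16-bit words.
pipelined = percent_savings(90, 2276)       # MDP BASE4 vs 1T1C, pipelined
random_access = percent_savings(419, 2458)  # MDP BASE2 vs 3T gain cell, random access

print(f"BASE4 vs 1T1C (pipelined):      {pipelined:.2f}%")      # 96.05%
print(f"BASE2 vs 3T (random access):    {random_access:.2f}%")  # 82.95%
```

Truncated to whole percent, these give the 96% and 82% figures stated above.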

D. eDRAM Comparison

Table III compares our design with eDRAM in the literature in terms of bit cell size normalized to the technology node, bit cell storage capacitance, and our measured, simulated, and analyzed results.

Our MDP BASE2 analysis in 180 nm at 630 MHz predicts a refresh power of 0.8 μW/Mbit at 25°C. The work of [3] in 65 nm at 667 MHz reports a refresh power of 109 at 85°C, and the work of [5] in 180 nm at 330 kHz reports a refresh power of 5.4 nW/kB at 25°C. We standardized the comparison in Fig. 15 using the power-delay product. This graph shows that the MDP relative energy per bit is more than three times lower than that of [3] and more than 10000 times lower than that of the other cited works.

Fig. 15. eDRAM standardized energy comparison.
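Two normalizations feed this comparison: the Table III adjustment of a result from 85°C to 25°C by "0.5/11C," and the power-delay product used in Fig. 15. The sketch below reads the temperature note as "refresh power halves for every 11°C of cooling" and assumes the delay term is the access period 1/f; both readings are interpretations, since this section does not spell out either formula. Function names are hypothetical.

```python
def adjust_for_temperature(p: float, t_from_c: float, t_to_c: float,
                           halving_step_c: float = 11.0) -> float:
    """Rescale a leakage-dominated refresh power from t_from_c to t_to_c,
    ASSUMING it halves for every `halving_step_c` degrees C of cooling."""
    return p * 0.5 ** ((t_from_c - t_to_c) / halving_step_c)


def power_delay_product(p_w: float, f_hz: float) -> float:
    """Power-delay product, ASSUMING the delay term is the access period 1/f."""
    return p_w / f_hz


# Scale factor applied when moving a result from 85 C to 25 C.
factor_85_to_25 = adjust_for_temperature(1.0, 85.0, 25.0)
print(f"85C -> 25C scale factor: {factor_85_to_25:.4f}")  # 0.0228

# MDP BASE2 prediction from the text: 0.8 uW/Mbit at 630 MHz.
pdp_mdp = power_delay_product(0.8e-6, 630e6)
print(f"MDP power-delay product: {pdp_mdp:.3e} J per Mbit")
```

Under this reading, a 60°C reduction scales a refresh power by roughly 1/44, which shows why the temperature normalization matters as much as the frequency normalization when results measured at 85°C are compared with 25°C analysis.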

IV. CONCLUSION

The memory system in this work successfully reads and writes BASE2 and BASE4 values to the bit cells with no observable errors, demonstrating good noise margin even with the large capacitive loads that the pads add to the bit lines. The measured silicon and the bit cell simulations validate the MDP memory concept and show that the assumptions made for the large memory system power calculations are reasonable.

The modified differential pair architecture eliminates the effect of die-to-die threshold voltage variance in the memory system. This has two major effects on system operation: it lowers the write bit line voltage transition by more than 0.3 V, and it enables four logic values per cell. The current stop feature additionally reduces the read bit line voltage transition to 150 mV and shortens the read access time accordingly. These reductions in the write and read bit line voltage transitions result in an active power of 90 for 16-bit pipelined reads in a BASE4 MDP at 630 MHz, compared with 2276 for the 1T1C, a power savings of 96%. The multi-bit capability increases capacity by up to 50% over standard 3T gain cells. We verified our equation describing the read signal with our bit cell simulation, and our system simulation with our test silicon. The retention time measurements of the test structure quantify the worst case bit cell leakage. With a projected refresh power of 0.8 μW/Mbit and a random access frequency of 630 MHz, this work compares favorably with results reported in the literature. Implementing this eDRAM memory in a larger memory system and a smaller technology node, and applying the architecture to SRAM, are topics for further research.

APPENDIX
MANAGING THE WITHIN-DIE VARIATION MISMATCH

The preceding results compared the performance of MDP memory with that of gain cells in the presence of die-to-die threshold voltage variance. For both MDP memory and traditional gain cells, there are also within-die variations. The variance of the within-die threshold voltage mismatch is inversely proportional to the gate area of the transistors and is given by

$$\sigma_{\Delta V_T}^2 = \frac{A_{VT}^2}{WL} \tag{14}$$

where $W$ is the gate width, $L$ is the gate length, and the proportionality constant $A_{VT}$ is technology dependent [9]. To determine the effect of the mismatch in (14) on the read signal in (7), substitute the expression

$$V_{REF,eff} = V_{REF} \pm \Delta V_{T,4\sigma} \tag{15}$$

in place of $V_{REF}$, with

$$\Delta V_{T,4\sigma} = 4\sqrt{\sigma_{\Delta V_T,M2}^2 + \sigma_{\Delta V_T,M3}^2} \tag{16}$$

Here $V_{REF,eff}$ is the effective reference voltage due to the within-die mismatch of the threshold voltage, $V_{REF}$ is the reference voltage, $\Delta V_{T,4\sigma}$ is the 4-sigma value of the within-die mismatch of the threshold voltage of the modified differential pair, $\sigma_{\Delta V_T,M2}$ is the M2 threshold voltage mismatch, and $\sigma_{\Delta V_T,M3}$ is the M3 threshold voltage mismatch.

Fig. 16 illustrates the relationship used to set the reference voltage levels in a BASE4 MDP memory bit cell as a function of the within-die threshold voltage mismatch. A Gaussian curve for each of the three reference voltages represents the within-die mismatch of the threshold voltage. The mismatch determines the value of $V_{REF,eff}$, which in turn defines the minimum logic value using (7).

Fig. 16. Relationship between reference voltage levels, logic voltage levels, and mismatch in the MDP BASE4 bit cell.

Fig. 17. Monte Carlo mismatch simulation for MDP memory read access time with .

The test structure in this work has modified differential pair

transistor sizes of , for M2 and , for M3. The proportionality constant equals 5.5 mV·μm for the XFAB 180 nm process. Therefore, from (16), the theoretical within-die mismatch for the bit cell implemented in this work is 86 mV.

In MDP memory, since M3 is shared by many bit cells, it can be made arbitrarily large. The M2 hold node transistor can also be made much larger than the minimum size for the technology node, because its gate capacitance can make up part or all of the storage capacitance. If the larger sizes of , were chosen for M2 and , for M3, the theoretical within-die mismatch would be reduced to 33 mV.

Figs. 17, 18, and 19 show the results of Monte Carlo mismatch simulations, each with 500 runs, for the MDP bit cell of Fig. 1(b) with the physical values implemented in the test structure given in Fig. 11. The three simulations illustrate the effect on read access time of the hold node voltage drifting from 1.1 to 1.0 V for logic b'10. The reference voltage for logic b'10, 0.84 V, is greater than the b'01 logic level of 0.75 V. The simulations are run with 0.1 pF on the read bit line, the equivalent of approximately 128 bit cells per bit line. In all cases, the specified target read access time of 40 ns was achieved in the presence of mismatch.

Fig. 18. Monte Carlo mismatch simulation for MDP memory read access time with .

Fig. 19. Monte Carlo mismatch simulation for MDP memory read access time with .
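The mismatch budget above can be sketched numerically. The code below computes a Pelgrom-type sigma, σ = A_VT/√(WL), as in (14), and combines the M2 and M3 contributions into a 4-sigma value, assuming the two mismatches are independent and add as a root sum of squares. It then runs a much simpler Monte Carlo than the transistor-level simulations of Figs. 17 to 19: it only perturbs the reference by a Gaussian pair mismatch and counts runs where the reference no longer separates the b'01 level (0.75 V) from the drifted b'10 level (1.0 V). The 1 μm × 1 μm device sizes and all function names are hypothetical, and the pair sigma is taken as the text's 86 mV divided by four.

```python
import math
import random


def sigma_vt(a_vt_mv_um: float, w_um: float, l_um: float) -> float:
    """Sigma of within-die threshold mismatch in mV, from sigma^2 = A_VT^2 / (W*L)."""
    return a_vt_mv_um / math.sqrt(w_um * l_um)


def delta_vt_4sigma(sigma_m2_mv: float, sigma_m3_mv: float) -> float:
    """4-sigma pair mismatch in mV, ASSUMING independent M2 and M3 mismatch
    combined as a root sum of squares."""
    return 4.0 * math.hypot(sigma_m2_mv, sigma_m3_mv)


def count_misreads(v_low: float, v_high: float, v_ref: float,
                   sigma_pair_mv: float, runs: int = 500, seed: int = 1) -> int:
    """Simplified decision check (not a read-access-time simulation): offset
    the reference by a Gaussian pair mismatch each run and count runs where
    it no longer lies strictly between the two neighboring logic levels."""
    rng = random.Random(seed)
    misreads = 0
    for _ in range(runs):
        v_ref_eff = v_ref + rng.gauss(0.0, sigma_pair_mv / 1000.0)
        if not (v_low < v_ref_eff < v_high):
            misreads += 1
    return misreads


A_VT = 5.5  # mV*um, the XFAB 180 nm value quoted in the text

# Hypothetical 1 um x 1 um devices for both M2 and M3 (illustration only).
s2 = sigma_vt(A_VT, 1.0, 1.0)
s3 = sigma_vt(A_VT, 1.0, 1.0)
print(f"4-sigma mismatch: {delta_vt_4sigma(s2, s3):.1f} mV")  # 31.1 mV

# Levels from the text: b'01 at 0.75 V, b'10 drifted to 1.0 V, reference 0.84 V.
misreads = count_misreads(0.75, 1.0, 0.84, sigma_pair_mv=86.0 / 4)
print(f"misreads in 500 runs: {misreads}")
```

With these levels the lower margin (90 mV) is a little more than 4σ and the upper margin (160 mV) is over 7σ, so essentially no misreads are expected in 500 runs. Quadrupling the gate area halves σ, which is the lever the text describes for reaching the 33 mV figure.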

REFERENCES

[1] D. Somasekhar, S. Lu, B. Bloechel, G. Dermer, K. Lai, S. Borkar, and V. De, “A 10 Mbit, 15 GBytes/sec bandwidth 1T DRAM chip with planar MOS storage capacitor in an unmodified 150 nm logic process for high-density on-chip memory applications,” in Proc. ESSCIRC, Grenoble, France, 2005, pp. 355–358.

[2] J. C. Koob, S. A. Ung, B. F. Cockburn, and D. G. Elliott, “Design and characterization of a multilevel DRAM,” IEEE Trans. VLSI Syst., vol. 19, no. 9, pp. 1583–1596, Sep. 2011.

[3] K. C. Chun, P. Jain, T. Kim, and C. H. Kim, “A 667 MHz logic-compatible embedded DRAM featuring an asymmetric 2T gain cell for high speed on-die caches,” IEEE J. Solid-State Circuits, vol. 47, no. 2, pp. 547–559, Feb. 2012.

[4] M. Ichihashi, H. Toda, Y. Itoh, and K. Ishibashi, “0.5 V asymmetric three-Tr. cell (ATC) DRAM using 90 nm generic CMOS logic process,” in Proc. IEEE Symp. VLSI Circuits, 2005, pp. 366–369.

[5] Y. Lee, M. Chen, J. Park, D. Sylvester, and D. Blaauw, “A 5.4 nW/kB retention power logic-compatible embedded DRAM with 2T dual-Vt gain cell for low power sensing applications,” in Proc. IEEE A-SSCC, 2010.

[6] M. Khalid, P. Meinerzhagen, and A. Burg, “Replica bit-line technique for embedded multilevel gain-cell DRAM,” in Proc. IEEE NEWCAS, Jun. 2012, pp. 77–80.

[7] J. Lynch, “Memory architecture with a current controller and reduced power requirements,” U.S. Patent 8 169 812, May 1, 2012.

[8] E. Sanchez-Sinencio and A. Andreou, “A current-based MOSFET model for integrated circuit design,” in Low-Voltage/Low-Power Integrated Circuits and Systems, 1st ed. Piscataway, NJ, USA: IEEE Press, 1999, ch. 2, sec. 2, pp. 9–21.

[9] P. Kinget, “Device mismatch and tradeoffs in the design of analog circuits,” IEEE J. Solid-State Circuits, vol. 40, no. 6, pp. 1212–1224, Jun. 2005.

John Lynch received the B.S. degree in electrical engineering from Purdue University, West Lafayette, IN, USA, in 1982 and the M.S. degree in electrical engineering from National Technological University, Fort Collins, CO, USA, in 1989. Currently he is working towards the Ph.D. degree in electrical engineering at Purdue University.

He has worked in consumer products at Kodak, mixed signal design at Cadence and LSI, and touch and display systems at National Semiconductor and Synaptics.

Pedro P. Irazoqui (M’93–SM’13) received the B.Sc. and M.Sc. degrees in electrical engineering from the University of New Hampshire, Durham, NH, USA, in 1997 and 1999, respectively, and the Ph.D. degree in neuroengineering from the University of California at Los Angeles, CA, USA, in 2003 for work on the design, manufacture, and packaging of implantable integrated circuits for wireless neural recording.

He is Director of Purdue’s Center for Implantable Devices, Associate Head for Research and Associate Professor in the Weldon School of Biomedical Engineering, and Associate Professor of electrical and computer engineering. His group develops wireless implantable devices for various potential applications, including monitoring and suppression of epileptic seizures; prosthesis control for injured military personnel; modulation of cardiac arrhythmias; treatment of depression and gastroparesis, a partial paralysis of the stomach; and monitoring and therapeutic modulation of intraocular pressure for glaucoma.

Dr. Irazoqui was named a Showalter Faculty Scholar and a Purdue University Faculty Scholar, both in 2013. He is a senior member of IEEE. He has received the Best Teacher Award from the Weldon School of Biomedical Engineering (2006 and 2009), the Early Career Award from the Wallace H. Coulter Foundation (2007 and Phase II in 2009), the Marion B. Scott Excellence in Teaching Award from Tau Beta Pi (2008), and the Outstanding Faculty Member Award from the Weldon School of Biomedical Engineering (2009), as well as the Excellence in Research Award from Purdue in 2010, 2012, and 2013. He has been serving as Associate Editor of IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING since late 2006.