![Page 1: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/1.jpg)
Error Analysis and Management for MLC NAND Flash Memory
Onur Mutlu
(joint work with Yu Cai, Gulay Yalcin, Erich Haratsch, Ken Mai, Adrian Cristal, Osman Unsal)
August 7, 2014 Flash Memory Summit 2014, Santa Clara, CA
![Page 2: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/2.jpg)
Executive Summary
- Problem: MLC NAND flash memory reliability/endurance is a key challenge for satisfying future storage systems’ requirements
- Our Goals: (1) Build reliable error models for NAND flash memory via experimental characterization, (2) Develop efficient techniques to improve reliability and endurance
- This talk provides a “flash” summary of our recent results published in the past 3 years:
  - Experimental error and threshold voltage characterization [DATE’12&13]
  - Retention-aware error management [ICCD’12]
  - Program interference analysis and read reference voltage prediction [ICCD’13]
  - Neighbor-assisted error correction [SIGMETRICS’14]
![Page 3: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/3.jpg)
Agenda
- Background, Motivation and Approach
- Experimental Characterization Methodology
- Error Analysis and Management
  - Characterization Results
  - Retention-Aware Error Management
  - Threshold Voltage and Program Interference Analysis
  - Read Reference Voltage Prediction
  - Neighbor-Assisted Error Correction
- Summary
![Page 4: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/4.jpg)
Evolution of NAND Flash Memory
- Flash memory is widening its range of applications
  - Portable consumer devices, laptop PCs and enterprise servers
[Figure: evolution of NAND flash memory through CMOS scaling and more bits per cell. Source: Seaung Suk Lee, “Emerging Challenges in NAND Flash Technology”, Flash Summit 2011 (Hynix)]
![Page 5: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/5.jpg)
Flash Challenges: Reliability and Endurance
E. Grochowski et al., “Future technology challenges for NAND flash and HDD products”, Flash Memory Summit 2012
- P/E cycles (required): > 50k P/E cycles (writing the full capacity of the drive 10 times per day for 5 years; STEC)
- P/E cycles (provided): a few thousand
![Page 6: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/6.jpg)
NAND Flash Memory is Increasingly Noisy
[Figure: Write → Noisy NAND → Read]
![Page 7: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/7.jpg)
Future NAND Flash-based Storage Architecture
[Figure: Noisy NAND (high raw bit error rate) → Memory Signal Processing (lower raw bit error rate) → Error Correction (uncorrectable BER < 10^-15)]
Our Goals:
- Build reliable error models for NAND flash memory
- Design efficient reliability mechanisms based on the model
![Page 8: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/8.jpg)
NAND Flash Error Model
[Figure: Write → Noisy NAND → Read]
Experimentally characterize and model dominant errors:
- Erase block
- Program page
- Neighbor page program (cell-to-cell interference)
- Retention
References:
- Cai et al., “Error Patterns in MLC NAND Flash Memory: Measurement, Characterization, and Analysis”, DATE 2012
- Cai et al., “Flash Correct-and-Refresh: Retention-aware error management for increased flash memory lifetime”, ICCD 2012
- Cai et al., “Threshold voltage distribution in MLC NAND Flash Memory: Characterization, Analysis, and Modeling”, DATE 2013
- Cai et al., “Error Analysis and Retention-Aware Error Management for NAND Flash Memory”, ITJ 2013
- Cai et al., “Program Interference in MLC NAND Flash Memory: Characterization, Modeling, and Mitigation”, ICCD 2013
- Cai et al., “Neighbor-Cell Assisted Error Correction in MLC NAND Flash Memories”, SIGMETRICS 2014
![Page 9: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/9.jpg)
Our Goals and Approach
- Goals:
  - Understand error mechanisms and develop reliable predictive models for MLC NAND flash memory errors
  - Develop efficient error management techniques to mitigate errors and improve flash reliability and endurance
- Approach:
  - Solid experimental analyses of errors in real MLC NAND flash memory → drive the understanding and models
  - Understanding, models and creativity → drive the new techniques
![Page 10: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/10.jpg)
![Page 11: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/11.jpg)
Experimental Testing Platform
[Figure: FPGA-based testing platform. Labeled components: USB jack, Virtex-II Pro (USB controller), Virtex-V FPGA (NAND controller), HAPS-52 mother board, USB daughter board, NAND daughter board, 3x-nm NAND flash]
[Cai+, FCCM 2011, DATE 2012, ICCD 2012, DATE 2013, ITJ 2013, ICCD 2013, SIGMETRICS 2014]
Cai et al., FPGA-based Solid-State Drive prototyping platform, FCCM 2011.
![Page 12: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/12.jpg)
NAND Flash Usage and Error Model
[Figure: flash usage timeline from start to end of life, spanning P/E cycle 0 … P/E cycle i … P/E cycle n. Within a P/E cycle: erase block (erase errors) → program pages, Page0 - Page128 (program errors) → retention 1, t1 days (retention errors) → read page (read errors) → … → retention j, tj days (retention errors) → read page (read errors)]
![Page 13: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/13.jpg)
Methodology: Error and ECC Analysis
- Characterized errors and error rates of 3x and 2y-nm MLC NAND flash using an experimental FPGA-based platform
  - [Cai+, DATE’12, ICCD’12, DATE’13, ITJ’13, ICCD’13, SIGMETRICS’14]
- Quantified Raw Bit Error Rate (RBER) at a given P/E cycle
  - Raw Bit Error Rate: Fraction of erroneous bits without any correction
- Quantified error correction capability (and area and power consumption) of various BCH-code implementations
  - Identified how much RBER each code can tolerate → how many P/E cycles (flash lifetime) each code can sustain
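The two quantities on this slide can be illustrated with a minimal sketch. It is not the authors' tooling: it assumes independent bit errors, and the codeword length `n_bits`, correction capability `t`, and target failure probability are hypothetical parameters chosen for illustration.

```python
import math

def raw_bit_error_rate(written: bytes, read_back: bytes) -> float:
    """Fraction of bits that differ between written and read-back data."""
    if len(written) != len(read_back) or not written:
        raise ValueError("buffers must be non-empty and equal in length")
    flipped = sum(bin(a ^ b).count("1") for a, b in zip(written, read_back))
    return flipped / (8 * len(written))

def codeword_failure_probability(rber: float, n_bits: int, t: int) -> float:
    """P(more than t of n_bits are wrong), assuming independent bit errors."""
    if not 0.0 < rber < 1.0:
        raise ValueError("rber must be strictly between 0 and 1")
    log_p, log_q = math.log(rber), math.log1p(-rber)
    tail = 0.0
    # Sum the binomial tail directly in log space to keep tiny values accurate.
    for k in range(t + 1, n_bits + 1):
        log_term = (math.lgamma(n_bits + 1) - math.lgamma(k + 1)
                    - math.lgamma(n_bits - k + 1)
                    + k * log_p + (n_bits - k) * log_q)
        tail += math.exp(log_term)
    return min(1.0, tail)

def max_tolerable_rber(n_bits: int, t: int, target_failure: float) -> float:
    """Largest RBER (by bisection on a log scale) that keeps failures at or below the target."""
    lo, hi = 1e-12, 0.5
    for _ in range(60):
        mid = math.sqrt(lo * hi)
        if codeword_failure_probability(mid, n_bits, t) <= target_failure:
            lo = mid
        else:
            hi = mid
    return lo
```

Under these assumptions, a code with a larger `t` tolerates a higher RBER, which is the link the slide draws between ECC strength and the number of P/E cycles a device can sustain.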
![Page 14: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/14.jpg)
NAND Flash Error Types
- Four types of errors [Cai+, DATE 2012]
- Caused by common flash operations
  - Read errors
  - Erase errors
  - Program (interference) errors
- Caused by flash cell losing charge over time
  - Retention errors
    - Whether an error happens depends on required retention time
    - Especially problematic in MLC flash because the threshold voltage window to determine stored value is smaller
![Page 15: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/15.jpg)
![Page 16: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/16.jpg)
Observations: Flash Error Analysis
[Figure: error rates vs. P/E cycles, with retention errors highlighted]
- Raw bit error rate increases exponentially with P/E cycles
- Retention errors are dominant (>99% for 1-year retention time)
- Retention errors increase with retention time requirement
Cai et al., Error Patterns in MLC NAND Flash Memory, DATE 2012.
![Page 17: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/17.jpg)
Retention Error Mechanism
- Electron loss from the floating gate causes retention errors
  - Cells with more programmed electrons suffer more from retention errors
  - Threshold voltage is more likely to shift by one window than by multiple
[Figure: Vth axis with states (LSB/MSB) 11, 10, 01, 00, from erased to fully programmed, separated by read references REF1, REF2, REF3; stress induced leakage current (SILC) removes electrons from the floating gate]
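A small sketch of how a leftward threshold voltage shift turns into a read error. The state order follows the slide; the reference voltages and the amount of charge loss are hypothetical values in arbitrary units, not measurements.

```python
import bisect

# State order along the Vth axis, as drawn on the slide: erased (11) to fully programmed (00).
STATES = ["11", "10", "01", "00"]
# Hypothetical read reference voltages REF1 < REF2 < REF3 (arbitrary units).
READ_REFS = [1.0, 2.0, 3.0]

def read_cell(vth, refs=READ_REFS):
    """Return the 2-bit value whose Vth window contains vth."""
    return STATES[bisect.bisect_right(refs, vth)]

def after_retention(vth, charge_loss):
    """Model electron loss from the floating gate as a leftward Vth shift."""
    return vth - charge_loss

# A cell programmed to 00, just above REF3, drifts below REF3 and is misread as 01.
programmed_vth = 3.2
stored = read_cell(programmed_vth)
read_later = read_cell(after_retention(programmed_vth, 0.4))
```

Because the shift usually crosses only one reference, the misread value is typically the adjacent state, which matches the observation that a one-window shift is more likely than a multi-window shift.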
![Page 18: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/18.jpg)
Retention Error Value Dependency
[Figure: dominant retention error transitions are 00 → 01 and 01 → 10]
- Cells with more programmed electrons tend to suffer more from retention noise (i.e. 00 and 01)
![Page 19: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/19.jpg)
More on Flash Error Analysis
- Yu Cai, Erich F. Haratsch, Onur Mutlu, and Ken Mai, "Error Patterns in MLC NAND Flash Memory: Measurement, Characterization, and Analysis", Proceedings of the Design, Automation, and Test in Europe Conference (DATE), Dresden, Germany, March 2012.
![Page 20: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/20.jpg)
![Page 21: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/21.jpg)
Flash Correct-and-Refresh (FCR)
- Key Observations:
  - Retention errors are the dominant source of errors in flash memory [Cai+ DATE 2012][Tanakamaru+ ISSCC 2011] → limit flash lifetime as they increase over time
  - Retention errors can be corrected by “refreshing” each flash page periodically
- Key Idea:
  - Periodically read each flash page,
  - Correct its errors using “weak” ECC, and
  - Either remap it to a new physical page or reprogram it in-place,
  - Before the page accumulates more errors than ECC-correctable
  - Optimization: Adapt refresh rate to endured P/E cycles
Cai et al., Flash Correct and Refresh, ICCD 2012.
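The key idea can be sketched as a toy simulation. This is an illustration only, not the controller logic from the paper: the `Page` class, the `ECC_CORRECTABLE` limit, and the P/E-cycle thresholds in the adaptive schedule are all hypothetical.

```python
from dataclasses import dataclass

ECC_CORRECTABLE = 4  # hypothetical: errors per page that the "weak" ECC can fix

@dataclass
class Page:
    bit_errors: int = 0  # retention errors accumulated since the last program

def refresh_page(page: Page) -> bool:
    """Read, correct, and rewrite one page; return False if ECC cannot correct it."""
    if page.bit_errors > ECC_CORRECTABLE:
        return False  # the refresh came too late for this page
    # Correction restores the data; rewriting (remapped or in place) clears the drift.
    page.bit_errors = 0
    return True

def refresh_interval_days(pe_cycles: int) -> int:
    """Hypothetical adaptive schedule: refresh more often as wear increases."""
    schedule = [(1000, 365), (5000, 90), (10000, 21), (20000, 3)]
    for max_cycles, days in schedule:
        if pe_cycles <= max_cycles:
            return days
    return 1
```

The design point is that a refresh must happen while the error count is still within what the ECC can correct, so the interval shrinks as the block wears out.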
![Page 22: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/22.jpg)
FCR: Two Key Questions
- How to refresh?
  - Remap a page to another one
  - Reprogram a page (in-place)
  - Hybrid of remap and reprogram
- When to refresh?
  - Fixed period
  - Adapt the period to retention error severity
![Page 23: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/23.jpg)
In-Place Reprogramming of Flash Cells
[Figure: floating gate voltage distribution for each stored value]
- Retention errors are caused by cell voltage shifting to the left
- ISPP (incremental step pulse programming) moves cell voltage to the right; fixes retention errors
- Pro: No remapping needed → no additional erase operations
- Con: Increases the occurrence of program errors
![Page 24: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/24.jpg)
Normalized Flash Memory Lifetime
[Figure: normalized lifetime (y-axis, 0 to 200) for 512b-BCH, 1k-BCH, 2k-BCH, 4k-BCH, 8k-BCH and 32k-BCH, comparing Base (No-Refresh), Remapping-Based FCR, Hybrid FCR and Adaptive FCR; annotations: 46x and 4x]
- Adaptive-rate FCR provides the highest lifetime
- Lifetime of FCR is much higher than lifetime of stronger ECC
![Page 25: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/25.jpg)
Energy Overhead
- Adaptive-rate refresh: <1.8% energy increase until daily refresh is triggered
[Figure: energy overhead (y-axis, 0% to 10%) vs. refresh interval (1 Year, 3 Months, 3 Weeks, 3 Days, 1 Day) for remapping-based refresh and hybrid refresh; data labels: 7.8%, 5.5%, 2.6%, 1.8%, 0.4%, 0.3%]
![Page 26: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/26.jpg)
More Detail and Analysis on FCR
- Yu Cai, Gulay Yalcin, Onur Mutlu, Erich F. Haratsch, Adrian Cristal, Osman Unsal, and Ken Mai, "Flash Correct-and-Refresh: Retention-Aware Error Management for Increased Flash Memory Lifetime", Proceedings of the 30th IEEE International Conference on Computer Design (ICCD), Montreal, Quebec, Canada, September 2012.
![Page 27: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/27.jpg)
![Page 28: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/28.jpg)
Key Questions
- How does the threshold voltage (Vth) distribution of different programmed states change over flash lifetime?
- Can we model it accurately and predict the Vth changes?
- Can we build mechanisms that can correct for Vth changes? (thereby reducing read error rates)
![Page 29: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/29.jpg)
Threshold Voltage Distribution Model
[Figure: Vth distributions of the P1, P2 and P3 states]
Gaussian distribution with additive white noise
As P/E cycles increase ...
- Distribution shifts to the right
- Distribution becomes wider
Characterized on 2Y-nm chips using the read-retry feature
Cai et al., Threshold Voltage Distribution in MLC NAND Flash Memory, DATE 2013.
![Page 30: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/30.jpg)
Threshold Voltage Distribution Model
- Vth distribution can be modeled with ~95% accuracy as a Gaussian distribution with additive white noise
- Distortion in Vth over P/E cycles can be modeled and predicted as an exponential function of P/E cycles
  - With more than 95% accuracy
![Page 31: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/31.jpg)
More Detail on Threshold Voltage Model
- Yu Cai, Erich F. Haratsch, Onur Mutlu, and Ken Mai, "Threshold Voltage Distribution in MLC NAND Flash Memory: Characterization, Analysis and Modeling", Proceedings of the Design, Automation, and Test in Europe Conference (DATE), Grenoble, France, March 2013.
![Page 32: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/32.jpg)
Program Interference Errors
- When a cell is being programmed, the voltage level of a neighboring cell changes (unintentionally) due to parasitic capacitance coupling → can change the data value stored
- Also called program interference error
- Causes neighboring cell voltage to increase (shift right)
- Once retention errors are minimized, these errors can become dominant
![Page 33: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/33.jpg)
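How Current Flash Cells are Programmed
- Programming 2-bit MLC NAND flash memory in two steps
[Figure: two-step programming along the Vth axis. All cells start in ER (11). The LSB program leaves a cell in ER (11) or moves it to Temp (0x). The MSB program then produces the final states ER (11), P1 (10), P2 (00), P3 (01)]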
![Page 34: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/34.jpg)
Basics of Program Interference
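[Figure: victim cell (n,j) and its neighbors (n-1,j-1), (n-1,j), (n-1,j+1), (n+1,j-1), (n+1,j), (n+1,j+1) across wordlines WL<0>, WL<1>, WL<2>; page programming order labels LSB:0, LSB:1, MSB:2, LSB:3, MSB:4, MSB:6; coupling from same-wordline neighbors (∆Vx), vertical neighbors (∆Vy) and diagonal neighbors (∆Vxy)]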
![Page 35: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/35.jpg)
Traditional Model for Vth Change
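- Traditional model for victim cell threshold voltage change
[Figure: victim cell (n,j) and its neighbors (n-1,j-1), (n-1,j), (n-1,j+1), (n+1,j-1), (n+1,j), (n+1,j+1) across wordlines WL<0>, WL<1>, WL<2>; page programming order labels LSB:0, LSB:1, MSB:2, LSB:3, MSB:4, MSB:6; coupling terms ∆Vx (two neighbors), ∆Vy (one neighbor), ∆Vxy (two neighbors)]
$$\Delta V_{victim} = \left( 2 C_x \Delta V_x + C_y \Delta V_y + 2 C_{xy} \Delta V_{xy} \right) / C_{total}$$
Not accurate and requires knowledge of coupling caps!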
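As a worked illustration of the capacitance-coupling formula, ΔV_victim = (2·C_x·ΔV_x + C_y·ΔV_y + 2·C_xy·ΔV_xy) / C_total, the sketch below evaluates it for made-up inputs. All numeric values are hypothetical; in practice the coupling capacitances are exactly the quantities that are hard to obtain.

```python
def victim_shift_traditional(dv_x, dv_y, dv_xy, c_x, c_y, c_xy, c_total):
    """Victim Vth change from two x neighbors, one y neighbor and two diagonal neighbors."""
    return (2 * c_x * dv_x + c_y * dv_y + 2 * c_xy * dv_xy) / c_total

# Hypothetical example: neighbor Vth changes in volts, capacitances in arbitrary units.
shift = victim_shift_traditional(
    dv_x=1.0, dv_y=2.0, dv_xy=1.0,
    c_x=0.5, c_y=1.0, c_xy=0.25, c_total=10.0,
)
```

The function needs seven inputs per victim cell, four of which are device capacitances, which is the practical drawback the slide points out.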
![Page 36: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/36.jpg)
Our Goal and Idea
- Develop a new, more accurate and easier to implement model for program interference
- Idea:
  - Empirically characterize and model the effect of neighbor cell Vth changes on the Vth of the victim cell
  - Fit neighbor Vth change to a linear regression model and find the coefficients of the model via empirical measurement
$$\Delta V_{victim}(n,j) = \sum_{x=n+1}^{n+M} \sum_{y=j-K}^{j+K} \alpha(x,y)\, \Delta V_{neighbor}(x,y) + \alpha_0 V_{victim}^{before}(n,j)$$
[Equation annotation: “Can be measured”]
![Page 37: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/37.jpg)
Developing a New Model via Empirical Measurement
- Feature extraction for Vth changes based on characterization
  - Threshold voltage changes on aggressor cell
  - Original state of victim cell
- Enhanced linear regression model
$$\Delta V_{victim}(n,j) = \sum_{x=n+1}^{n+M} \sum_{y=j-K}^{j+K} \alpha(x,y)\, \Delta V_{neighbor}(x,y) + \alpha_0 V_{victim}^{before}(n,j)$$
$$Y = X\alpha + \varepsilon \quad \text{(vector expression)}$$
- Maximum likelihood estimation of the model coefficients
$$\arg\min_{\alpha} \left( \lVert X \times \alpha - Y \rVert_2^2 + \lambda \lVert \alpha \rVert_1 \right)$$
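One way to fit such a model is sketched below, assuming the penalized least-squares objective ||Xα − Y||₂² + λ||α||₁ and solving it with cyclic coordinate descent. The solver choice, the function names, and the example features are assumptions made for illustration; the slides do not say how the coefficients were computed.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_interference_model(X, Y, lam, sweeps=200):
    """Minimize ||X a - Y||_2^2 + lam * ||a||_1 by cyclic coordinate descent.

    Each row of X holds the features of one victim cell (neighbor Vth changes
    and the victim's Vth before interference); Y holds the measured victim shift.
    """
    n_cols = len(X[0])
    alpha = [0.0] * n_cols
    for _ in range(sweeps):
        for k in range(n_cols):
            z = sum(row[k] ** 2 for row in X)
            if z == 0.0:
                continue
            # Correlation of feature k with the residual that excludes feature k.
            rho = 0.0
            for row, y in zip(X, Y):
                partial = sum(row[m] * alpha[m] for m in range(n_cols) if m != k)
                rho += row[k] * (y - partial)
            alpha[k] = soft_threshold(rho, lam / 2.0) / z
    return alpha

def predict_victim_shift(alpha, features):
    """Apply the fitted linear model to one victim cell's features."""
    return sum(a * f for a, f in zip(alpha, features))
```

With `lam` set to 0 this reduces to ordinary least squares; a larger `lam` pushes small coefficients to exactly zero, which would drop neighbors that contribute little.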
![Page 38: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/38.jpg)
Effect of Neighbor Voltages on the Victim
- Immediately-above cell interference is dominant
- Immediately-diagonal neighbor is the second dominant
- Far neighbor cell interference exists
- Victim cell’s Vth has negative effect on interference
Cai et al., Program Interference in MLC NAND Flash Memory, ICCD 2013
![Page 39: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/39.jpg)
New Model for Program Interference
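[Figure: victim cell (n,j) and its neighbors (n-1,j-1), (n-1,j), (n-1,j+1), (n+1,j-1), (n+1,j), (n+1,j+1) across wordlines WL<0>, WL<1>, WL<2>; page programming order labels LSB:0, LSB:1, MSB:2, LSB:3, MSB:4, MSB:6; coupling from same-wordline neighbors (∆Vx), vertical neighbors (∆Vy) and diagonal neighbors (∆Vxy)]
$$\Delta V_{victim}(n,j) = \sum_{x=n+1}^{n+M} \sum_{y=j-K}^{j+K} \alpha(x,y)\, \Delta V_{neighbor}(x,y) + \alpha_0 V_{victim}^{before}(n,j)$$
Cai et al., Program Interference in MLC NAND Flash Memory, ICCD 2013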
![Page 40: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/40.jpg)
Model Accuracy
[Figure, panel 1: (x,y) = (measured before interference, measured after interference); the diagonal is ideal if there is no interference]
- Interference causes systematic Vth shift
[Figure, panel 2: (x,y) = (measured before interference, predicted with model); the diagonal is ideal if prediction is 100% accurate]
- Model corrects for the Vth shift: 96.8% accuracy
Characterized on 2Y-nm chips using the read-retry feature
![Page 41: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/41.jpg)
Many Other Results in the Paper
- Yu Cai, Onur Mutlu, Erich F. Haratsch, and Ken Mai, "Program Interference in MLC NAND Flash Memory: Characterization, Modeling, and Mitigation", Proceedings of the 31st IEEE International Conference on Computer Design (ICCD), Asheville, NC, October 2013.
![Page 42: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/42.jpg)
![Page 43: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/43.jpg)
Mitigation: Applying the Model
- So, what can we do with the model?
- Goal: Mitigate the effects of voltage shifts caused by program interference
![Page 44: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/44.jpg)
Optimum Read Reference for Flash Memory
- Read reference voltage affects the raw bit error rate
[Figure: two panels, each showing Vth distributions f(x) for State-A (around v0) and g(x) for State-B (around v1); one panel uses read reference v_ref, the other uses a different read reference v'_ref]
$$BER_1 = \int_{v_{ref}}^{+\infty} f(x)\,dx + \int_{-\infty}^{v_{ref}} g(x)\,dx$$
$$BER_2 = \int_{v'_{ref}}^{+\infty} f(x)\,dx + \int_{-\infty}^{v'_{ref}} g(x)\,dx$$
- There exists an optimal read reference voltage
  - Predictable if the statistics (i.e. mean, variance) of threshold voltage distributions are characterized and modeled
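A minimal sketch of the bit error rate as a function of the read reference, assuming both states are Gaussian (the model from the earlier slides). The means, standard deviations, and the grid search are hypothetical choices for illustration.

```python
import math

def upper_tail(x, mean, sigma):
    """P(value > x) for a Gaussian with the given mean and standard deviation."""
    return 0.5 * math.erfc((x - mean) / (sigma * math.sqrt(2.0)))

def lower_tail(x, mean, sigma):
    """P(value < x) for a Gaussian with the given mean and standard deviation."""
    return 0.5 * math.erfc((mean - x) / (sigma * math.sqrt(2.0)))

def misread_probability(v_ref, state_a, state_b):
    """State-A cells read above v_ref plus State-B cells read below v_ref."""
    (mean_a, sigma_a), (mean_b, sigma_b) = state_a, state_b
    return upper_tail(v_ref, mean_a, sigma_a) + lower_tail(v_ref, mean_b, sigma_b)

def best_read_reference(state_a, state_b, steps=2000):
    """Grid-search between the two means for the lowest misread probability."""
    mean_a, mean_b = state_a[0], state_b[0]
    candidates = [mean_a + (mean_b - mean_a) * i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda v: misread_probability(v, state_a, state_b))

# Hypothetical (mean, sigma) pairs in volts for two adjacent states.
STATE_A = (1.0, 0.2)
STATE_B = (2.0, 0.2)
v_opt = best_read_reference(STATE_A, STATE_B)
```

When the two distributions have equal widths the optimum sits midway between the means; once one distribution shifts or widens with wear, the optimum moves with it, which is why a fixed reference becomes suboptimal.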
![Page 45: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/45.jpg)
Optimum Read Reference Voltage Prediction
- Vth shift learning (done every ~1k P/E cycles)
  - Program sample cells with known data pattern and test Vth
  - Program aggressor neighbor cells and test victim Vth after interference
  - Characterize the mean shift in Vth (i.e., program interference noise)
- Optimum read reference voltage prediction
  - Default read reference voltage + Predicted mean Vth shift by model
[Figure: Vth shift of the distribution after program interference]
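The learning and prediction steps can be sketched as follows. The function names and sample voltages are hypothetical; only the rule "default read reference + predicted mean Vth shift" and the roughly 1k P/E cycle learning period come from the slide.

```python
from statistics import mean

LEARNING_PERIOD = 1000  # P/E cycles between learning passes, per the slide's "~1k"

def learn_mean_shift(vth_before, vth_after):
    """Mean Vth change of sample victim cells after their neighbors are programmed."""
    return mean(after - before for before, after in zip(vth_before, vth_after))

def predict_read_reference(default_ref, mean_shift):
    """Shift the default read reference by the learned mean interference shift."""
    return default_ref + mean_shift

def needs_relearning(pe_cycles, last_learned_at, period=LEARNING_PERIOD):
    """True once a full learning period has passed since the last learning pass."""
    return pe_cycles - last_learned_at >= period
```

For example, if sample cells measured at 1.0 V and 2.0 V read 1.1 V and 2.3 V after interference, the learned mean shift is about 0.2 V and a default reference of 2.5 V would move to about 2.7 V.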
![Page 46: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/46.jpg)
Effect of Read Reference Voltage Prediction
- Read reference voltage prediction reduces raw BER (by 64%) and increases the P/E cycle lifetime (by 30%)
[Figure: raw bit error rate with and without read reference voltage prediction; 32k-bit BCH code (acceptable BER = 2×10^-3); annotation: 30% lifetime improvement]
![Page 47: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/47.jpg)
More on Read Reference Voltage Prediction
- Yu Cai, Onur Mutlu, Erich F. Haratsch, and Ken Mai, "Program Interference in MLC NAND Flash Memory: Characterization, Modeling, and Mitigation", Proceedings of the 31st IEEE International Conference on Computer Design (ICCD), Asheville, NC, October 2013.
![Page 48: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/48.jpg)
![Page 49: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/49.jpg)
Goal
- Develop a better error correction mechanism for cases where ECC fails to correct a page
![Page 50: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/50.jpg)
Observations So Far
- Immediate neighbor cell has the most effect on the victim cell when programmed
- A single set of read reference voltages is used to determine the value of the (victim) cell
- The set of read reference voltages is determined based on the overall threshold voltage distribution of all cells in flash memory
![Page 51: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/51.jpg)
New Observations [Cai+ SIGMETRICS’14]
n Vth distributions of cells with different-valued immediate-neighbor cells are significantly different q Because neighbor value affects the amount of Vth shift
n Corollary: If we know the value of the immediate-neighbor, we can find a more accurate set of read reference voltages based on the “conditional” threshold voltage distribution
Cai et al., Neighbor-Cell Assisted Error Correction for MLC NAND Flash Memories, SIGMETRICS 2014.
![Page 52: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/52.jpg)
Secrets of Threshold Voltage Distributions
[Figure: victim WL and aggressor WL cells; Vth distributions of states P(i) and P(i+1) on the victim WL before the MSB page of the aggressor WL is programmed, and of states P’(i) and P’(i+1) after it is programmed, split by neighbor value into N11, N00, N10, and N01]
![Page 53: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/53.jpg)
If We Knew the Immediate Neighbor …
n Then, we could choose a different read reference voltage to more accurately read the “victim” cell
![Page 54: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/54.jpg)
Overall vs Conditional Reading
n Using the optimum read reference voltage based on the overall distribution leads to more errors
n Better to use the optimum read reference voltage based on the conditional distribution (i.e., value of the neighbor) q Conditional distributions of two states are farther apart from each other
[Figure: conditional distributions N11, N01, N00, and N10 of states P’(i) and P’(i+1) along the Vth axis, with read reference REFx]
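The comparison between overall and conditional reading can be illustrated numerically. The sketch below is not the authors' model: it assumes Gaussian state distributions with equal spread, equally likely states and neighbor values, and made-up means and interference shifts (`MU_LOW`, `MU_HIGH`, `SIGMA`, `SHIFT` are all hypothetical). It picks the best single read reference by grid search and compares its misread rate with the rate obtained when each neighbor value gets its own reference.

```python
import math

def gaussian_cdf(x, mu, sigma):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def misread_rate(ref, mu_low, mu_high, sigma):
    """Probability of misreading a cell when both states are equally likely."""
    low_read_as_high = 1.0 - gaussian_cdf(ref, mu_low, sigma)
    high_read_as_low = gaussian_cdf(ref, mu_high, sigma)
    return 0.5 * (low_read_as_high + high_read_as_low)

# Hypothetical values for illustration only, not measured data.
MU_LOW, MU_HIGH, SIGMA = 2.0, 3.0, 0.15
SHIFT = {"11": 0.0, "00": 0.1, "10": 0.2, "01": 0.3}  # assumed Vth shift per neighbor value

def overall_rate(ref):
    """Average misread rate when one reference serves every neighbor value."""
    rates = [misread_rate(ref, MU_LOW + s, MU_HIGH + s, SIGMA) for s in SHIFT.values()]
    return sum(rates) / len(rates)

def conditional_rate():
    """Average misread rate when each neighbor value gets its own reference."""
    rates = []
    for s in SHIFT.values():
        # The midpoint is optimal here because spreads and priors are equal.
        ref = 0.5 * (MU_LOW + MU_HIGH) + s
        rates.append(misread_rate(ref, MU_LOW + s, MU_HIGH + s, SIGMA))
    return sum(rates) / len(rates)

# Grid search for the best single reference over the overall distribution.
candidates = [2.0 + 0.001 * k for k in range(1501)]
best_overall_ref = min(candidates, key=overall_rate)
best_overall = overall_rate(best_overall_ref)
best_conditional = conditional_rate()
```

Under these assumptions `best_conditional` comes out below `best_overall`, because each conditional reference is optimal for its own pair of distributions while the single reference has to compromise across all four shifts.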
![Page 55: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/55.jpg)
Measurement Results
[Figure: measured raw BER of overall vs. conditional reading for the P1, P2, and P3 states, with annotations “Large margin” and “Small margin”]
Raw BER of conditional reading is much smaller than overall reading
![Page 56: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/56.jpg)
Idea: Neighbor Assisted Correction (NAC)
n Read a page with the read reference voltages based on overall Vth distribution (same as today) and buffer it
n If ECC fails: q Read the immediate-neighbor page q Re-read the page using the read reference voltages corresponding to the voltage distribution assuming a particular immediate-neighbor value
q For the cells whose immediate neighbor has that particular value, replace the buffered values with the re-read values
q Apply ECC again
![Page 57: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/57.jpg)
Neighbor Assisted Correction Flow
n Trigger neighbor-assisted reading only when ECC fails n Read neighbor values and use corresponding read reference voltages in a prioritized order until ECC passes
How to select next local optimum read reference voltage?
![Page 58: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/58.jpg)
Lifetime Extension with NAC
ECC needs to correct 40 bits per 1k-Byte
[Figure: lifetime extension with NAC across Stage-0, Stage-1, Stage-2, and Stage-3; annotated values: 39%, 33%, 22%]
33% lifetime improvement at no performance loss
![Page 59: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/59.jpg)
Performance Analysis of NAC
No performance loss within nominal lifetime and with reasonable (1%) ECC fail rates
![Page 60: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/60.jpg)
More on Neighbor-Assisted Correction
n Yu Cai, Gulay Yalcin, Onur Mutlu, Erich Haratsch, Osman Unsal, Adrian Cristal, and Ken Mai, "Neighbor-Cell Assisted Error Correction for MLC NAND Flash Memories" Proceedings of the ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS), Austin, TX, June 2014. Slides (ppt) (pdf)
![Page 61: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/61.jpg)
Agenda n Background, Motivation and Approach n Experimental Characterization Methodology n Error Analysis and Management
q Main Characterization Results q Retention-Aware Error Management q Threshold Voltage and Program Interference Analysis q Read Reference Voltage Prediction q Neighbor-Assisted Error Correction
n Summary
![Page 62: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/62.jpg)
Executive Summary
n Problem: MLC NAND flash memory reliability/endurance is a key challenge for satisfying future storage systems’ requirements
n We are: (1) Building reliable error models for NAND flash memory via experimental characterization, (2) Developing efficient techniques to improve reliability and endurance
n This talk provided a “flash” summary of our recent results published in the past 3 years: q Experimental error and threshold voltage characterization [DATE’12&13]
q Retention-aware error management [ICCD’12] q Program interference analysis and read reference V prediction [ICCD’13] q Neighbor-assisted error correction [SIGMETRICS’14]
![Page 63: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/63.jpg)
Readings (I) n Yu Cai, Erich F. Haratsch, Onur Mutlu, and Ken Mai, "Error Patterns in MLC NAND Flash Memory: Measurement, Characterization, and Analysis" Proceedings of the Design, Automation, and Test in Europe Conference (DATE), Dresden, Germany, March 2012. Slides (ppt)
n Yu Cai, Gulay Yalcin, Onur Mutlu, Erich F. Haratsch, Adrian Cristal, Osman Unsal, and Ken Mai, "Flash Correct-and-Refresh: Retention-Aware Error Management for Increased Flash Memory Lifetime" Proceedings of the 30th IEEE International Conference on Computer Design (ICCD), Montreal, Quebec, Canada, September 2012. Slides (ppt) (pdf)
n Yu Cai, Erich F. Haratsch, Onur Mutlu, and Ken Mai, "Threshold Voltage Distribution in MLC NAND Flash Memory: Characterization, Analysis and Modeling" Proceedings of the Design, Automation, and Test in Europe Conference (DATE), Grenoble, France, March 2013. Slides (ppt)
![Page 64: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/64.jpg)
Readings (II) n Yu Cai, Gulay Yalcin, Onur Mutlu, Erich F. Haratsch, Adrian Cristal, Osman Unsal, and Ken Mai, "Error Analysis and Retention-Aware Error Management for NAND Flash Memory" Intel Technology Journal (ITJ) Special Issue on Memory Resiliency, Vol. 17, No. 1, May 2013.
n Yu Cai, Onur Mutlu, Erich F. Haratsch, and Ken Mai, "Program Interference in MLC NAND Flash Memory: Characterization, Modeling, and Mitigation" Proceedings of the 31st IEEE International Conference on Computer Design (ICCD), Asheville, NC, October 2013. Slides (pptx) (pdf) Lightning Session Slides (pdf)
n Yu Cai, Gulay Yalcin, Onur Mutlu, Erich Haratsch, Osman Unsal, Adrian Cristal, and Ken Mai, "Neighbor-Cell Assisted Error Correction for MLC NAND Flash Memories" Proceedings of the ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS), Austin, TX, June 2014. Slides (ppt) (pdf)
![Page 65: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/65.jpg)
Referenced Papers
n All are available at http://users.ece.cmu.edu/~omutlu/projects.htm
![Page 66: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/66.jpg)
Related Videos and Course Materials n Computer Architecture Lecture Videos on Youtube
q https://www.youtube.com/playlist?list=PL5PHm2jkkXmidJOd59REog9jDnPDTG6IJ
n Computer Architecture Course Materials q http://www.ece.cmu.edu/~ece447/s13/doku.php?id=schedule
n Advanced Computer Architecture Course Materials q http://www.ece.cmu.edu/~ece740/f13/doku.php?id=schedule
n Advanced Computer Architecture Lecture Videos on Youtube q https://www.youtube.com/playlist?list=PL5PHm2jkkXmgDN1PLwOY_tGtUlynnyV6D
![Page 67: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/67.jpg)
Thank you.
Feel free to email me with any questions & feedback
[email protected] http://users.ece.cmu.edu/~omutlu/
![Page 68: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/68.jpg)
Error Analysis and Management for MLC NAND Flash Memory
Onur Mutlu [email protected]
(joint work with Yu Cai, Gulay Yalcin, Erich Haratsch, Ken Mai, Adrian Cristal, Osman Unsal)
August 7, 2014 Flash Memory Summit 2014, Santa Clara, CA
![Page 69: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/69.jpg)
Additional Slides
![Page 70: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/70.jpg)
Error Types and Testing Methodology n Erase errors
q Count the number of cells that fail to be erased to “11” state
n Program interference errors q Compare the data immediately after page programming and the data after the whole block has been programmed
n Read errors q Continuously read a given block and compare the data between consecutive read sequences
n Retention errors q Compare the data read after an amount of time to data written
n Characterize short term retention errors at room temperature n Characterize long term retention errors by baking in the oven at 125℃
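Each of the tests above reduces to comparing two snapshots of the same page or block and counting the bits that differ. A minimal sketch of that comparison follows; the function names are hypothetical and the snapshots are plain `bytes` objects, not output from any particular flash test platform.

```python
def count_bit_errors(expected: bytes, observed: bytes) -> int:
    """Count differing bits between two equally sized data snapshots."""
    if len(expected) != len(observed):
        raise ValueError("snapshots must have the same length")
    # XOR leaves a 1 in every bit position where the snapshots disagree.
    return sum(bin(a ^ b).count("1") for a, b in zip(expected, observed))

def raw_bit_error_rate(expected: bytes, observed: bytes) -> float:
    """Fraction of bits that differ between the two snapshots."""
    return count_bit_errors(expected, observed) / (8 * len(expected))

# Example: data written vs. data read back after some retention time.
written = b"\xff\x00"
read_back = b"\xfe\x01"
errors = count_bit_errors(written, read_back)
rate = raw_bit_error_rate(written, read_back)
```

For a retention test `expected` would be the data written and `observed` the data read after the waiting period; for a read test the two arguments would be consecutive reads of the same block.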
![Page 71: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/71.jpg)
Improving Flash Lifetime with Strong ECC
n Lifetime improvement comparison of various BCH codes
[Figure: bar chart of P/E cycle endurance (y-axis, 0 to 14000) for 512b-BCH, 1k-BCH, 2k-BCH, 4k-BCH, 8k-BCH, and 32k-BCH; annotations: 4X Lifetime Improvement, 71X Power Consumption, 85X Area Consumption]
Strong ECC is very inefficient at improving lifetime
![Page 72: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/72.jpg)
Our Goal
Develop new techniques to improve flash lifetime without relying on stronger ECC
![Page 73: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/73.jpg)
FCR Intuition
[Figure: errors in a page after programming and after time T, 2T, and 3T, comparing errors with no refresh against errors with periodic refresh; markers distinguish retention errors from program errors]
![Page 74: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/74.jpg)
FCR Lifetime Evaluation Takeaways n Significant average lifetime improvement over no refresh
q Adaptive-rate FCR: 46X q Hybrid reprogramming/remapping based FCR: 31X q Remapping based FCR: 9X
n FCR lifetime improvement larger than that of stronger ECC q 46X vs. 4X with 32-kbit ECC (over 512-bit ECC) q FCR is less complex and less costly than stronger ECC
n Lifetime on all workloads improves with Hybrid FCR q Remapping based FCR can degrade lifetime on read-heavy WL q Lifetime improvement highest in write-heavy workloads
![Page 75: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/75.jpg)
Characterizing Cell Threshold w/ Read Retry
§ Read-retry feature of new NAND flash § Tune the read reference voltage and check which Vth region each cell falls in
§ Characterize the threshold voltage distribution of flash cells in programmed states through Monte-Carlo emulation
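[Figure: number of cells vs. Vth for the erased state (11) and the programmed states P1 (10), P2 (00), and P3 (01), separated by read references REF1, REF2, and REF3, with 0V marked; read retry steps the reference through levels i-2, i-1, i, i+1, i+2; example transition 01→00]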
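One way to turn a read-retry sweep into a threshold voltage histogram is to count, at each reference level, how many cells read as lying below it, and then difference consecutive counts. The sketch below assumes exactly that input format; the function name and the example counts are hypothetical, and real devices expose read retry through vendor-specific commands that are not modeled here.

```python
def vth_histogram(cells_below_ref):
    """Convert cumulative counts from a read-retry sweep into a histogram.

    cells_below_ref[k] is the number of cells whose Vth is below the k-th
    read reference level, with levels in ascending order. The result has one
    bin per pair of adjacent levels.
    """
    histogram = []
    for prev, curr in zip(cells_below_ref, cells_below_ref[1:]):
        if curr < prev:
            raise ValueError("cumulative counts must be non-decreasing")
        # Cells counted at this level but not the previous one lie in between.
        histogram.append(curr - prev)
    return histogram

# Example sweep over four reference levels (made-up counts).
bins = vth_histogram([0, 3, 10, 12])
```

In this example `bins` holds the number of cells in each of the three voltage regions between adjacent reference levels.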
![Page 76: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/76.jpg)
Parametric Distribution Learning
§ Parametric distribution § Closed-form formula, only a few parameters to be stored
§ Exponential distribution family
§ Maximum likelihood estimation (MLE) to learn parameters
[Equation: likelihood function of the distribution parameter vector given the observed testing data]
Goal of MLE: Find distribution parameters to maximize likelihood function
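As an illustration of MLE for one member of the exponential family, the sketch below fits a Gaussian, for which the maximizing parameters have a closed form (sample mean and the uncorrected sample standard deviation). The function names and the sample values are hypothetical; the slides do not give the exact estimator used for each candidate distribution.

```python
import math
from statistics import fmean

def gaussian_log_likelihood(samples, mu, sigma):
    """Log-likelihood of i.i.d. samples under a Gaussian with mean mu and std sigma."""
    n = len(samples)
    squared_error = sum((x - mu) ** 2 for x in samples)
    return (-0.5 * n * math.log(2.0 * math.pi * sigma ** 2)
            - squared_error / (2.0 * sigma ** 2))

def gaussian_mle(samples):
    """Closed-form maximum likelihood estimates (mu, sigma) for a Gaussian."""
    mu = fmean(samples)
    # The MLE of the variance divides by n, not n - 1.
    variance = fmean([(x - mu) ** 2 for x in samples])
    return mu, math.sqrt(variance)

# Example with made-up threshold voltage samples.
vth_samples = [1.0, 2.0, 3.0]
mu_hat, sigma_hat = gaussian_mle(vth_samples)
best_ll = gaussian_log_likelihood(vth_samples, mu_hat, sigma_hat)
```

Evaluating `gaussian_log_likelihood` at any other `(mu, sigma)` pair gives a value no larger than `best_ll`, which is what "maximize the likelihood function" means in practice.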
![Page 77: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/77.jpg)
Selected Distributions
![Page 78: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/78.jpg)
Distribution Exploration
Distribution can be approx. modeled as Gaussian distribution
| Distribution | Beta | Gamma | Gaussian | Log-normal | Weibull |
|---|---|---|---|---|---|
| RMSE | 19.5% | 20.3% | 22.1% | 24.8% | 28.6% |

[Figure panels: P1 State, P2 State, P3 State]
![Page 79: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/79.jpg)
Cycling Noise Modeling
Mean value (µ) increases with P/E cycles
Standard deviation value (σ) increases with P/E cycles
Exponential model
Linear model
![Page 80: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/80.jpg)
Conclusion & Future Work
§ P/E operations modeled as signal passing thru AWGN channel § Approximately Gaussian with 22% distortion § P/E noise is white noise
§ P/E cycling noise affects threshold voltage distributions § Distribution shifts to the right and widens around the mean value § Statistics (mean/variance) can be modeled as exponential correlation with P/E cycles with 95% accuracy
§ Future work § Characterization and models for retention noise § Characterization and models for program interference noise
![Page 81: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/81.jpg)
Program Interference: Key Findings
n Methodology: Extensive experimentation with real 2Y-nm MLC NAND Flash chips
n Amount of program interference is dependent on q Location of cells (programmed and victim) q Data values of cells (programmed and victim) q Programming order of pages
n Our new model can predict the amount of program interference with 96.8% prediction accuracy
n Our new read reference voltage prediction technique can improve flash lifetime by 30%
![Page 82: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/82.jpg)
NAC: Executive Summary n Problem: Cell-to-cell program interference causes the threshold voltage of flash cells to be distorted even if they were originally programmed correctly
n Our Goal: Develop techniques to overcome cell-to-cell program interference
q Analyze the threshold voltage distributions of flash cells conditionally upon the values of immediately neighboring cells
q Devise new error correction mechanisms that can take advantage of the values of neighboring cells to reduce error rates over conventional ECC
n Observations: A wide overall distribution can be decoupled into multiple narrower conditional distributions, which can be separated easily
n Solution: Neighbor-cell Assisted Correction (NAC) q Re-read a flash memory page that initially failed ECC with a set of read reference voltages corresponding to the conditional threshold voltage distribution q Use the re-read values to correct the cells that have neighbors with that value q Prioritize reading assuming neighbor cell values that cause the largest or smallest cell-to-cell interference, to allow ECC to correct errors with fewer re-reads
n Results: NAC improves flash memory lifetime by 39% q Within nominal lifetime: no performance degradation q In extended lifetime: less than 5% performance degradation
![Page 83: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/83.jpg)
Overall vs Conditional Vth Distributions
n Overall distribution: p(x) n Conditional distribution: p(x, z=m)
q m could be 11, 00, 10 and 01 for 2-bit MLC all-bit-line flash
n Overall distribution is the sum of all conditional distributions
$$p(x) = \sum_{m=1}^{2^n} p(x, z = m)$$

[Figure: conditional distributions N11, N00, N10, and N01 of states P’(i) and P’(i+1) along the Vth axis]
![Page 84: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/84.jpg)
Prioritized NAC
n Dominant errors are caused by the overlap between the lower state shifted by high neighbor interference and the higher state shifted by low neighbor interference
[Figure: states P(i) and P(i+1), each split into low- and high-interference parts, shown before interference and after interference as P’(i) and P’(i+1); conditional distributions N11, N00, N10, and N01 with the overall read reference REFx and the conditional read references REFx11, REFx00, REFx10, and REFx01]
![Page 85: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/85.jpg)
Procedure of NAC n Online learning
q Periodically (e.g., every 100 P/E cycles) measure and learn the overall and conditional threshold voltage distribution statistics (e.g., mean, standard deviation, and corresponding optimum read reference voltage)
n NAC procedure q Step 1: Once ECC fails when reading with the overall distribution, load the failed data and the corresponding neighbor LSB/MSB data into NAC q Step 2: Read the failed page with the local optimum read reference voltage for cells with neighbor programmed as 11 q Step 3: For cells with neighbor 11, replace the values loaded in step 1 with the values read in step 2 q Step 4: Send the fixed data for ECC correction. If it succeeds, exit. Otherwise, go to step 2 and try the local optimum read reference voltages for neighbors 10, 01, and 00 in turn
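The four steps can be sketched as a short loop. This is a toy model under stated assumptions, not the hardware implementation: `reread` and `ecc_decode` are hypothetical callables standing in for the conditional re-read and the ECC decoder, cell and neighbor values are 2-bit strings, and the toy ECC simply checks the error count against known-good data.

```python
def neighbor_assisted_correct(buffered, neighbors, reread, ecc_decode,
                              order=("11", "10", "01", "00")):
    """Retry a failed page read, fixing cells one neighbor value at a time.

    buffered:   cell values from the failed read (step 1)
    neighbors:  value of the immediate neighbor of each cell (step 1)
    reread:     reread(value) -> cell values read with the local optimum
                reference for cells whose neighbor equals value (step 2)
    ecc_decode: ecc_decode(cells) -> corrected cells, or None on failure
    """
    cells = list(buffered)
    for value in order:
        conditional = reread(value)
        for i, neighbor in enumerate(neighbors):
            if neighbor == value:
                cells[i] = conditional[i]  # step 3: fix only matching cells
        decoded = ecc_decode(cells)        # step 4: retry ECC
        if decoded is not None:
            return decoded
    return None

# Toy setup (all values made up): two cells were misread in the first pass.
true_data = ["10", "00", "01", "10"]
neighbor_values = ["11", "01", "11", "00"]
failed_read = ["00", "10", "01", "10"]
reread_calls = []

def toy_reread(value):
    """Pretend the conditional read is exact for cells with this neighbor value."""
    reread_calls.append(value)
    return [t if n == value else b
            for t, n, b in zip(true_data, neighbor_values, failed_read)]

def toy_ecc(cells, max_errors=1):
    """Stand-in ECC that corrects at most max_errors wrong cells."""
    wrong = sum(c != t for c, t in zip(cells, true_data))
    return list(true_data) if wrong <= max_errors else None

recovered = neighbor_assisted_correct(failed_read, neighbor_values, toy_reread, toy_ecc)
```

In this toy run one conditional re-read (for neighbor value 11) removes enough errors for the stand-in ECC to succeed, so the loop exits without trying the remaining neighbor values.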
![Page 86: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/86.jpg)
Microarchitecture of NAC (Initialization)
[Figure: NAC datapath with the Neighbor LSB Page Buffer, Neighbor MSB Page Buffer, Page-to-be-Corrected Buffer, and Local-Optimum-Read Buffer, together with Bit1/Bit2, a comparator vector (Comp), and a pass circuit vector; example bit patterns are shown in the buffers]
![Page 87: Error Analysis and Management for MLC NAND Flash Memory · 2014-10-04 · Flash memory is widening its range of applications " Portable consumer devices, laptop PCs and enterprise](https://reader033.vdocuments.net/reader033/viewer/2022042807/5f77af0bc849913ada2c55e6/html5/thumbnails/87.jpg)
NAC (Fixing cells with neighbor 11)
[Figure: the same NAC datapath (Neighbor LSB Page Buffer, Neighbor MSB Page Buffer, Page-to-be-Corrected Buffer, Local-Optimum-Read Buffer, Bit1/Bit2, comparator vector, pass circuit vector) with example bit patterns; Bit1 and Bit2 are shown as 1 1 and two pass circuits are marked ON]