Enhancing NAND Flash Memory Lifetime Through Data Retention Optimization

 
Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery

Yu Cai, Yixin Luo, Erich F. Haratsch*, Ken Mai, Onur Mutlu
Carnegie Mellon University, *LSI Corporation
 
You Probably Know
 
Many use cases:
 
 
 
 
+ High performance, low energy consumption
 
NAND Flash Memory Challenges
– Requires erase before program (write)
– High raw bit error rate
[Figure: CPU, flash controller, and raw flash memory chips]
Limited Flash Memory Lifetime
[Figure: raw bit error rate (RBER) vs. program/erase (P/E) cycles (or writes per cell). The P/E cycle lifetime is where RBER reaches the ECC-correctable RBER; marked lifetimes are ~3000 and, for the newer generation, ~2000 P/E cycles.]
Goal: Extend flash memory lifetime at low cost
 
Retention Loss
 
 
Charge leakage over time
 
One dominant source of flash memory errors [DATE '12, ICCD '12]
[Figure: flash cells storing bit values; charge leakage in one cell causes a retention error]
 
Before I show you how we extend flash lifetime …

NAND Flash 101
 
 
Threshold Voltage (Vth)

[Figure: two flash cells placed along a normalized Vth axis]
 
Threshold Voltage (Vth) Distribution

[Figure: probability density function (PDF) over normalized Vth]
 
Read Reference Voltage (Vref)

[Figure: PDF over normalized Vth with Vref marked]
Multi-Level Cell (MLC)
[Figure: PDF over normalized Vth with four states, Erased (11), P1 (10), P2 (00), and P3 (01), separated by the ER-P1 Vref, P1-P2 Vref, and P2-P3 Vref]
Threshold Voltage Reduces Over Time
[Figure: PDF over normalized Vth for P1 (10), P2 (00), and P3 (01), before retention loss and after some retention loss]
Fixed Read Reference Voltage Becomes Suboptimal
[Figure: PDF over normalized Vth for P1 (10), P2 (00), and P3 (01) with the fixed P1-P2 Vref and P2-P3 Vref, before retention loss and after some retention loss; after retention loss the fixed Vref produces raw bit errors]
 
Optimal Read Reference Voltage (OPT)
 
[Figure: after some retention loss, PDF over normalized Vth for P1 (10), P2 (00), and P3 (01), showing the fixed P1-P2 Vref and P2-P3 Vref together with the P1-P2 OPT and P2-P3 OPT; reading at OPT gives minimal raw bit errors]
 
Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage
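
To make the notion of an optimal read reference voltage concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that two neighboring states have Gaussian Vth distributions with equal cell counts. The means and widths are made-up normalized values, not measurements from the talk, and `find_opt` is a hypothetical helper rather than the authors' mechanism.

```python
import math


def gaussian_cdf(x, mean, sigma):
    """CDF of a normal distribution, computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))


def misread_fraction(vref, low, high):
    """Fraction of cells misread at vref, assuming equal cell counts per state.

    low and high are (mean, sigma) pairs for the lower and higher Vth state.
    """
    low_above = 1.0 - gaussian_cdf(vref, *low)  # low-state cells read as high
    high_below = gaussian_cdf(vref, *high)      # high-state cells read as low
    return 0.5 * (low_above + high_below)


def find_opt(low, high, steps=1000):
    """Grid-search the Vref between the two means that minimizes misreads."""
    lo, hi = low[0], high[0]
    candidates = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda v: misread_fraction(v, low, high))


# Assumed (mean, sigma) values before and after retention loss.
before = ((0.50, 0.04), (0.70, 0.04))
after = ((0.46, 0.04), (0.62, 0.04))

opt_before = find_opt(*before)
opt_after = find_opt(*after)
fixed_vref_errors = misread_fraction(opt_before, *after)
opt_errors = misread_fraction(opt_after, *after)
```

Under these assumptions the optimum moves down as both distributions shift down, and reading the shifted distributions at the old optimum misreads more cells than reading at the new one.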

Retention Failure
[Figure: PDF over normalized Vth for P1 (10), P2 (00), and P3 (01) with the P1-P2 Vref and P2-P3 Vref. After some retention loss: correctable errors. After significant retention loss: uncorrectable errors.]
 
Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage
Goal 2: Design an offline mechanism to recover data after detecting uncorrectable errors
 
To understand the effects of retention loss:
 - Characterize retention loss using real chips
 
To understand the effects of retention loss:
 - Characterize retention loss using real chips
Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage
Goal 2: Design an offline mechanism to recover data after detecting uncorrectable errors
 
Characterization Methodology
 
FPGA-based flash memory testing platform [Cai+, FCCM '11]
Real 20- to 24-nm MLC NAND flash chips
0 to 40 days' worth of retention loss
Room temperature (20 °C)
0 to 50k P/E cycles

 
Characterize the effects of retention loss
1. Threshold Voltage Distribution
2. Optimal Read Reference Voltage
3. RBER and P/E Cycle Lifetime
 
1. Threshold Voltage (Vth) Distribution

[Figure: PDF over normalized Vth for P1, P2, and P3, shown at 0-day and 40-day retention]
Finding: A cell's threshold voltage decreases over time
 
2. Optimal Read Reference Voltage (OPT)
 
[Figure: P1, P2, and P3 distributions with the 0-day OPT and 40-day OPT marked between neighboring states]
Finding: OPT decreases over time
 
3. RBER and P/E Cycle Lifetime
 
[Figure: RBER vs. P/E cycles when reading data with 7 days' worth of retention loss, with curves for Vref values increasingly close to the actual OPT and for the actual OPT itself. The ECC-correctable RBER line marks the nominal lifetime and the extended lifetime.]
Finding: Using the actual OPT achieves the longest lifetime
 
Characterization Summary
 
Due to retention loss:
Cell's threshold voltage (Vth) decreases over time
Optimal read reference voltage (OPT) decreases over time
Using the actual OPT for reading achieves the longest lifetime

 
Naïve Solution: Sweeping Vref

Key idea: Read the data multiple times with different read reference voltages until the raw bit errors are correctable by ECC

+ Finds the optimal read reference voltage
– Requires many read-retries, and therefore higher read latency
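
The sweep described here can be sketched as follows. This is a simplified illustration, not the controller's actual procedure: it uses a one-bit-per-cell read, and it stands in for the ECC decoder by comparing against the written data, which a real controller cannot do. All names and voltage values are hypothetical.

```python
def read_bits(vths, vref):
    """One-bit read: a cell below vref reads as 1, otherwise as 0."""
    return [1 if vth < vref else 0 for vth in vths]


def ecc_can_correct(read, written, ecc_limit):
    """Stand-in for an ECC decoder: succeeds if raw bit errors <= ecc_limit."""
    errors = sum(r != w for r, w in zip(read, written))
    return errors <= ecc_limit


def sweep_read(vths, written, vref_candidates, ecc_limit):
    """Read with each candidate Vref in turn until ECC succeeds.

    Returns (vref, reads_issued), or (None, reads_issued) if every read fails.
    """
    reads_issued = 0
    for vref in vref_candidates:
        reads_issued += 1
        if ecc_can_correct(read_bits(vths, vref), written, ecc_limit):
            return vref, reads_issued
    return None, reads_issued


# Hypothetical page whose programmed cells (written as 0) have leaked charge.
vths = [0.30, 0.35, 0.52, 0.55, 0.58, 0.80]
written = [1, 1, 0, 0, 0, 0]
result = sweep_read(vths, written, [0.60, 0.55, 0.50, 0.45], ecc_limit=0)
```

In this toy page the first two reads fail and the third succeeds, which is the cost the slide points to: every failed candidate is a full extra read.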
Comparison of Flash Read Techniques
30
 
Observations

1. The optimal read reference voltage gradually decreases over time
Key idea: Record the old OPT as a prediction (Vpred) of the actual OPT
Benefit: Close to actual OPT, so fewer read retries

2. The amount of retention loss is similar across pages within a flash block
Key idea: Record only one Vpred for each block
Benefit: Small storage overhead (768KB out of 512GB)
 
Retention Optimized Reading (ROR)
 
Components:
1. Online pre-optimization algorithm: periodically records a Vpred for each block
2. Improved read-retry technique: utilizes the recorded Vpred to minimize read-retry count

1. Online Pre-Optimization Algorithm
 
Triggered periodically (e.g., per day)
Find and record an OPT as per-block Vpred
Performed in background
Small storage overhead
 
2. Improved Read-Retry Technique
 
Performed as normal read
Vpred already close to actual OPT
Decrease Vref if Vpred fails, and retry

[Figure: PDF over normalized Vth; Vpred lies very close to OPT]
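
The two ROR components can be sketched together as below. This is a minimal illustration under stated assumptions: `find_opt` and `try_read` are hypothetical callbacks (the first returns a block's current OPT, the second returns data or `None` when ECC fails), and the step size and retry limit are made-up parameters, not values from the talk.

```python
class RetentionOptimizedReader:
    """Keeps one predicted read reference voltage (Vpred) per flash block."""

    def __init__(self, default_vref, step):
        self.default_vref = default_vref  # used when no Vpred is recorded yet
        self.step = step                  # how far to lower Vref per retry
        self.vpred = {}                   # block id -> recorded Vpred

    def pre_optimize(self, block_id, find_opt):
        """Background task: record the block's current OPT as its Vpred."""
        self.vpred[block_id] = find_opt(block_id)

    def read(self, block_id, try_read, max_retries=16):
        """Read at Vpred first; lower Vref and retry while ECC fails.

        Returns (data, vref_used, retries), with data None if all reads fail.
        """
        start = self.vpred.get(block_id, self.default_vref)
        for retries in range(max_retries + 1):
            vref = start - retries * self.step
            data = try_read(block_id, vref)
            if data is not None:
                return data, vref, retries
        return None, None, max_retries


# Toy flash model: a read succeeds only when Vref is close to the true OPT.
TRUE_OPT = 0.50


def toy_try_read(block_id, vref):
    return b"page data" if abs(vref - TRUE_OPT) < 0.015 else None


reader = RetentionOptimizedReader(default_vref=0.60, step=0.02)
cold = reader.read(7, toy_try_read)           # no Vpred recorded yet
reader.pre_optimize(7, lambda block_id: 0.51)  # OPT found by an earlier pass
warm = reader.read(7, toy_try_read)           # starts from the stale Vpred
```

Retries only step downward because, per the observations above, OPT decreases over time, so a stale Vpred is expected to sit at or above the actual OPT.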
 
Retention Optimized Reading: Summary
 
35
 
Nom. Life: 2.4% ↓
 
Ext. Life:   70.4% ↓
 
Retention Failure
[Figure: PDF over normalized Vth for P1 (10), P2 (00), and P3 (01) with the P1-P2 Vref and P2-P3 Vref. After some retention loss: correctable errors. After significant retention loss: uncorrectable errors.]
Leakage Speed Variation
[Figure: PDF over normalized Vth; cells are marked S (slow-leaking cell) or F (fast-leaking cell)]
 
Initially, Right After Programming
 
[Figure: PDF over normalized Vth showing slow-leaking (S) and fast-leaking (F) cells in each state]
After Some Retention Loss
[Figure: PDF over normalized Vth showing the S and F cells of each state after some retention loss]
Fast-leaking cells have lower Vth
Slow-leaking cells have higher Vth
 
Eventually: Retention Failure
 
[Figure: PDF over normalized Vth showing S and F cells of neighboring states around the OPT]
 
Retention Failure Recovery (RFR)
 
Key idea: Guess original state of the cell from its leakage speed property

Three steps:
1. Identify risky cells
2. Identify fast-/slow-leaking cells
3. Guess original states

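
The three steps can be sketched as below for the P2/P3 boundary. Several parts are assumptions made for illustration: risky cells are taken to be those within σ of OPT, leakage speed is estimated from how far a cell's Vth drops over an additional retention period, and `drop_threshold` is a hypothetical cutoff. Slow-leaking risky cells are guessed to be P2 and fast-leaking ones P3, following the observation that fast-leaking cells end up with lower Vth.

```python
def recover_states(vth_at_failure, vth_later, opt, sigma, drop_threshold):
    """Guess each cell's original state (P2 or P3) around one OPT.

    vth_at_failure: Vth of each cell when the failure is detected.
    vth_later:      Vth of the same cells after additional retention time.
    """
    states = []
    for before, after in zip(vth_at_failure, vth_later):
        # Step 1: a cell is risky if its Vth lies within sigma of OPT.
        risky = abs(before - opt) <= sigma
        if not risky:
            # Far from OPT: trust the ordinary read against OPT.
            states.append("P2" if before < opt else "P3")
            continue
        # Step 2: classify the risky cell by how much charge it leaked.
        fast_leaking = (before - after) > drop_threshold
        # Step 3: a fast leaker likely fell from P3; a slow one was likely P2.
        states.append("P3" if fast_leaking else "P2")
    return states


# Hypothetical normalized voltages for four cells.
vth_at_failure = [0.40, 0.49, 0.51, 0.60]
vth_later = [0.39, 0.44, 0.50, 0.58]
guessed = recover_states(vth_at_failure, vth_later,
                         opt=0.50, sigma=0.02, drop_threshold=0.03)
plain_read = ["P2" if v < 0.50 else "P3" for v in vth_at_failure]
```

In this toy data the two middle cells are risky, and the leakage-based guess flips both of them relative to a plain read at OPT, while the two cells far from OPT are left as read.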
1. Identify Risky Cells
[Figure: PDF over normalized Vth for two neighboring states; cells with Vth between OPT–σ and OPT+σ are the risky cells]
Key formula: risky cell + S = P2; risky cell + F = P3
2. Identifying Fast- vs. Slow-Leaking Cells
[Figure: the risky cells between OPT–σ and OPT+σ start out unidentified (?); each one is then identified as slow-leaking (S) or fast-leaking (F)]

3. Guess Original States
 
[Figure: risky cells identified as S or F are assigned their original states]
Key formula: risky cell + S = P2; risky cell + F = P3
 
RFR Evaluation
 
Procedure: Program with random data → (28 days) → Detect failure, backup data → (12 addt'l. days) → Recover data
Expect to eliminate 50% of raw bit errors
ECC can correct remaining errors

 
Conclusion
 
Problem: Retention loss reduces flash lifetime
Overall Goal: Extend flash lifetime at low cost
Flash Characterization: Developed an understanding of the effects of retention loss in real chips
Retention Optimized Reading: A low-cost mechanism that dynamically finds the optimal read reference voltage
  64% lifetime ↑, 70.4% read latency ↓
Retention Failure Recovery: An offline mechanism that recovers data after detecting uncorrectable errors
  Raw bit error rate 50% ↓, reduces data loss
 
 
Backup Slides
 
 
51
 
RFR Motivation
 
Data loss can happen in many ways:
1. High P/E cycle
2. High temperature

 
What if there are other errors?
 
Key: RFR does not have to correct all errors

Example:
ECC can correct 40 errors in a page
Corrupted page has 20 retention errors, 25 other errors (45 total errors, beyond what ECC can correct)
After RFR: 10 retention errors, 30 other errors (40 total errors, within what ECC can correct)

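
The arithmetic in this example can be checked with a few lines of Python. The helper name is hypothetical; the error counts and the 40-error limit are the ones given on the slide.

```python
ECC_LIMIT = 40  # errors per page that ECC can correct, from the example


def correctable(retention_errors, other_errors, ecc_limit=ECC_LIMIT):
    """A page is recoverable when its total raw errors fit within ECC."""
    return retention_errors + other_errors <= ecc_limit


before_rfr = correctable(20, 25)  # 45 total errors
after_rfr = correctable(10, 30)   # 40 total errors
```

RFR only has to bring the total back under the ECC limit; the remaining errors are then handled by ECC.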
Slide Note

Thank you for your introduction.

Good afternoon. (My name is Yixin Luo.) Today, I will present our paper “Data retention in MLC NAND flash memory: characterization, optimization, and recovery”.

This work is done with my collaborators from Carnegie Mellon University and LSI Corporation.


Characterization, optimization, and recovery methods are explored to extend the lifespan of MLC NAND flash memory. Challenges such as high raw bit error rates and charge leakage leading to retention loss are addressed. Techniques like ECC and threshold voltage adjustment are employed to improve memory endurance and reduce errors over time.

  • Flash memory
  • Data retention
  • Optimization
  • ECC
  • Lifetime extension

Uploaded on Aug 14, 2024 | 0 Views



Presentation Transcript


  1. Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai, Yixin Luo, Erich F. Haratsch*, Ken Mai, Onur Mutlu Carnegie Mellon University, *LSI Corporation 1

  2. You Probably Know Many use cases: + High performance, low energy consumption 2

  3. NAND Flash Memory Challenges Requires erase before program (write) High raw bit error rate Controller Raw Flash Memory Chips Flash CPU ECC Controller 3

  4. Limited Flash Memory Lifetime Goal: Extend flash memory lifetime at low cost Raw bit error rate (RBER) P/E Cycle Lifetime ECC-correctable RBER ~2000 ~3000 Program/Erase (P/E) Cycles (or Writes Per Cell) 4

  5. Retention Loss Charge leakage over time 0 0 1 Retention error Flash cell One dominant source of flash memory errors [DATE '12, ICCD '12] 5

  6. Before I show you how we extend flash lifetime NAND Flash 101 6

  7. Threshold Voltage (Vth) Flash cell Flash cell 0 1 Normalized Vth 7

  8. Threshold Voltage (Vth) Distribution Probability Density Function (PDF) 0 1 Normalized Vth 8

  9. Read Reference Voltage (Vref) PDF Vref 0 1 Normalized Vth 9

  10. Multi-Level Cell (MLC) ER-P1 Vref P1-P2 Vref P2-P3 Vref PDF Erased (11) P1 (10) P2 (00) P3 (01) Normalized Vth 10

  11. Threshold Voltage Reduces Over Time Before retention loss: After some retention loss: PDF P1 (10) P2 (00) P3 (01) Normalized Vth 11

  12. Fixed Read Reference Voltage Becomes Suboptimal Before retention loss: After some retention loss: P2-P3 Vref P1-P2 Vref PDF P1 (10) P2 (00) P3 (01) Normalized Vth Normalized Vth Raw bit errors 12

  13. Optimal Read Reference Voltage (OPT) After some retention loss: P1-P2 OPT P2-P3 OPT P2-P3 Vref P1-P2 Vref PDF P1 (10) P2 (00) P3 (01) Normalized Vth Minimal raw bit errors 13

  14. Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage 14

  15. Retention Failure After some retention loss: After significant retention loss: P2-P3 Vref P1-P2 Vref PDF P1 (10) P2 (00) P3 (01) Normalized Vth Correctable errors Uncorrectable errors 15

  16. Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage Goal 2: Design an offline mechanism to recover data after detecting uncorrectable errors 16

  17. To understand the effects of retention loss: - Characterize retention loss using real chips 17

  18. To understand the effects of retention loss: - Characterize retention loss using real chips Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage Goal 2: Design an offline mechanism to recover data after detecting uncorrectable errors 18

  19. Characterization Methodology FPGA-based flash memory testing platform [Cai+, FCCM '11] 19

  20. Characterization Methodology FPGA-based flash memory testing platform Real 20- to 24-nm MLC NAND flash chips 0 to 40 days' worth of retention loss Room temperature (20 °C) 0 to 50k P/E Cycles 20

  21. Characterize the effects of retention loss 1. Threshold Voltage Distribution 2. Optimal Read Reference Voltage 3. RBER and P/E Cycle Lifetime 21

  22. 1. Threshold Voltage (Vth) Distribution PDF P1 P2 P3 Normalized Vth 22

  23. 1. Threshold Voltage (Vth) Distribution 0-day 0-day 40-day 40-day P1 P2 P3 Finding: Cell's threshold voltage decreases over time 23

  24. 2. Optimal Read Reference Voltage (OPT) 40-day OPT 0-day OPT 40-day OPT 0-day OPT P1 P2 P3 Finding: OPT decreases over time 24

  25. 3. RBER and P/E Cycle Lifetime RBER P/E Cycles 25

  26. 3. RBER and P/E Cycle Lifetime Vref closer to actual OPT Reading data with 7-day worth of retention loss. Extended Nominal Lifetime Lifetime Actual OPT ECC-correctable RBER Finding: Using actual OPT achieves the longest lifetime 26

  27. Characterization Summary Due to retention loss: Cell's threshold voltage (Vth) decreases over time Optimal read reference voltage (OPT) decreases over time Using the actual OPT for reading achieves the longest lifetime 27

  28. To understand the effects of retention loss: - Characterize retention loss using real chips Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage Goal 2: Design an offline mechanism to recover data after detecting uncorrectable errors 28

  29. Naïve Solution: Sweeping Vref Key idea: Read the data multiple times with different read reference voltages until the raw bit errors are correctable by ECC Finds the optimal read reference voltage Requires many read-retries, higher read latency 29

  30. Comparison of Flash Read Techniques Flash Read Techniques Lifetime (P/E Cycle) Performance (Read Latency) Fixed Vref Sweeping Vref Our Goal 30

  31. Observations 1. The optimal read reference voltage gradually decreases over time Key idea: Record the old OPT as a prediction (Vpred) of the actual OPT Benefit: Close to actual OPT Fewer read retries 2. The amount of retention loss is similar across pages within a flash block Key idea: Record only one Vpred for each block Benefit: Small storage overhead (768KB out of 512GB) 31

  32. Retention Optimized Reading (ROR) Components: 1. Online pre-optimization algorithm Periodically records a Vpred for each block 2. Improved read-retry technique Utilizes the recorded Vpred to minimize read-retry count 32

  33. 1. Online Pre-Optimization Algorithm Triggered periodically (e.g., per day) Find and record an OPT as per-block Vpred Performed in background Small storage overhead New Vpred Old Vpred PDF Normalized Vth 33

  34. 2. Improved Read-Retry Technique Performed as normal read Vpred already close to actual OPT Decrease Vref if Vpred fails, and retry PDF OPT Vpred Normalized Vth Very close 34

  35. Retention Optimized Reading: Summary Flash Read Techniques Lifetime (P/E Cycle) 64% 64% Performance (Read Latency) _____ Ext. Life: 70.4% Fixed Vref Sweeping Vref Nom. Life: 2.4% ROR 35

  36. To understand the effects of retention loss: - Characterize retention loss using real chips Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage Goal 2: Design an offline mechanism to recover data after detecting uncorrectable errors 36

  37. Retention Failure After some retention loss: After significant retention loss: P2-P3 Vref P1-P2 Vref PDF P1 (10) P2 (00) P3 (01) Normalized Vth Correctable errors Uncorrectable errors 37

  38. Leakage Speed Variation PDF Slow-leaking cell (S) Fast-leaking cell (F) Normalized Vth 38

  39. Initially, Right After Programming PDF P2 P3 S S F F F F S S Normalized Vth 39

  40. After Some Retention Loss Fast-leaking cells have lower Vth PDF Slow-leaking cells have higher Vth P2 P3 S S F F F F F F F F S S Normalized Vth 40

  41. Eventually: Retention Failure PDF OPT P2 P3 S S F F F F S S Normalized Vth 41

  42. Retention Failure Recovery (RFR) Key idea: Guess original state of the cell from its leakage speed property Three steps 1. Identify risky cells 2. Identify fast-/slow-leaking cells 3. Guess original states 42

  43. 1. Identify Risky Cells OPT+ OPT + S = P2 Risky cells PDF OPT + F = Key Formula P3 S F F S Normalized Vth 43

  44. 2. Identifying Fast- vs. Slow-Leaking Cells OPT+ OPT + S = P2 Risky cells PDF OPT + F = Key Formula P3 ? ? ? ? ? ? Normalized Vth 44

  45. 2. Identifying Fast- vs. Slow-Leaking Cells OPT+ OPT + S = P2 Risky cells PDF OPT + F = Key Formula P3 ? S ? ? F ? ? F ? S Normalized Vth 45

  46. 3. Guess Original States + S = P2 Risky cells PDF + F = Key Formula P3 S F F S Normalized Vth 46

  47. RFR Evaluation Expect to eliminate 50% of raw bit errors ECC can correct remaining errors Program with random data 28 days Detect failure, backup data 12 addt'l. days Recover data 47

  48. To understand the effects of retention loss: - Characterize retention loss using real chips Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage Goal 2: Design an offline mechanism to recover data after detecting uncorrectable errors 48

  49. Conclusion Problem: Retention loss reduces flash lifetime Overall Goal: Extend flash lifetime at low cost Flash Characterization: Developed an understanding of the effects of retention loss in real chips Retention Optimized Reading: A low-cost mechanism that dynamically finds the optimal read reference voltage 64% lifetime ↑, 70.4% read latency ↓ Retention Failure Recovery: An offline mechanism that recovers data after detecting uncorrectable errors Raw bit error rate 50% ↓, reduces data loss 49

  50. Data Retention in MLC NAND Flash Memory: Characterization, Optimization, and Recovery Yu Cai, Yixin Luo, Erich F. Haratsch*, Ken Mai, Onur Mutlu Carnegie Mellon University, *LSI Corporation 50
