Flip-N-Write: A Simple Deterministic Technique to Improve PRAM Write Performance, Energy and Endurance

Sangyeun Cho, Hyunjin Lee
Dept. of Computer Science, University of Pittsburgh



TRANSCRIPT

Page 1: Flip-N-Write: A Simple Deterministic Technique to Improve PRAM Write Performance, Energy and Endurance

Sangyeun Cho, Hyunjin Lee

Dept. of Computer Science, University of Pittsburgh

Page 2: Phase-change RAM (PRAM)

[Figure: PRAM cell and array (top electrode, GST, metal, bit line, select transistor driven by Vgate, current pulse source, VCC) and the SET/RESET transitions of the GST material]

(Pictures from Hegedüs and Elliott, Nature Materials, March 2008)

Amorphous = high resistivity; crystalline = low resistivity
SET: amorphous to crystalline; RESET: crystalline to amorphous

Page 3: PRAM asymmetries

read latency << write latency

Page 4: PRAM asymmetries

read power << write power

Page 5: PRAM asymmetries

read endurance >> write endurance

Page 6: Writes are Bad…

… and are Hated!

Page 7: Partial writes [Lee et al. ‘09]

A cache block is replaced and written to PRAM.

Introduce multiple dirty bits (e.g., 1 0 1, one per chunk) to isolate clean data and avoid writing it.

4B granularity gives a 6x improvement (over 64B).
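The dirty-bit bookkeeping above can be sketched in Python. This is a minimal illustration, not the mechanism from Lee et al.: the function names `dirty_mask` and `chunks_to_write` are hypothetical, and the sketch assumes the cache records each store as an (offset, size) byte range within a 64B block tracked at 4B granularity.

```python
def dirty_mask(stores, block_size=64, granularity=4):
    """One dirty bit per `granularity`-byte chunk of a cache block.

    Hypothetical model: `stores` is a list of (offset, size) byte ranges
    written while the block was cached; every chunk they touch is dirty.
    """
    mask = [0] * (block_size // granularity)
    for offset, size in stores:
        first = offset // granularity
        last = (offset + size - 1) // granularity
        for chunk in range(first, last + 1):
            mask[chunk] = 1
    return mask


def chunks_to_write(mask, granularity=4):
    """Byte ranges written back to PRAM on eviction; clean chunks are skipped."""
    return [(i * granularity, (i + 1) * granularity)
            for i, dirty in enumerate(mask) if dirty]


# A 64B block in which only two small stores happened.
mask = dirty_mask([(0, 4), (9, 2)])
print(chunks_to_write(mask))  # [(0, 4), (8, 12)]
```

Only 8 of the 64 bytes are written back in this example; a store that straddles a chunk boundary simply marks both chunks dirty.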

Page 8: Differential write [Yang et al. ‘07][Zhou et al. ‘09]

Cache block replaced, to be written to PRAM: 0 0 0 1 0 1 1 0 0 1 0 1 0 0 0 1
Content of the corresponding memory block: 0 0 0 1 0 1 1 0 1 1 1 1 0 1 1 0

Page 9: Differential write [Yang et al. ‘07][Zhou et al. ‘09]

Cache block replaced, to be written to PRAM: 0 0 0 1 0 1 1 0 0 1 0 1 0 0 0 1
Content of the corresponding memory block: 0 0 0 1 0 1 1 0 1 1 1 1 0 1 1 0

Step 1: READ out the (old) memory content.

Page 10: Differential write [Yang et al. ‘07][Zhou et al. ‘09]

Cache block replaced, to be written to PRAM: 0 0 0 1 0 1 1 0 0 1 0 1 0 0 0 1
Content of the corresponding memory block: 0 0 0 1 0 1 1 0 1 1 1 1 0 1 1 0

Step 2: Compare old and new data bit by bit.

Page 11: Differential write [Yang et al. ‘07][Zhou et al. ‘09]

Cache block replaced, to be written to PRAM: 0 0 0 1 0 1 1 0 0 1 0 1 0 0 0 1
Content of the corresponding memory block: 0 0 0 1 0 1 1 0 1 1 1 1 0 1 1 0

Step 3: Update bits that differ.

Probabilistically, 50% of bits will be updated; in practice, fewer bits are typically updated, as bit-level redundancies are common in data.
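The three differential-write steps can be modeled in a few lines of Python, treating a PRAM block as a 16-bit integer. The dict-based `memory` and the function name `differential_write` are illustrative assumptions; real hardware programs only the differing cells instead of storing a whole word.

```python
WIDTH = 16  # word width used in the slide example


def differential_write(memory, addr, new_word, width=WIDTH):
    """Write `new_word` to memory[addr], programming only the bits that change.

    Returns the bit positions (0 = least significant) that were programmed.
    """
    mask = (1 << width) - 1
    old_word = memory[addr]                  # Step 1: read out old content
    diff = (old_word ^ new_word) & mask      # Step 2: compare bit by bit
    updated = [i for i in range(width) if (diff >> i) & 1]
    # Step 3: only the bits in `diff` are programmed; the rest already match.
    memory[addr] = new_word & mask
    return updated


memory = {0: 0b0001011011110110}             # old memory block from the slides
bits = differential_write(memory, 0, 0b0001011001010001)
print(len(bits), "bits updated")             # 5 bits updated
```

In this example the first eight bits already match, so only 5 of the 16 cells are programmed instead of all 16.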

Page 12: Contributions of this work

We propose Flip-N-Write, a differential write scheme based on Read-Modify-Write and data encoding
• We use a simple bi-modal data encoding strategy: intact or flipped
• A flip bit is introduced to denote the mode

Importantly, Flip-N-Write updates at most N/2 bits at a time when updating N bits
• cf. differential write, which updates up to N bits
• Write power is deterministically bounded

We perform a comprehensive comparative study of
• The conventional write scheme
• Differential write
• Flip-N-Write

Page 13: Flip-N-Write basic idea

Cache block replaced, to be written to PRAM (“new data”): 1 1 1 1 1 1 1 1 0 0 0 1 0 0 0 1
Memory content (“old data”): 0 0 0 1 0 1 1 0 1 1 1 1 0 1 1 0

Page 14: Flip-N-Write basic idea

Cache block replaced, to be written to PRAM (“new data”): 1 1 1 1 1 1 1 1 0 0 0 1 0 0 0 1
Memory content (“old data”): 0 0 0 1 0 1 1 0 1 1 1 1 0 1 1 0

11 bits are different!

Page 15: Flip-N-Write basic idea

Cache block replaced, to be written to PRAM (“new data”): 1 1 1 1 1 1 1 1 0 0 0 1 0 0 0 1
Memory content (“old data”): 0 0 0 1 0 1 1 0 1 1 1 1 0 1 1 0
“Flipped new data”: 0 0 0 0 0 0 0 0 1 1 1 0 1 1 1 0

The flipped new data and the old data differ in only five bits!

Page 16: Flip-N-Write basic idea

Cache block replaced, to be written to PRAM (“new data”): 1 1 1 1 1 1 1 1 0 0 0 1 0 0 0 1
Memory content (“old data”): 0 0 0 1 0 1 1 0 1 1 1 1 0 1 1 0, “flip bit” 0
“Flipped new data”: 0 0 0 0 0 0 0 0 1 1 1 0 1 1 1 0, “flip bit” 1

(5+1) bits need to be updated: five data bits plus the flip bit.
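The bit counts on these slides can be checked with a short Python sketch. The helper `hamming` is a hypothetical name, and the words are the 16-bit values shown above.

```python
N = 16
MASK = (1 << N) - 1

old = 0b0001011011110110   # "old data" held in the memory block
new = 0b1111111100010001   # "new data" from the evicted cache block
flipped = ~new & MASK      # "flipped new data" (bitwise inverse, kept to N bits)


def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


print(f"{flipped:016b}")        # 0000000011101110
print(hamming(old, new))        # 11
print(hamming(old, flipped))    # 5 (plus 1 for the flip bit going 0 -> 1)
```

Because inverting every bit turns a distance of d into N - d, storing the flipped word is cheaper exactly when more than half of the bits differ.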

Page 17: Flip-N-Write steps

Cache block replaced, to be written to PRAM: 1 1 1 1 1 1 1 1 0 0 0 1 0 0 0 1
Content of the corresponding memory block: 0 0 0 1 0 1 1 0 1 1 1 1 0 1 1 0

Page 18: Flip-N-Write steps

Cache block replaced, to be written to PRAM: 1 1 1 1 1 1 1 1 0 0 0 1 0 0 0 1
Content of the corresponding memory block: 0 0 0 1 0 1 1 0 1 1 1 1 0 1 1 0

Step 1: READ out the (old) memory content.

Page 19: Flip-N-Write steps

Cache block replaced, to be written to PRAM: 1 1 1 1 1 1 1 1 0 0 0 1 0 0 0 1
Content of the corresponding memory block: 0 0 0 1 0 1 1 0 1 1 1 1 0 1 1 0

Step 2: Compare old and new data and compute the Hamming distance.

Page 20: Flip-N-Write steps

Cache block replaced, to be written to PRAM: 1 1 1 1 1 1 1 1 0 0 0 1 0 0 0 1
Content of the corresponding memory block: 0 0 0 1 0 1 1 0 1 1 1 1 0 1 1 0

Step 2 (cont.): If the Hamming distance > N/2, flip the new data (here N = 16 and the distance is 11).

Page 21: Flip-N-Write steps

Cache block replaced, to be written to PRAM (flipped): 0 0 0 0 0 0 0 0 1 1 1 0 1 1 1 0, flip bit 1
Content of the corresponding memory block: 0 0 0 1 0 1 1 0 1 1 1 1 0 1 1 0, flip bit 0

Step 2 (cont.): If the Hamming distance > N/2, flip the new data.

Page 22: Flip-N-Write steps

Cache block replaced, to be written to PRAM (flipped): 0 0 0 0 0 0 0 0 1 1 1 0 1 1 1 0, flip bit 1
Content of the corresponding memory block: 0 0 0 1 0 1 1 0 1 1 1 1 0 1 1 0, flip bit 0

Step 3: Update bits that differ.

At most N/2 bits are updated; in practice, fewer bits are typically updated, as bit-level redundancies are common in data.
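A minimal Python model of the Flip-N-Write steps follows. It rests on assumptions: each word is stored as a (data, flip bit) pair, `fnw_write` and `fnw_read` are hypothetical names, and the flip decision follows the rule as stated on these slides (flip when the Hamming distance to the stored data bits exceeds N/2). The flip bit's own update is reported separately, and the random loop checks the N/2 bound on data-bit updates.

```python
import random

N = 16                      # data bits per word; one flip bit is stored alongside
MASK = (1 << N) - 1


def popcount(x):
    """Number of 1 bits in a non-negative integer."""
    return bin(x).count("1")


def fnw_write(cells, addr, new_data):
    """Flip-N-Write sketch. cells[addr] holds (stored_bits, flip_bit).

    Returns (data_bits_updated, flip_bit_updated).
    """
    stored, old_flip = cells[addr]              # Step 1: read out old content
    distance = popcount(stored ^ new_data)      # Step 2: Hamming distance
    if distance > N // 2:                       # ...if > N/2, flip the new data
        encoded, flip = ~new_data & MASK, 1
    else:
        encoded, flip = new_data & MASK, 0
    data_updates = popcount(stored ^ encoded)   # Step 3: update bits that differ
    cells[addr] = (encoded, flip)
    return data_updates, int(flip != old_flip)


def fnw_read(cells, addr):
    """Read path: invert the stored bits when the flip bit is set."""
    stored, flip = cells[addr]
    return ~stored & MASK if flip else stored


# Slide example: old data with flip bit 0, then the new data from the slides.
cells = {0: (0b0001011011110110, 0)}
print(fnw_write(cells, 0, 0b1111111100010001))  # (5, 1)
assert fnw_read(cells, 0) == 0b1111111100010001

# Randomized check: never more than N/2 data bits are programmed per write.
rng = random.Random(1)
for _ in range(10_000):
    value = rng.getrandbits(N)
    updates, _ = fnw_write(cells, 0, value)
    assert updates <= N // 2
    assert fnw_read(cells, 0) == value
```

The bound follows directly from the rule: if d bits differ and d > N/2, writing the inverse programs N - d < N/2 bits; otherwise d ≤ N/2 bits are programmed.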

Page 23: Analysis: bit update reduction

[Chart: improvement over conventional (0% to 70%) vs. word width (2, 4, 8, 16, 32 bits)]

Page 24: Analysis: bit update reduction

[Chart: improvement over conventional and improvement over DW (0% to 70%) vs. word width (2, 4, 8, 16, 32 bits)]

Page 25: Analysis: flip bit overheads

[Chart: improvement over conventional, improvement over DW, and flip bit overhead (0% to 70%) vs. word width (2, 4, 8, 16, 32 bits)]
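To see why the gain shrinks as words get wider, the expected data-bit updates can be computed exactly under a simple model assumed here (not taken from the slides): old and new data are independent and uniformly random, so the number of differing bits is Binomial(N, 1/2). The flip bit's own update (at most one per word) is not counted, and the overhead column assumes one flip bit of storage per N data bits; the charts may define overhead differently. `expected_updates` is a hypothetical name.

```python
from math import comb


def expected_updates(n):
    """Expected data-bit updates per n-bit write, assuming old and new data are
    independent and uniformly random (each position differs with probability 1/2).

    Returns (conventional, dw, flip_n_write); flip-bit updates are not counted.
    """
    conventional = n          # conventional write programs every bit
    dw = n / 2                # DW programs only the differing bits
    # Flip-N-Write programs min(d, n - d) bits when d bits differ.
    fnw = sum(comb(n, d) * min(d, n - d) for d in range(n + 1)) / 2 ** n
    return conventional, dw, fnw


for n in (2, 4, 8, 16, 32):
    conv, dw, fnw = expected_updates(n)
    print(f"N={n:2d}  vs. conventional: {1 - fnw / conv:6.1%}  "
          f"vs. DW: {1 - fnw / dw:6.1%}  flip-bit storage: {1 / n:6.1%}")
```

Under this model the advantage over DW falls as N grows (narrow words get flipped more often relative to their size), while the storage cost of the flip bit also falls as 1/N, which is the trade-off the word-width charts explore.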

Page 26: Analysis: comparison with DW

[Chart: # bit updates per 16-bit write (0 to 16) vs. in-position bit flip probability (0 to 1); series: DW]

Page 27: Analysis: comparison with DW

[Chart: # bit updates per 16-bit write (0 to 16) vs. in-position bit flip probability (0 to 1); series: DW, Flip-N-Write]
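The DW versus Flip-N-Write comparison can be reproduced qualitatively with a Monte Carlo sketch. It assumes that “in-position bit flip probability” means each of the 16 bit positions differs between the stored and the new data independently with probability p; `simulate` is a hypothetical name, and only data-bit updates are counted.

```python
import random

N = 16


def popcount(x):
    """Number of 1 bits in a non-negative integer."""
    return bin(x).count("1")


def simulate(p, trials=10_000, seed=0):
    """Mean data-bit updates per 16-bit write for DW and Flip-N-Write, plus the
    worst Flip-N-Write case seen, when each bit position of the new data differs
    from the stored data independently with probability p."""
    rng = random.Random(seed)
    dw_total = fnw_total = fnw_worst = 0
    for _ in range(trials):
        stored = rng.getrandbits(N)
        change = sum(1 << i for i in range(N) if rng.random() < p)
        new = stored ^ change
        dw = popcount(stored ^ new)              # DW: every differing bit
        fnw = N - dw if dw > N // 2 else dw      # Flip-N-Write: flip if > N/2
        dw_total += dw
        fnw_total += fnw
        fnw_worst = max(fnw_worst, fnw)
    return dw_total / trials, fnw_total / trials, fnw_worst


for p in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
    dw, fnw, worst = simulate(p)
    print(f"p={p:.1f}  DW={dw:5.2f}  Flip-N-Write={fnw:5.2f}  worst={worst}")
```

DW grows linearly (about 16p updates), while Flip-N-Write peaks around p = 0.5 and falls back toward zero as p approaches 1, because a mostly inverted word is stored by toggling the flip bit; its worst case never exceeds 8.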

Page 28: Implications

Savings in bit updates improve power and endurance

On average, in terms of # update bits, Flip-N-Write is better than, but not far superior to, DW

However, in terms of the maximum # update bits, Flip-N-Write has an edge: N/2 vs. N
• This deterministic property is extremely useful under a limited cell write current requirement

Write-current-limited write time (M bits written, at most S bits updated at a time):
• Conventional: $(M/S) \times T_{SET}$
• DW: $T_{READ} + (M/S) \times T_{SET}$
• Flip-N-Write: $T_{READ} + (M/(2S)) \times T_{SET}$
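The write-time expressions above translate directly into code. This sketch follows the formulas, with one added assumption: a partial programming round still costs a full T_SET, hence the ceiling when M is not a multiple of S. `write_times` is a hypothetical name, and the parameter values in the example are made up for illustration, not taken from the presentation.

```python
from math import ceil


def write_times(m_bits, s_bits, t_read, t_set):
    """Write-current-limited time to write m_bits when at most s_bits cells
    can be programmed per round and each round takes t_set."""
    rounds_all = ceil(m_bits / s_bits)           # worst case: all M bits programmed
    rounds_half = ceil(m_bits / (2 * s_bits))    # Flip-N-Write bound: at most M/2 bits
    conventional = rounds_all * t_set
    dw = t_read + rounds_all * t_set             # read first; worst case is still M bits
    fnw = t_read + rounds_half * t_set
    return conventional, dw, fnw


# Hypothetical parameters (ns), chosen only to illustrate the formulas.
print(write_times(m_bits=512, s_bits=32, t_read=50, t_set=150))  # (2400, 2450, 1250)
```

DW pays the extra read but cannot shrink its worst-case budget, whereas the deterministic N/2 bound lets Flip-N-Write halve the number of programming rounds it must provision for.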

Page 29: Circuit design (16): read path

(Baseline design from Lee et al., ISSCC, February 2007)

• Cell blocks have more bits (flip bits)
• Larger read buffer (flip bits)
• Logic for bypassing/flipping data

Page 30: Circuit design (16): write prep.

(Baseline design from Lee et al., ISSCC, February 2007)

• Determine if flipping is needed
• Bit mask showing which bits need programming
• Bit-wise SET/RESET commands

Page 31: Circuit design (16): write driver

(Baseline design from Lee et al., ISSCC, February 2007)

• SET operation suppressed if outcome is “0” (not needed)
• RESET operation suppressed if outcome is “0”

[Figure: write (SET/RESET) pulse and cell current]
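The write-prep and write-driver roles above can be sketched as bit masks in Python. The sketch assumes logic 1 is stored as the SET (crystalline, low-resistance) state and logic 0 as the RESET state, and it places the flip bit above the 16 data bits; `write_commands` and `drive` are hypothetical names. The example reuses the Flip-N-Write words from the earlier slides.

```python
def write_commands(old_cells, new_cells, width):
    """Per-cell SET/RESET enable masks for one write.

    Assumes logic 1 is stored as SET (crystalline) and logic 0 as RESET
    (amorphous). Cells with neither enable bit receive no pulse.
    """
    mask = (1 << width) - 1
    program = (old_cells ^ new_cells) & mask     # bit mask: cells needing programming
    set_en = program & new_cells                 # 0 -> 1: SET pulse
    reset_en = program & ~new_cells & mask       # 1 -> 0: RESET pulse
    return set_en, reset_en


def drive(old_cells, set_en, reset_en):
    """Write driver: apply enabled pulses only; all others are suppressed."""
    return (old_cells | set_en) & ~reset_en


# 17 cells: flip bit (bit 16) above 16 data bits, using the slide example.
old_cells = (0 << 16) | 0b0001011011110110
new_cells = (1 << 16) | 0b0000000011101110
set_en, reset_en = write_commands(old_cells, new_cells, width=17)
print(bin(set_en).count("1"), "SET,", bin(reset_en).count("1"), "RESET")  # 2 SET, 4 RESET
assert drive(old_cells, set_en, reset_en) == new_cells
```

The six enabled pulses match the (5+1) updates from the basic-idea slides; every other cell sees neither a SET nor a RESET pulse.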

Page 32: Experimental setup

We have three sets of experiments
• Storage environment: # of bit updates from firmware/data update
• Main memory environment: program execution time

Four-core processor chip with ATOM-like cores

PRAMs with conventional write, DW, and Flip-N-Write
• Based on Samsung’s 512Mbit prototype chip

Workloads
• Firmware update: MiBench compiled with gcc
• Data update: photos from New York Times “Pictures of the Day” (late April 2009), music files “ripped” from two albums
• Program run: SPEC2006 (multiprogrammed), SPLASH-2, SPECjbb

Page 33: Firmware update (ARM)

[Chart: # bit updates per 1,024 code bits (0 to 1,024), split into SET and RESET, for basicmath, typeset, stringsearch, patricia, pgp, and Average; three bars (a), (b), (c) per benchmark]

Page 34: Firmware update (ARM)

[Chart: # bit updates per 1,024 code bits (0 to 1,024), split into SET and RESET, for basicmath, typeset, stringsearch, patricia, pgp, and Average; three bars (a), (b), (c) per benchmark; legend: conventional, DW, Flip-N-Write]

Page 35: Firmware update (ARM)

[Chart: # bit updates per 1,024 code bits (0 to 1,024) for basicmath, typeset, stringsearch, patricia, pgp, and Average; three bars (a), (b), (c) per benchmark]

Many more “SET” operations than “RESET”

Page 36: Firmware update (ARM)

[Chart: # bit updates per 1,024 code bits (0 to 1,024) for basicmath, typeset, stringsearch, patricia, pgp, and Average; three bars (a), (b), (c) per benchmark]

For DW & Flip-N-Write, # SET ~ # RESET

Page 37: Firmware update (ARM)

[Chart: # bit updates per 1,024 code bits (0 to 1,024) for basicmath, typeset, stringsearch, patricia, pgp, and Average; three bars (a), (b), (c) per benchmark; callouts: 305.7 and 278.6]

Conventional > DW > Flip-N-Write

Page 38: Firmware update (x86)

[Chart: # bit updates per 1,024 code bits (0 to 1,024) for basicmath, typeset, stringsearch, patricia, pgp, and Average; three bars (a), (b), (c) per benchmark; callouts: 385.0 and 331.3]

Improvement w/ Flip-N-Write over DW ~14%

Page 39: Music file update

[Chart: # bit updates per 1,024 data bits (0 to 1,024) for Krall-Medium, Stravinsky-Medium, Krall-High, and Stravinsky-High; three bars (a), (b), (c) per file]

Compressed files have roughly equal 0’s and 1’s

Page 40: Main memory performance

[Chart: AMAL (average memory access latency, cycles, 0 to 1,200) for LLMM, LLHH, MMHH, cholesky, lu, ocean, and SPECjbb; series: conventional, DW, Flip-N-Write, NoDelay]

Page 41: Main memory performance

[Chart: AMAL (cycles, 0 to 1,200) for LLMM, LLHH, MMHH, cholesky, lu, ocean, and SPECjbb]

Higher memory intensity leads to larger AMAL

Page 42: Main memory performance

[Chart: AMAL (cycles, 0 to 1,200) for LLMM, LLHH, MMHH, cholesky, lu, ocean, and SPECjbb]

Flip-N-Write eases the bandwidth shortage problem

Page 43: Program performance

[Chart: performance relative to NoDelay (0 to 1) for LLMM, LLHH, MMHH, cholesky, lu, ocean, and SPECjbb]

PRAM write bandwidth remains a challenge…

Page 44: Program performance

[Chart: performance relative to NoDelay (0 to 1) for LLMM, LLHH, MMHH, cholesky, lu, ocean, and SPECjbb]

Flip-N-Write salvages performance…

Page 45: Bandwidth, bandwidth, bandwidth

[Charts: AMAL (cycles, 0 to 1,400) vs. # banks/device (16, 32, 64, 128 banks), PRAM update time (400ns, 200ns, 100ns, 50ns), and L2 cache size (2MB, 4MB, 8MB, 16MB); series: conventional, DW, Flip-N-Write, NoDelay]

Write bandwidth is the key to high performance

Page 46: Conclusions

Flip-N-Write is a differential write scheme that reduces PRAM’s bit update actions using simple encoding
• Power/energy reduction
• Potential improvement in write endurance

Simple algorithm allows straightforward design

Deterministic bound on # update bits is beneficial for power provisioning or bandwidth improvement
• Faster firmware/data update latency
• Performance improvement can be sizable

Write bandwidth remains a challenge

Page 47: Flip-N-Write: A Simple Deterministic Technique to Improve PRAM Write Performance, Energy and Endurance

Sangyeun Cho, Hyunjin Lee

Dept. of Computer Science, University of Pittsburgh

Page 48: Flip-N-Write, more formally