accord: associativity for dram caches by coordinating way … · 2018-06-20 · accord:...

37
ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson Young (GT) Chiachen Chou (GT) Aamer Jaleel (NVIDIA) Moinuddin K. Qureshi (GT) Authors:

Upload: others

Post on 30-Jun-2020

11 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

ACCORD:AssociativityforDRAMCachesbyCoordinatingWay-InstallandWay-Prediction

ISCA2018

1

Vinson Young (GT)Chiachen Chou (GT)Aamer Jaleel (NVIDIA)Moinuddin K. Qureshi (GT)

Authors:

Page 2: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

4-8x Bandwidth(of traditional memory)

3D-DRAM MITIGATES BANDWIDTH WALL

2

Hybrid Memory Cube (HMC) from Micron, High Bandwidth Memory (HBM) from Samsung

3D-Stacked DRAM

Limited Capacity✘

Memory3D-DRAM + High-Capacity Memory = Hybrid Memory

Modernsystempackingmanycoresè BandwidthWall

Page 3: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

OS-visible Space

System Memory(NVM / DRAM)

USE 3D-DRAM AS A CACHE

3

DRAM-Cache(3D-DRAM)

Mem

ory

Hie

rarc

hy

fast

slow

CPUL1$L2$

L3$

CPU

L2$L1$

Using 3D-DRAM as a DRAM cache, can improve memory bandwidth (and avoid OS/software change)

MCDRAM from Intel

Page 4: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

Organize at line granularity (64B) for capacity/BW utilization

Gigascale cache needs large tag-store (tens of MBs)

3D-DRAM

ARCHITECTING LARGE DRAM CACHES

4

4GB Data128 MBTags

Tags?Too large for SRAM

Page 5: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

Organize at line granularity (64B) for high cache utilization

Gigascale cache needs large tag-store (tens of MBs)

Practical designs must store Tags in DRAM

3D-DRAM

ARCHITECTING LARGE DRAM CACHES

5

How to architect tag-store for low-latency tag access?

4GB Data128 MBTags

Page 6: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

EFFICIENT TAG ORGANIZATION (KNL CACHE)

6

Practical designs are 64B line-size, store Tag-With-Data, and are direct-mapped, to optimize for hit-latency.

Tag Data Tag Data Tag Data Tag Data

Tag-With-Data [Alloy Cache, Intel Knights Landing]

Single Tag+Data Lookup (1x hit latency),but direct-mapped

Intel Knights Landing Product (MCDRAM) uses this DRAM-cache organization.

Page 7: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

60

70

80

90

1-way

2-way

4-way

8-way

(a) Hit Rate

Hit

Ra

te (

%)

0

0.5

1

1.5

2-way

4-way

8-way

(b) Speedup (Parallel)

Sp

ee

du

p (

Pa

ralle

l) 0

0.5

1

1.5

2-way

4-way

8-way

(c) Speedup (Idealized)

Sp

ee

du

p (

Ide

aliz

ed

)Reduce 25% of misses

POTENTIAL OF ASSOCIATIVITY

7

How can we make DRAM caches associative?

Assumes 16-core system, with 4GB DRAM-Cache, in front of PCM memory.

Page 8: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

ASSOCIATIVITY OPTION 1: SERIAL TAG LOOKUP

Serial Tag Lookup enables associativity, but, it has serialization delay.

8

A B

Way0 Way1Address

A BIfmiss

Page 9: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

ASSOCIATIVITY OPTION 2: PARALLEL TAG LOOKUP

Parallel Lookup avoids serialization latency, but, it introduces 2x bandwidth cost.

9

A B

Way0 Way1Address

A B

Page 10: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

60

70

80

90

1-way

2-way

4-way

8-way

(a) Hit Rate

Hit

Ra

te (

%)

0

0.5

1

1.5

2-way

4-way

8-way

(b) Speedup (Parallel)

Sp

ee

du

p (

Pa

ralle

l) 0

0.5

1

1.5

2-way

4-way

8-way

(c) Speedup (Idealized)

Sp

ee

du

p (

Ide

aliz

ed

)Reduce 25% of misses -46%

ASSOCIATIVITY FOR DRAM CACHE (PARALLEL)

10

Increasing associativity naively actually degrades performance due to increased BW cost

Page 11: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

60

70

80

90

1-way

2-way

4-way

8-way

(a) Hit Rate

Hit

Ra

te (

%)

0

0.5

1

1.5

2-way

4-way

8-way

(b) Speedup (Parallel)

Sp

ee

du

p (

Pa

ralle

l) 0

0.5

1

1.5

2-way

4-way

8-way

(c) Speedup (Idealized)

Sp

ee

du

p (

Ide

aliz

ed

)Reduce 25% of misses -46%

21%

ASSOCIATIVITY FOR DRAM CACHE (IDEAL)

11

With latency / BWof direct-mapped

Associativity must still maintain the latency/BWof direct-mapped caches. How?

Page 12: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

OPTION 3: WAY-PREDICTED TAG LOOKUP

Way-Predicted Tag Lookup can obtain improved hit-rate, with BW / latency of direct-mapped cache.

12

Way-Predicted Tag Lookup

A B

Way0 Way1Address

BIfmiss

WayPrediction

Page 13: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

Accuracy(4-way) 74.3% 91.6%

Accuracy(8-way) 63.2% 81.2%

MRUPred(1bit/set)

Partial-Tag(4bit/line)

SRAMStorage 4MB 32MB

Way-Pred Accuracy(2-way)

85.7% 97.3%

WAY-PREDICTION ACCURACY & COST

Prior methods for way-prediction have low accuracy and/or have high storage overhead.

13

Page 14: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

TOWARDS ASSOCIATIVITY W/ WAY-PREDICTION

14

Way-Predicted Tag Lookup

A B

Way0 Way1Address

BIfmiss

WayPrediction

Goal: Low storage-overhead and high accuracy way-prediction, to enable associative DRAM cache

Page 15: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

ACCORD OVERVIEW

• Background

• ACCORD– Probabilistic Way-Steering (PWS)– Ganged Way-Steering (GWS)– Skewed Way-Steering (SWS)

• Summary

15

Page 16: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

INSIGHT: WAY-PREDICTABILITY AT LOW STORAGE?

Insight: Modifying install policy can make way-prediction much simpler!

16

EVENODD

Way0 Way1

EVEN

ODDODDEVEN

Base Install Policy (Rand)

EVENODDEVEN

ODDODDEVEN

Tag-based Install Policy

Way0 Way1

Hard-to-predict (~50%) Predict 100%! But, direct-mapped

Page 17: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

PROPOSAL: ACCORD

AssoCiativity by CoORDinating way-install and prediction.ACCORD achieves a way-predictable cache at low cost.

17

Way0 Way1

A2B3A3

B5B7

WayInstallPolicy

WayPredictor

Coordinate

Page 18: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

ACCORD OVERVIEW

• Background

• ACCORD– Probabilistic Way-Steering (PWS)– Ganged Way-Steering (GWS)– Skewed Way-Steering (SWS)

• Summary

18

Page 19: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

PROBABILISTIC WAY-STEERING

PWS enables way-predictability, by trading speed of learning to use both ways (hit-rate)

19

Install using PWS

PageA,BBias=90% 10%

Static prediction: ~90%

B1B2B3B4

B6B7

B0

B5

Way0 Way1Address

A1A2

A3A4

A6A7

A0

A5

B1B2B3B4

B6B7

B0

B5

A1A2

A3A4

A6A7

A0

A5

Preferred

Will use both ways, improve hit-rate

Page 20: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

SENSITIVITY TO PWS PROBABILITY

20

0% 20% 40% 60% 80% 100%

0% 2% 4% 6% 8%

10% 12% 14%

50% 60% 70% 80% 85% 90% 100% Way-PredAccuracy(%

)

MissRed

uctio

n(%

)

Biasforselecting“preferredway”

Way-PredAccuracy

2-way design Direct-mapped

Preferred-wayInstallProbability=x%biastoinstallinpreferredway

Page 21: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

SENSITIVITY TO PWS PROBABILITY

21

0% 20% 40% 60% 80% 100%

0% 2% 4% 6% 8%

10% 12% 14%

50% 60% 70% 80% 85% 90% 100% Way-PredAccuracy(%

)

MissRed

uctio

n(%

)

Preferred-wayInstallProbability

MissReduction(%) Way-PredAccuracy

2-way design Direct-mapped

Page 22: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

SENSITIVITY TO PWS PROBABILITY

22

2.6% 3.7% 4.7% 5.5% 5.6% 5.3%

0.0% 0% 20% 40% 60% 80% 100%

0% 2% 4% 6% 8%

10% 12% 14%

50% 60% 70% 80% 85% 90% 100% Way-PredAccuracy(%

)

MissRed

uctio

n(%

)Speedu

p(%

)

Preferred-wayInstallProbability

Speedup MissReduction(%) Way-PredAccuracy

Preferred-way Install Probability (85%) provides best trade-off of hit-rate for WP accuracy, for 5.6% speedup.

5.6% speedup

Page 23: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

ACCORD OVERVIEW

• Background

• ACCORD– Probabilistic Way-Steering (PWS)– Ganged Way-Steering (GWS)– Skewed Way-Steering (SWS)

• Summary

23

Page 24: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

GANGED WAY-STEERING

24

B1B2B3B4

B6B7

B0

B5

Way0 Way1Address

B0B1B2B3B4

B6B7

B5

Way0 Way1Address

B0B1B2B3B4

B6B7

B5

A0A1A2A3A4

A6A7

A5

A1A2

A3A4

A6A7

A0

A5

B1B2B3B4

B6B7

B0

B5

A1A2

A3A4

A6A7

A0

A5

Probabilistic Way-SteeringPer-line randomized decision

Ganged Way-SteeringPer-page rand decision

Preferred Preferred

Pred ~50% Pred >90%

Ganged Way-Steering makes install decision at large granularity, to improve predictability for workloads with high spatial locality.

Page 25: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

GANGED WAY-STEERING IMPLEMENTATION

25

Way0 Way1

A2B3A3

B5B7

0x001

RegionID WayGuideInstall

RecentInstallTable(RIT)

Install RegionID

0x101

WayPredictWay

RecentLookupTable(RLT)

Access

GWS Per-Region Last-Way install + Last-Way prediction. 64-entry RIT and 64-entry RLT needs only 320 Bytes.

0 1

Page 26: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

PWS+GWS WAY-PREDICTION ACCURACY

26

70%

75%

80%

85%

90%

95%

100%

PWS+GWSPWSLibquantum

GWS enables spatial workloads to have near-100% accuracyPWS has ~85% base accuracy

Combination of PWS+GWS achieves 90% accuracy,at the cost of 320B storage.

Way-PredAcc(%

)

70%

75%

80%

85%

90%

95%

100%

Average(21workloads)PWS+GWSPWS

Page 27: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

PWS+GWS (ACCORD 2-WAY) RESULTS

PWS + GWS gets 7.3% of 10% speedup of perfectly-predicted 2-way cache.

7.3% speedup

27System assumes 4GB DRAM Cache, and PCM-based main memory.

0%

2%

4%

6%

8%

10%

12% Speedu

p

PWSPWS+GWS

Perfect

Page 28: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

ACCORD OVERVIEW

• Background

• ACCORD– Probabilistic Way-Steering (PWS)– Ganged Way-Steering (GWS)– Skewed Way-Steering (SWS)

• Summary

28

Page 29: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

DIFFICULTY IN SCALING TO N-WAYS

• Scaling ACCORD to N-ways– ACCORD 4-way has 3% speedup– ACCORD 8-way has 6% slowdown…

We need solutions to reduce miss-confirmation

• Miss confirmation: N-way cache needs N accesses to confirm line is not resident

29

EA CB D

Way0 Way1 Way2 Way3AddressEMiss!

Page 30: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

SOLUTION: SKEWED WAY-STEERING

Restricting placement, reduces miss-confirmation èhit-rate benefits without any storage overhead

30

Only2lookupstodeterminemiss

Way0 Way2Way3

EAccess:

ABCA B

4-waywith2-skew:Access:ABC

OnePreferred+OneAlternateway

Way1

Page 31: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

SPEEDUP FROM ACCORD (WITH SWS)

31

0%

2%

4%

6%

8%

10%

12% Speedu

p

2-Way

SWS 8-way achieves 11% speedup

4-Way 8-Way

Page 32: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

ACCORD OVERVIEW

• Background

• ACCORD– Probabilistic Way-Steering (PWS)– Ganged Way-Steering (GWS)– Skewed Way-Steering (SWS)

• Summary

32

Page 33: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

SUMMARY OF ACCORD

§ ACCORD: associative DRAM caches by coordinating way-install and way-prediction.

§ Probabilistic Way-Steering§ Biased-install enables accurate static way-prediction

§ Ganged Way-Steering§ Region-based install enables accurate region-based way-prediction

§ Skewed Way-Steering§ Skew enables flexibility in line placement, while maintaining miss cost

§ ACCORD enables associativity at negligible storage cost (320B), to achieve 11% speedup.

33

Page 34: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

ACCORD BACKUP SLIDES

ACCORD backup slides

34

Page 35: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

REPLACEMENT POLICY?

• LRU– State in SRAM

• 1-bit per line needs 8MB. Size of Last-level cache– State in DRAM

• 9% slowdown due to state-update cost (Hit to alternate way)

35

Page 36: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

COMPARISON TO OTHER WAY PREDICTORS

36

-6% -4% -2% 0% 2% 4% 6% 8%

10% 12%

Speedu

p

ACCORD outperforms other predictors while needing negligible storage overhead (320 B)

Page 37: ACCORD: Associativity for DRAM Caches by Coordinating Way … · 2018-06-20 · ACCORD: Associativity for DRAM Caches by Coordinating Way-Install and Way-Prediction ISCA 2018 1 Vinson

COLUMN-ASSOCIATIVE CACHE

• Column-associative / Hash-Rehash cache – Install lines in preferred way (way-0)– On eviction, move line to alternate way (way-1)– On hit to alternate way, move to preferred way

• Effectiveness– In general, way-prediction accuracy similar to MRU– But, requires significant bandwidth to swap lines on hit

to alternate way. CA-cache thus causes 4% slowdown.

37