Non-Uniform Cache Architectures for Wire Delay Dominated Caches



Post on 14-Jan-2016



TRANSCRIPT

Page 1: Non-Uniform Cache Architectures for Wire Delay Dominated Caches

Non-Uniform Cache Architectures for Wire Delay Dominated Caches

Abhishek Desai

Bhavesh Mehta

Devang Sachdev

Gilles Muller

Page 2: Non-Uniform Cache Architectures for Wire Delay Dominated Caches

Plan

Motivation
What is NUCA
UCA and ML-UCA
Static NUCA
Dynamic NUCA
Simulation Results

Page 3: Non-Uniform Cache Architectures for Wire Delay Dominated Caches

Motivation

Bigger L2 and L3 caches are needed
– Programs are larger
– SMT requires a large cache for spatial locality
– Bandwidth demands on the package have increased
– Smaller technologies permit more bits per mm²

Wire delays dominate in large caches
– The bulk of the access time will involve routing to and from the banks, not the bank accesses themselves

Page 4: Non-Uniform Cache Architectures for Wire Delay Dominated Caches

What is NUCA?

Data residing closer to the processor are accessed much faster than data residing physically farther from the processor

Example:

The closest bank in a 16MB on-chip L2 cache built in 50nm process technology could be accessed in 4 cycles, while an access to the farthest bank might take 47 cycles.
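To see why non-uniform latency matters, the sketch below computes an average access latency as a hit-weighted sum over banks. The 4- and 47-cycle latencies come from the example above; the two-bank simplification and the 90%/10% hit split are hypothetical values chosen only for illustration.

```python
from typing import Sequence


def average_latency(latencies: Sequence[float],
                    hit_fractions: Sequence[float]) -> float:
    """Return the hit-weighted average access latency in cycles.

    latencies[i] is the access latency of bank i, and hit_fractions[i]
    is the fraction of all accesses served by bank i.
    """
    if len(latencies) != len(hit_fractions):
        raise ValueError("need one hit fraction per bank latency")
    if abs(sum(hit_fractions) - 1.0) > 1e-9:
        raise ValueError("hit fractions must sum to 1")
    return sum(lat * frac for lat, frac in zip(latencies, hit_fractions))


# Hypothetical split: 90% of accesses hit the closest bank (4 cycles),
# 10% hit the farthest bank (47 cycles).
avg = average_latency([4, 47], [0.9, 0.1])
print(f"{avg:.1f} cycles")
```

Under these assumed fractions the average comes out at about 8.3 cycles, far below the 47-cycle worst case, which is the effect a NUCA design tries to exploit.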

Page 5: Non-Uniform Cache Architectures for Wire Delay Dominated Caches

UCA and ML-UCA

UCA
Avg. access time: 255 cycles
Banks: 1
Size: 16MB
Technology: 50nm

ML-UCA
Avg. access time: 11/41 cycles
Banks: 8/32
Size: 16MB
Technology: 50nm

[Diagram labels: L2 41; L3 41; L2 10]

Page 6: Non-Uniform Cache Architectures for Wire Delay Dominated Caches

Static-NUCA-1

S-NUCA-1
Avg. access time: 34 cycles
Banks: 32
Size: 16MB
Technology: 50nm
Area: Wire overhead 20.9%

[Diagram labels: 17, 41]
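In a static NUCA, the bank that holds a line is fixed by its address. A minimal sketch of one such mapping follows, assuming the low-order bits of the line address select the bank. The 32-bank count is from the slide; the 64-byte line size and the function name `static_bank` are assumptions made for illustration, since the slides do not state which bits are used.

```python
LINE_BYTES = 64  # assumed cache line size (not given in the slides)
NUM_BANKS = 32   # bank count from the S-NUCA-1 slide


def static_bank(address: int,
                line_bytes: int = LINE_BYTES,
                num_banks: int = NUM_BANKS) -> int:
    """Map a byte address to a bank index using a fixed (static) rule.

    The line address is the byte address with the offset within the
    line dropped; its low-order bits pick the bank.
    """
    line_address = address // line_bytes
    return line_address % num_banks


# Consecutive lines land in consecutive banks and wrap after 32 lines.
print([static_bank(i * LINE_BYTES) for i in (0, 1, 31, 32)])
```

Because the mapping never changes, a line's access latency is determined entirely by the bank its address happens to select.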

Page 7: Non-Uniform Cache Architectures for Wire Delay Dominated Caches

S-NUCA-1 cache design

[Figure labels: Tag array, Data bus, Address bus, Bank, Sub-bank, Predecoder, Sense amplifier, Wordline driver and decoder]

Page 8: Non-Uniform Cache Architectures for Wire Delay Dominated Caches

Static-NUCA-2

S-NUCA-2
Avg. access time: 24 cycles
Banks: 32
Size: 16MB
Technology: 50nm
Area: Channel overhead 5.9%

[Diagram labels: 9, 32]

Page 9: Non-Uniform Cache Architectures for Wire Delay Dominated Caches

S-NUCA-2 cache design

[Figure labels: Address bus, Sense amplifier, Bank, Data bus, Switch, Tag array, Wordline driver and decoder, Predecoder]

Page 10: Non-Uniform Cache Architectures for Wire Delay Dominated Caches

Dynamic-NUCA

D-NUCA
Avg. access time: 18 cycles
Banks: 256
Size: 16MB
Technology: 50nm

[Diagram labels: 4, 47; Data migration]

Page 11: Non-Uniform Cache Architectures for Wire Delay Dominated Caches

Management of Data in D-NUCA

Mapping:
– How are the data mapped to the banks, and in which banks can a datum reside?

Search:
– How is the set of possible locations searched to find a line?

Movement:
– Under what conditions should the data be migrated from one bank to another?

Page 12: Non-Uniform Cache Architectures for Wire Delay Dominated Caches

Simple Mapping (implemented)

[Figure: banks grouped into 8 bank sets; labels: way 1, way 2, way 3, way 4, one set, bank, memory controller]
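The simple mapping in the figure can be sketched as follows, assuming each column of banks forms one bank set and each row in that column holds one way. The 8 bank sets and 4 ways are taken from the figure labels; the 64-byte line size, the choice of address bits, and the names `bank_set_of` and `candidate_banks` are hypothetical.

```python
from typing import List, Tuple

LINE_BYTES = 64    # assumed cache line size (not given in the slides)
NUM_BANK_SETS = 8  # "8 bank sets" from the figure
NUM_WAYS = 4       # "way 1" to "way 4" from the figure


def bank_set_of(address: int) -> int:
    """Pick the bank set (column) from the low-order line-address bits."""
    return (address // LINE_BYTES) % NUM_BANK_SETS


def candidate_banks(address: int) -> List[Tuple[int, int]]:
    """List every (row, column) bank that may hold the line.

    Row 0 is assumed to be the bank closest to the memory controller.
    """
    column = bank_set_of(address)
    return [(row, column) for row in range(NUM_WAYS)]


print(candidate_banks(0x1040))
```

A line can therefore sit in any of the four banks of its column, which is what makes searching and migration necessary in a dynamic NUCA.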

Page 13: Non-Uniform Cache Architectures for Wire Delay Dominated Caches

Fair and Shared Mapping

[Figure: two panels, Fair Mapping and Shared Mapping, each shown with a memory controller]

Page 14: Non-Uniform Cache Architectures for Wire Delay Dominated Caches

Searching Cached Lines

Incremental search
Multicast search (implemented)
Limited multicast
Partitioned multicast

Smart search:
– ss-performance
– ss-energy
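The difference between the first two search policies can be sketched as below. This is a simplified model, not the simulator's code: a bank set is represented as a list of tag sets ordered from the closest bank to the farthest, and each function returns the row that hit together with the number of banks probed. All names are hypothetical.

```python
from typing import List, Optional, Set, Tuple


def incremental_search(banks: List[Set[int]],
                       tag: int) -> Tuple[Optional[int], int]:
    """Probe banks one at a time, closest first, stopping at the first hit."""
    for row, bank in enumerate(banks):
        if tag in bank:
            return row, row + 1  # probed rows 0..row
    return None, len(banks)      # miss: every bank was probed


def multicast_search(banks: List[Set[int]],
                     tag: int) -> Tuple[Optional[int], int]:
    """Send the request to all banks of the bank set at once."""
    hit_row = next((row for row, bank in enumerate(banks) if tag in bank),
                   None)
    return hit_row, len(banks)   # every bank is probed, hit or miss


bank_set = [{0x1}, {0x2}, {0x3}, {0x4}]
print(incremental_search(bank_set, 0x2))
print(multicast_search(bank_set, 0x2))
```

In this model incremental search probes fewer banks when the line is close, while multicast always probes every bank but does not wait on one bank before asking the next.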

Page 15: Non-Uniform Cache Architectures for Wire Delay Dominated Caches

Dynamic Movement of Lines

LRU line furthest and MRU line closest
One-bank promotion on a hit (implemented)

Policy on miss:
Which line is evicted?
– Line in the furthest (slowest) bank (implemented)
Where is the new line placed?
– Closest (fastest) bank
– Furthest (slowest) bank (implemented)
What happens to the victim line?
– Zero copy policy (implemented)
– One copy policy
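The implemented policies can be sketched with a toy model of a single set spread over one bank set. It assumes each bank holds one line of the set, that promotion swaps the hit line with the line in the next closer bank, and that zero copy means the victim simply leaves the cache. The class name `BankSet` and its methods are hypothetical.

```python
from typing import List, Optional


class BankSet:
    """One cache set spread over a column of banks; index 0 is closest."""

    def __init__(self, ways: int) -> None:
        self.lines: List[Optional[int]] = [None] * ways

    def access(self, tag: int) -> bool:
        """Access a line; return True on a hit and False on a miss."""
        if tag in self.lines:
            row = self.lines.index(tag)
            if row > 0:
                # One-bank promotion: swap with the next closer bank.
                self.lines[row - 1], self.lines[row] = (
                    self.lines[row], self.lines[row - 1])
            return True
        # Miss: the line in the furthest bank is the victim (zero copy,
        # so it is dropped), and the new line takes its place there.
        self.lines[-1] = tag
        return False


s = BankSet(ways=4)
for t in (1, 1, 2, 2, 3):
    s.access(t)
print(s.lines)
```

Repeated hits move a line one bank closer each time, so frequently used lines drift toward the fast banks while rarely used lines stay near the eviction point.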

Page 16: Non-Uniform Cache Architectures for Wire Delay Dominated Caches

Advantages of D-NUCA over ML-UCA

D-NUCA does not enforce inclusion, thus preventing redundant copies of the same line

In ML-UCA, the faster level may not match the working set size of an application: it is either too large and thus slow, or too small and thus incurs misses

Page 17: Non-Uniform Cache Architectures for Wire Delay Dominated Caches

Configuration for simulation

Used Sim-Alpha and Cacti
Simple mapping
Multicast search
One-bank promotion on each hit
Replacement policy that chooses the block in the slowest bank as the victim of a miss

Page 18: Non-Uniform Cache Architectures for Wire Delay Dominated Caches

Hit Rate Distribution for D-NUCA

[Figure: Hit Rate Distribution. x-axis: Cache row; y-axis: Hit rate (0 to 0.6)]

Page 19: Non-Uniform Cache Architectures for Wire Delay Dominated Caches

Simulation results – integer benchmarks

[Figure: UCA vs D-NUCA. x-axis: SPEC INT 2000; y-axis: IPC (0 to 1.4); series: uca, d-nuca]

Page 20: Non-Uniform Cache Architectures for Wire Delay Dominated Caches

Simulation results – FP benchmarks

[Figure: UCA vs D-NUCA. x-axis: SPEC FP 2000; y-axis: IPC (0 to 1.8); series: uca, d-nuca]

Page 21: Non-Uniform Cache Architectures for Wire Delay Dominated Caches

Summary

D-NUCA has the following plus points:
– Low access latency
– Technology scalability
– Performance stability
– Flattens the memory hierarchy

Page 22: Non-Uniform Cache Architectures for Wire Delay Dominated Caches