MOSAIC: The Case for a Scalable Coherence Protocol for Complex On-Chip Cache Hierarchies in Many-Core Systems
Lucía G. Menezo, Valentín Puente, José Ángel Gregorio
University of Cantabria (Spain)
Edinburgh - PACT 2013
Outline
Motivation
Directory Schemas
◦ In-cache
◦ Sparse
MOSAIC Coherence Protocol
◦ Examples
Evaluation Results
Conclusions
Motivation
Performance improvement: more processors per chip
Major challenge: the off-chip bandwidth wall → bring more cache onto the chip → complex on-chip cache hierarchies
Coherence protocol: has a fundamental role to play
Motivation
Which coherence protocol should be used with a large number of cores?
◦ Broadcast-based protocols → high energy requirements
◦ Directory-based protocols → more storage needed for sharing information
MOSAIC: a new coherence protocol
◦ Directory without inclusiveness
◦ Token Coherence to guarantee correctness
Directory schemas: In-cache
Each block in the LLC includes the tag, the data and the sharers information
The LLC receives the requests → it needs precise knowledge of the sharers
Inclusiveness is necessary: any block in the private levels must also be allocated in the LLC
Advantage: a less complex coherence protocol
Disadvantage: every LLC block carries the storage overhead
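The storage cost of the in-cache scheme can be estimated with a few lines of arithmetic. The sketch below is a hypothetical sizing helper, assuming a full-map sharer vector (one presence bit per core) in every LLC block and ignoring state bits; the 64-core row is an extrapolation for illustration, not an evaluated configuration.

```python
# Hypothetical sizing helper for an in-cache (full-map) directory.
# Assumptions: one presence bit per core in every LLC block; state bits ignored.

KB, MB = 1024, 1024 * 1024


def in_cache_dir_bytes(llc_bytes: int, block_bytes: int, cores: int) -> int:
    """Bytes of sharer vectors added to the LLC (every block pays for one)."""
    llc_blocks = llc_bytes // block_bytes
    return llc_blocks * cores // 8


def overhead_vs_data(block_bytes: int, cores: int) -> float:
    """Sharer bits per block relative to the block's data bits."""
    return cores / (block_bytes * 8)


if __name__ == "__main__":
    for cores, llc in [(8, 16 * MB), (16, 32 * MB), (64, 32 * MB)]:
        extra = in_cache_dir_bytes(llc, 64, cores)
        print(f"{cores:2d} cores, {llc // MB} MB LLC: {extra // KB} KB of sharer bits "
              f"({overhead_vs_data(64, cores):.1%} of the data array)")
```

The per-block cost grows linearly with the core count and is paid even by LLC blocks that no private cache currently holds.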
Directory schemas: In-cache
[Figure: processors (P) and their private caches (entries: @, data) connected through the interconnection network to the LLC + in-cache directory, whose entries hold @, data and sharers. Annotation: "Overhead!!!"]
Directory schemas: In-cache
[Figure: LLC + in-cache directory (entries: @, data, sharers) and processors with private caches (entries: @, data) on the interconnection network. Annotation: "Overhead!!!", shown twice.]
Directory schemas: Sparse
Directory entries are separated from the data and allocated on demand
Overhead proportional to the aggregate size of the private levels (not the LLC)
Capacity and associativity have to be sufficient to keep the private-level cache tags
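To see why the sparse overhead tracks the private levels rather than the LLC, the hypothetical comparison below sizes both schemes for a fixed set of private caches while the LLC grows. The 30-bit tag, the full-map sharer vector and the 1x coverage are assumptions chosen for illustration; 128KB of private capacity per core corresponds to 32KB L1I + 32KB L1D + 64KB L2, as in the evaluated configuration.

```python
# Hypothetical comparison of in-cache vs. sparse directory storage.
# Assumptions: full-map sharer vectors, 30-bit tags in the sparse directory,
# state bits ignored, one directory entry per tracked private block.

KB, MB = 1024, 1024 * 1024
BLOCK = 64  # bytes


def in_cache_bytes(llc_bytes: int, cores: int) -> int:
    # One sharer vector per LLC block, whether or not it is privately cached.
    return (llc_bytes // BLOCK) * cores // 8


def sparse_bytes(cores: int, private_per_core: int, tag_bits: int = 30,
                 coverage: float = 1.0) -> int:
    # One entry (tag + sharer vector) per tracked private-cache block.
    entries = int(cores * private_per_core // BLOCK * coverage)
    return entries * (tag_bits + cores) // 8


if __name__ == "__main__":
    cores, private = 16, 128 * KB
    for llc in (16 * MB, 32 * MB, 64 * MB):
        print(f"LLC {llc // MB:2d} MB: in-cache {in_cache_bytes(llc, cores) // KB:5d} KB, "
              f"sparse {sparse_bytes(cores, private) // KB:4d} KB")
```

Under these assumptions, doubling the LLC doubles the in-cache cost, while the sparse directory stays at 184 KB because only the private capacity enters its size.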
Directory schemas: Sparse
[Figure: processors (P) and their private caches (entries: @, data) on the interconnection network; the LLC entries hold only @ and data, while a separate sparse directory ("Sparse dir") holds @ and sharers.]
Directory schemas: Sparse
Duplicate-tag directory: holds all the tags of the private levels
◦ Associativity = # cores × private-cache associativity
◦ # sets = # private-cache sets
Example: 16 cores with a 4-way 32KB L1 → 64-way directory
[Figure: directory array filled with copies of the private-cache tags.]
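The two sizing rules for a duplicate-tag directory can be turned into a small helper. This is a sketch that applies the rules exactly as stated above; the 64B block size is an assumption (it matches the evaluated systems), and the function names are hypothetical.

```python
# Duplicate-tag directory geometry, following the rules on the slide:
#   associativity = cores x private-cache associativity
#   sets          = private-cache sets
# The 64B block size is an assumption.


def cache_sets(size_bytes: int, ways: int, block_bytes: int = 64) -> int:
    """Number of sets in a set-associative cache."""
    return size_bytes // (ways * block_bytes)


def duplicate_tag_geometry(cores: int, priv_size: int, priv_ways: int,
                           block_bytes: int = 64) -> tuple[int, int]:
    """Return (sets, ways) of a duplicate-tag directory mirroring the private caches."""
    return cache_sets(priv_size, priv_ways, block_bytes), cores * priv_ways


if __name__ == "__main__":
    # Slide example: 16 cores, each with a 4-way 32KB L1.
    sets, ways = duplicate_tag_geometry(16, 32 * 1024, 4)
    print(f"{sets} sets x {ways} ways")  # prints: 128 sets x 64 ways
```

Every lookup has to compare up to 64 tags, and that width grows with the number of cores, which is what makes this organization hard to scale.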
Directory schemas: Sparse
Decrease associativity: now << # cores × private-cache associativity
◦ One tag may be in several private caches → each entry keeps the sharers
◦ More than one tag per entry → conflicts
◦ Inclusiveness needed → invalidate private data (recall messages)
Increase the number of sets
[Figure: tag arrays with sharers fields, reorganized with fewer ways and more sets.]
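The cost of conflicts in an inclusive, low-associativity sparse directory can be illustrated with a toy model. In this hypothetical sketch the geometry and the LRU replacement policy are assumptions; the point is that evicting one directory entry forces a recall to every private cache listed as a sharer.

```python
from collections import OrderedDict


class InclusiveSparseDirectory:
    """Toy set-associative sparse directory with LRU replacement (sketch).

    Because the scheme is inclusive, evicting an entry forces recall
    (invalidation) messages to every private cache listed as a sharer.
    """

    def __init__(self, sets: int, ways: int):
        self.sets, self.ways = sets, ways
        self.table = [OrderedDict() for _ in range(sets)]  # block -> set of sharer cores
        self.recalls = []  # (block, core) invalidations sent so far

    def access(self, block: int, core: int) -> None:
        entries = self.table[block % self.sets]
        if block in entries:
            entries.move_to_end(block)  # hit: refresh LRU position
        elif len(entries) == self.ways:
            victim, sharers = entries.popitem(last=False)  # evict the LRU entry
            self.recalls.extend((victim, c) for c in sorted(sharers))
        entries.setdefault(block, set()).add(core)


if __name__ == "__main__":
    d = InclusiveSparseDirectory(sets=4, ways=1)
    d.access(0, core=0)
    d.access(0, core=1)  # one tag in two private caches: one entry, two sharers
    d.access(4, core=2)  # block 4 maps to the same set: conflict
    print(d.recalls)     # [(0, 0), (0, 1)]
```

With a single way, one conflicting access invalidates block 0 in two private caches even though neither core asked to give it up.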
MOSAIC Protocol
In-cache or sparse: it does not matter
No inclusiveness → no invalidations of data in private caches
Sharing information is reconstructed on demand
Uses token counting to avoid extra traffic and guarantee correctness
Token Coherence protocol:
◦ Initially each block has # tokens (== # procs)
◦ Read request: data and 1 token
◦ Write request: data and all tokens
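The token rules above can be captured in a few lines. This is a minimal sketch of token counting for one block, not the full Token Coherence protocol: it assumes all tokens start at a hypothetical "home" node (memory/LLC) and omits data validity, the owner token and persistent requests.

```python
class TokenBlock:
    """Token counting for a single block (simplified sketch).

    The block starts with as many tokens as processors; a reader needs at
    least 1 token, a writer needs all of them. All tokens are assumed to
    start at a "home" node.
    """

    def __init__(self, num_procs: int):
        self.total = num_procs
        self.held = {"home": num_procs}

    def can_read(self, node: str) -> bool:
        return self.held.get(node, 0) >= 1

    def can_write(self, node: str) -> bool:
        return self.held.get(node, 0) == self.total

    def send(self, src: str, dst: str, n: int) -> None:
        if self.held.get(src, 0) < n:
            raise ValueError(f"{src} holds fewer than {n} tokens")
        self.held[src] -= n
        self.held[dst] = self.held.get(dst, 0) + n
        # Tokens are only moved, never created or destroyed.
        assert sum(self.held.values()) == self.total


if __name__ == "__main__":
    b = TokenBlock(num_procs=4)
    b.send("home", "P0", 1)  # read request: data and 1 token
    b.send("home", "P1", 3)  # write request: P1 gathers the remaining tokens...
    b.send("P0", "P1", 1)    # ...including the reader's, which invalidates P0
    print(b.can_write("P1"), b.can_read("P0"))  # True False
```

Because tokens are conserved, a writer holding all of them knows no other cache can still read the block, without any separate invalidation acknowledgement.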
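MOSAIC Conceptual Approach
[Figure: private caches of P0, P1 and P2, each entry holding State, Num. Tokens and Data: P0 = I, 0, N/A; P1 = O, 2, DATA; P2 = S, 1, DATA. They connect through the on-chip network to the last level cache, split into a Dir_slice (sharers) and a Data_slice (I, 0, N/A), and to the memory controller. Numbered steps 1 to 5 trace the message flow.]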
MOSAIC Key Facts
When data is not present in the LLC → broadcast for reconstruction
Private caches report the number of tokens they hold
Token counting avoids negative acknowledgements or timeouts
The reconstruction message piggybacks the request type and the requestor
Key: the directory may replace entries silently → no invalidations
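Reconstruction relies on tokens being conserved, so the directory can tell when it has heard from every holder. A minimal sketch, assuming a hypothetical reply format of (cache, tokens held, owner flag), that caches with no tokens need not be counted, and the token counts from the conceptual-approach figure (P1 in O with 2 tokens, P2 in S with 1, P0 in I with 0):

```python
def reconstruct(total_tokens, replies, home_tokens=0):
    """Rebuild a directory entry from reconstruction replies (hypothetical sketch).

    replies: (cache_id, tokens_held, is_owner) tuples in arrival order.
    home_tokens: tokens still held by the LLC/memory (an assumption).
    Completion is detected purely by counting: once the reported tokens add
    up to total_tokens, every holder has answered, so no NACKs or timeouts
    are needed to decide that the entry is complete.
    """
    sharers, owner, counted = [], None, home_tokens
    if counted == total_tokens:
        return sharers, owner  # no private copies at all
    for cache_id, tokens, is_owner in replies:
        if tokens > 0:
            sharers.append(cache_id)
        if is_owner:
            owner = cache_id
        counted += tokens
        if counted == total_tokens:
            return sharers, owner
    raise RuntimeError("not all tokens accounted for yet")


if __name__ == "__main__":
    replies = [("P2", 1, False), ("P0", 0, False), ("P1", 2, True)]
    print(reconstruct(3, replies))  # (['P2', 'P1'], 'P1')
```

Replies may arrive in any order; only the order of the rebuilt sharer list changes.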
MOSAIC Read Request
[Figure: message sequence among P0, P1, P2, P3 and the Dir/LLC. Messages, in order: Read (P0, Invalid → IS), Reconstruction, Info 1 token, Info 2 tokens + Owner, Data + token, Unblock (info 1 token), Read (P3), Forward GETS to Owner, Data + token, Unblock. Directory sharing information: Sharers [P2], Owner: unknown → Sharers [P2, P1], Owner: P1 → Sharers [P2, P1, P0], Owner: P1 → Sharers [P2, P1, P0, P3], Owner: P1. Other labels: states S, O, C, A; 3 tokens, 1 token.]
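One reading of the figure is that the directory chooses between forwarding and reconstruction depending on whether it knows the owner. The sketch below encodes that reading; the entry layout and function names are hypothetical, and the token transfers themselves are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DirEntry:
    sharers: List[str] = field(default_factory=list)
    owner: Optional[str] = None  # None: owner unknown (sharing info is partial)


def handle_read(entry: DirEntry, requestor: str) -> str:
    """Choose how the directory serves a read (GETS); hypothetical sketch."""
    if entry.owner is not None:
        return f"Forward GETS to {entry.owner}"
    # Partial information: broadcast a reconstruction message that
    # piggybacks the request type and the requestor.
    return f"Reconstruction(GETS, {requestor})"


def handle_unblock(entry: DirEntry, requestor: str) -> None:
    """The requestor got data + 1 token: record it as a sharer."""
    if requestor not in entry.sharers:
        entry.sharers.append(requestor)


if __name__ == "__main__":
    e = DirEntry(sharers=["P2"])   # Sharers [P2], Owner unknown
    print(handle_read(e, "P0"))    # Reconstruction(GETS, P0)
    e.sharers.append("P1")         # replies: P2 has 1 token, P1 has 2 + owner
    e.owner = "P1"
    handle_unblock(e, "P0")        # Sharers [P2, P1, P0]
    print(handle_read(e, "P3"))    # Forward GETS to P1
    handle_unblock(e, "P3")
    print(e.sharers, e.owner)      # ['P2', 'P1', 'P0', 'P3'] P1
```

The directory states printed at each step follow the sharer lists shown in the figure.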
MOSAIC Write Request
[Figure: message sequence among P0, P1, P2, P3 and the Dir/LLC. P0 (Invalid → IM → M) issues a Write; the directory starts a Reconstruction; P0 receives Data + 3 tokens and 1 token, then sends Unblock (info all tokens), leaving Sharers [P0], Owner: P0. Other labels: states S, O, C, A, IS; Directory Eviction.]
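On the writer side, the request completes once data and every token have arrived, regardless of how many holders send them. This is a hypothetical sketch that assumes four processors (P0 to P3), so four tokens in total; because directory entries can later be evicted silently, a subsequent request for the block would simply start a new reconstruction.

```python
class WriteRequest:
    """Writer-side bookkeeping for a write (GETX); hypothetical sketch.

    The requestor stays in IM until it holds valid data and every token,
    then moves to M and reports all tokens to the directory in its Unblock.
    """

    def __init__(self, total_tokens: int, held: int = 0):
        self.total = total_tokens
        self.held = held
        self.has_data = False
        self.state = "IM"

    def receive(self, tokens: int, data: bool = False) -> str:
        # Token messages may arrive in any order and from several holders.
        self.held += tokens
        self.has_data = self.has_data or data
        if self.state == "IM" and self.has_data and self.held == self.total:
            self.state = "M"
        return self.state

    def unblock(self, requestor: str) -> dict:
        """Directory information after 'Unblock (info all tokens)'."""
        assert self.state == "M"
        return {"sharers": [requestor], "owner": requestor}


if __name__ == "__main__":
    w = WriteRequest(total_tokens=4)  # four processors, P0 to P3
    print(w.receive(3, data=True))    # IM: data + 3 tokens, one still missing
    print(w.receive(1))               # M: the remaining token arrives
    print(w.unblock("P0"))            # {'sharers': ['P0'], 'owner': 'P0'}
```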
Evaluation methodology
Parameter                          Config 1                         Config 2
Number of cores                    8 @ 3GHz                         16 @ 3GHz
IWin size / Issue width            128, 4-way                       128, 4-way
Block size                         64B                              64B
Private L1 size / associativity    32KB I/D, 2-way                  32KB I/D, 2-way
Private L2 size / associativity    64KB, 4-way (exclusive with L1)  64KB, 4-way (exclusive with L1)
Shared L3 size / associativity     16MB, 16-way                     32MB, 16-way
NUCA mapping                       Static, interleaved across slices (both configurations)
Memory capacity                    4GB                              4GB
Max. outstanding mem. operations   16                               16
Topology                           4×4 mesh                         6×6 mesh
[Figure: floorplans. Config 1: 4×4 mesh of routers (R) with Cores 0-7 and L3 Slices 0-15. Config 2: 6×6 mesh of routers with Cores 0-15 and L3 Slices 0-31.]
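The table states only that the NUCA mapping is static and interleaved across slices. A minimal sketch of one common way to do this, assuming block-granularity modulo interleaving (an assumption, not confirmed by the slides) over the 16 slices of Config 1 and the 32 slices of Config 2:

```python
def home_slice(addr: int, num_slices: int, block_bytes: int = 64) -> int:
    """Return the L3 slice that is home for a physical address.

    Hypothetical static NUCA mapping: consecutive 64B blocks are assigned
    round-robin to slices (block-granularity modulo interleaving assumed).
    """
    return (addr // block_bytes) % num_slices


if __name__ == "__main__":
    for addr in (0x0, 0x40, 0x400, 0x7C0):
        print(hex(addr), home_slice(addr, 16), home_slice(addr, 32))
```

A static mapping like this lets any core compute the home slice (and its directory slice) locally, without a lookup.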
Simulation stack and workloads
GEMS: full-system evaluation
◦ SLICC: Specification Language for Implementing Cache Coherence
Multithreaded workloads
◦ 4 Wisconsin Commercial Workloads
◦ 3 NAS Parallel Benchmarks
Multiprogrammed workloads
◦ 3 SPEC 2006 (rate mode)
MOSAIC Performance: reducing associativity
[Figure: normalized execution time (y-axis 0.5 to 1.1) for Astar, Hmmer, Omnetpp, FT, IS, LU, Apache, Jbb, OLTP, Zeus and Gmean with sparse directories of 64w128KB, 32w128KB, 2w128KB and 1w128KB. Directory: 128KB, 16K entries (8 bytes per entry).]
Number of misses
[Figure: normalized number of misses (y-axis 0 to 2), split into L2, L1I and L1D misses, for BASE and MOSAIC with 64-, 32-, 2- and 1-way directories on Astar, Hmmer, Omnetpp, FT, IS, LU, Apache, Jbb, OLTP and Zeus. Annotation: "x2".]
MOSAIC Performance: reducing associativity and capacity
[Figure: normalized execution time (y-axis 0.4 to 1.1) for Astar, Hmmer, Omnetpp, FT, IS, LU, Apache, Jbb, OLTP, Zeus and Gmean with sparse directories of 64w16KB, 32w16KB, 2w16KB and 1w16KB. Annotations: 128KB = 16K entries (8 bytes per entry); 16KB = 2K entries.]
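The entry counts in the annotation follow from 8 bytes per entry, and they can be related to the amount of private data the directory has to track. This hypothetical check assumes the directory sizes are chip-wide totals and that the directory tracks all L1I, L1D and L2 blocks (the L2 is exclusive with L1, so capacities add); if the sizes are per slice, the coverage figures scale accordingly.

```python
KB = 1024
BLOCK_BYTES = 64                         # block size in both configurations
ENTRY_BYTES = 8                          # bytes per sparse-directory entry, as on the slides
PRIVATE_PER_CORE = (32 + 32 + 64) * KB   # L1I + L1D + exclusive L2


def dir_entries(dir_bytes: int) -> int:
    """Number of sparse-directory entries that fit in dir_bytes."""
    return dir_bytes // ENTRY_BYTES


def coverage(dir_bytes: int, cores: int) -> float:
    """Directory entries per distinct private-cache block (assumed chip-wide totals)."""
    private_blocks = cores * PRIVATE_PER_CORE // BLOCK_BYTES
    return dir_entries(dir_bytes) / private_blocks


if __name__ == "__main__":
    for cores, sizes in [(8, (128, 16)), (16, (256, 32))]:
        for kb in sizes:
            print(f"{cores} cores, {kb}KB directory: {dir_entries(kb * KB)} entries, "
                  f"coverage {coverage(kb * KB, cores):g}x")
```

Under these assumptions the 128KB directory provides one entry per private block, while the 16KB directory covers only one eighth of them.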
MOSAIC Latency
[Figure: latency in processor cycles (y-axis 0 to 12), broken down into L3, Other L2, Other L1, Private L2 and Local L1, for BASE and MOSAIC with 64-, 32-, 2- and 1-way directories on Astar, Hmmer, Omnetpp, FT, IS, LU, Apache, Jbb, OLTP and Zeus. Directory: 16KB, 2K entries.]
MOSAIC Link Utilization
[Figure: average network link utilization (y-axis 0 to 1.4) for Astar, Hmmer, Omnetpp, FT, IS, LU, Apache, Jbb, OLTP, Zeus and Gmean with sparse directories of 64w128KB, 64w64KB, 64w32KB, 64w8KB, 2w128KB, 2w64KB and 2w16KB.]
MOSAIC Link Utilization vs. Dir
[Figure: normalized network link utilization (y-axis 0 to 1.4) for Astar, Hmmer, Omnetpp, FT, IS, LU, Apache, Jbb, OLTP, Zeus and Gmean with sparse directories of 2w128KB, 2w64KB and 2w16KB. Annotation: "40%!!".]
MOSAIC Scalability (16-core configuration)
[Figure: normalized link utilization (y-axis 0 to 2) for Astar, Hmmer, Omnetpp, FT, IS, LU, Apache, Jbb, OLTP, Zeus and Gmean with sparse directories of 128w256KB, 128w128KB, 128w64KB, 128w32KB, 2w256KB, 2w128KB, 2w64KB and 2w32KB.]
Conclusions
MOSAIC Coherence Protocol: bandwidth scalability of a directory + elegance of Token Coherence
◦ Low complexity and great scalability
◦ Very low storage overhead
◦ No noticeable energy cost
◦ An alternative for future many-core cache-coherent CMPs
Thank you for your attention
Realistic Cache Configuration
[Figure: normalized execution time (y-axis 0 to 1.2) for Astar, Hmmer, Omnetpp, FT, IS, LU, Apache, Jbb, OLTP, Zeus and Gmean with 16-way sparse directories of 512KB, 256KB, 128KB, 64KB and 32KB; markers for "full dir" and "1/10 full dir".]
Caches: L1 4-way 32KB / L2 8-way 256KB x2
Same experiment with BASE: 20% impact in some cases
MOSAIC Energy
[Figure: normalized dynamic energy (y-axis 0 to 1.8), broken down into Network, Sparse directory, L3, L2 and L1, for BASE and MOSAIC with directory sizes of 128, 64 and 16 on Astar, Hmmer, Omnetpp, FT, IS, LU, Apache, Jbb, OLTP and Zeus.]