block disabling characterization and improvements in cmps...
TRANSCRIPT
SRAM Failures BD Fund. & Trade-offs BDOT & BDOT-C2C Results Conclusions
Block Disabling Characterization andImprovements in CMPs Operating
at Ultra-low Voltages
A. Ferreron1, D. Suarez-Gracia2, J. Alastruey-Benede1,T. Monreal3, V. Vinals1
1Universidad de Zaragoza, Spain
2Qualcomm Research Silicon Valley, USA
3Universidad Politecnica de Cataluna, Spain
SBAC-PAD Oct-2014
A. Ferreron {[email protected]} - SBAC-PAD’14
Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages 1 / 27
SRAM Failures BD Fund. & Trade-offs BDOT & BDOT-C2C Results Conclusions
Operation near the threshold voltage (Vth)
Vdd and Vth scaling has stopped
Power density no longer stays constant among technologygenerations and dark silicon appears
Operation at ultra-low Vdd
I Reduce the power and energy consumption
I Switch on more cores to exploit parallelism
A. Ferreron {[email protected]} - SBAC-PAD’14
Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages 2 / 27
SRAM Failures BD Fund. & Trade-offs BDOT & BDOT-C2C Results Conclusions
Operation near the threshold voltage (Vth): Challenges
Delay increases: lower voltage → lower frequency
I Compensate with parallelism: more active cores with the samepower budget
Increasing sensitivity to process variation (deviation of deviceparameters from their nominal values)
I Memory structures especially sensitive to variationI Conventional 6T cells: read, write, access, and hold failuresI Lower voltages → stability margins decrease → increasing cell
failure rateI Vddmin of memory blocks to guarantee reliable operation
A. Ferreron {[email protected]} - SBAC-PAD’14
Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages 3 / 27
SRAM Failures BD Fund. & Trade-offs BDOT & BDOT-C2C Results Conclusions
Objective
Lower Vdd to near-threshold voltages → energy efficient operation
Problem
High sensitivity of SRAM structures to variation at ultra-low Vdd
Our proposal
Mitigate the impact of SRAM cell failures at ultra-low Vdd usinglow complexity techniques:Block Disabling with Operational Tags and Block Disabling withOperational Tags and Cache-to-cache Transfers
A. Ferreron {[email protected]} - SBAC-PAD’14
Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages 4 / 27
SRAM Failures BD Fund. & Trade-offs BDOT & BDOT-C2C Results Conclusions
Outline
Impact of SRAM Failures at Ultra-low Voltages
Block Disabling (BD) Fundamentals and Trade-offs
Improve BD by protecting the cache tagsBD with operational tags (BDOT)BD with operational tags and C2C (BDOT-C2C)
Results
Conclusions
A. Ferreron {[email protected]} - SBAC-PAD’14
Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages 5 / 27
SRAM Failures BD Fund. & Trade-offs BDOT & BDOT-C2C Results Conclusions
Outline
Impact of SRAM Failures at Ultra-low Voltages
Block Disabling (BD) Fundamentals and Trade-offs
Improve BD by protecting the cache tagsBD with operational tags (BDOT)BD with operational tags and C2C (BDOT-C2C)
Results
Conclusions
A. Ferreron {[email protected]} - SBAC-PAD’14
Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages 6 / 27
SRAM Failures BD Fund. & Trade-offs BDOT & BDOT-C2C Results Conclusions
Example of Probability of Failure of SRAM Cells at 22nm
1.00e-08
1.00e-07
1.00e-06
1.00e-05
1.00e-04
1.00e-03
1.00e-02
1.00e-01
1.00e+00
0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8
Pfai
l (22
nm)
Vdd (V), Vddmin: 0.8V
1.00e-08
1.00e-07
1.00e-06
1.00e-05
1.00e-04
1.00e-03
1.00e-02
1.00e-01
1.00e+00
0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8
Pfai
l (22
nm)
Vdd (V), Vddmin: 0.8V
1.00e-08
1.00e-07
1.00e-06
1.00e-05
1.00e-04
1.00e-03
1.00e-02
1.00e-01
1.00e+00
0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8
Pfai
l (22
nm)
Vdd (V), Vddmin: 0.8V
A. Ferreron {[email protected]} - SBAC-PAD’14
Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages 7 / 27
SRAM Failures BD Fund. & Trade-offs BDOT & BDOT-C2C Results Conclusions
Example of Probability of Failure of SRAM Cells at 22nm
1.00e-08
1.00e-07
1.00e-06
1.00e-05
1.00e-04
1.00e-03
1.00e-02
1.00e-01
1.00e+00
0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8
Pfai
l (22
nm)
Vdd (V), Vddmin: 0.8V
1.00e-08
1.00e-07
1.00e-06
1.00e-05
1.00e-04
1.00e-03
1.00e-02
1.00e-01
1.00e+00
0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8
Pfai
l (22
nm)
Vdd (V), Vddmin: 0.8V
1.00e-08
1.00e-07
1.00e-06
1.00e-05
1.00e-04
1.00e-03
1.00e-02
1.00e-01
1.00e+00
0.4 0.45 0.5 0.55 0.6 0.65 0.7 0.75 0.8
Pfai
l (22
nm)
Vdd (V), Vddmin: 0.8V
{objective
A. Ferreron {[email protected]} - SBAC-PAD’14
Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages 7 / 27
SRAM Failures BD Fund. & Trade-offs BDOT & BDOT-C2C Results Conclusions
Bit Probability of Failure Affects Yield
1.00E-12
1.00E-10
1.00E-08
1.00E-06
1.00E-04
1.00E-02
1.00E+00
1.00E-12 1.00E-10 1.00E-08 1.00E-06 1.00E-04 1.00E-02
Str
uctu
re P
robabili
ty o
f F
ailu
re
Bit Probability of Failure
1Bit1Byte8Byte
64Byte64KByte
512KByte8MByte
Yield 1/1000
A. Ferreron {[email protected]} - SBAC-PAD’14
Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages 8 / 27
SRAM Failures BD Fund. & Trade-offs BDOT & BDOT-C2C Results Conclusions
Bit Probability of Failure Affects Yield
1.00E-12
1.00E-10
1.00E-08
1.00E-06
1.00E-04
1.00E-02
1.00E+00
1.00E-12 1.00E-10 1.00E-08 1.00E-06 1.00E-04 1.00E-02
Stru
ctur
e Pr
obab
ility
of F
ailu
re
Bit Probability of Failure
1Bit1Byte8Byte
64Byte64KByte
512KByte8MByte
A. Ferreron {[email protected]} - SBAC-PAD’14
Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages 8 / 27
SRAM Failures BD Fund. & Trade-offs BDOT & BDOT-C2C Results Conclusions
Outline
Impact of SRAM Failures at Ultra-low Voltages
Block Disabling (BD) Fundamentals and Trade-offs
Improve BD by protecting the cache tagsBD with operational tags (BDOT)BD with operational tags and C2C (BDOT-C2C)
Results
Conclusions
A. Ferreron {[email protected]} - SBAC-PAD’14
Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages 9 / 27
SRAM Failures BD Fund. & Trade-offs BDOT & BDOT-C2C Results Conclusions
Traditional Cache Hierarchy
L2 bank
L1i L1d
P
R
A. Ferreron {[email protected]} - SBAC-PAD’14
Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages 10 / 27
SRAM Failures BD Fund. & Trade-offs BDOT & BDOT-C2C Results Conclusions
Traditional Cache Hierarchy
L2 bank
L1i L1d
P
R
L2 bank
L1i L1d
P
R
L2 bank
L1i L1d
P
R
L2 bank
L1i L1d
P
R
A. Ferreron {[email protected]} - SBAC-PAD’14
Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages 10 / 27
SRAM Failures BD Fund. & Trade-offs BDOT & BDOT-C2C Results Conclusions
Block Disabling Fundamentals
SRAM cell failure detected:Block Disabling (BD) deactivates entry (tag and data)Simple implementation and low overhead: 1 bit per cache entry
@A
@C
@B
@D
@I
@H
@F
@G
[@A]
[@C]
[@B]
[@D]
[@I]
[@H]
[@F]
[@G]
set i
set i+1
set i+2
set i+3
LLC Bank
Tag Array Data Array
x
A. Ferreron {[email protected]} - SBAC-PAD’14
Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages 11 / 27
SRAM Failures BD Fund. & Trade-offs BDOT & BDOT-C2C Results Conclusions
Block Disabling Fundamentals
SRAM cell failure detected:Block Disabling (BD) deactivates entry (tag and data)Simple implementation and low overhead: 1 bit per cache entry
@A
@C
@B
@I
@H
@F
@G
[@A]
[@C]
[@B]
[@I]
[@H]
[@F]
[@G]
set i
set i+1
set i+2
set i+3
LLC Bank
Tag Array Data Array
x
A. Ferreron {[email protected]} - SBAC-PAD’14
Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages 11 / 27
SRAM Failures BD Fund. & Trade-offs BDOT & BDOT-C2C Results Conclusions
Block Disabling at Ultra-low Voltages
At lower voltages capacity and associativity degrade very fast
I Available capacity for 16-way, 1MB cache bank with blockdisabling (block size is 64 bytes):
Vdd Available capacity (KB)0.55V 887 KB (86%)
0.50V 408 KB (40%)
0.45V 138 KB (13%)
A. Ferreron {[email protected]} - SBAC-PAD’14
Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages 12 / 27
SRAM Failures BD Fund. & Trade-offs BDOT & BDOT-C2C Results Conclusions
Block Disabling at Ultra-low Voltages
At lower voltages capacity and associativity degrade very fast
I Associativity degradation for 16-way, 1MB cache bank withblock disabling (block size is 64 bytes):
0
200
400
600
800
1000
0 1 2 3 4 5 6 7 8 9 10111213141516
Num
ber o
f set
s
Number of faulty ways
0.550 V
0
200
400
600
800
1000
0 1 2 3 4 5 6 7 8 9 10111213141516
Num
ber o
f set
s
Number of faulty ways
0.500 V
0
200
400
600
800
1000
0 1 2 3 4 5 6 7 8 9 10111213141516
Num
ber o
f set
s
Number of faulty ways
0.450 V
A. Ferreron {[email protected]} - SBAC-PAD’14
Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages 13 / 27
SRAM Failures BD Fund. & Trade-offs BDOT & BDOT-C2C Results Conclusions
Inclusive Hierarchies
@A
@C
@B
@D
@I
@H
@F
@G
[@A]
[@C]
[@B]
[@D]
[@I]
[@H]
[@F]
[@G]
LLC Bank (Shared)
Tag Array Data Array
...[@C]@C
[@F]@F
Private Level
[@A]@A
[@I]@I
Private Level
A. Ferreron {[email protected]} - SBAC-PAD’14
Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages 14 / 27
SRAM Failures BD Fund. & Trade-offs BDOT & BDOT-C2C Results Conclusions
Inclusive Hierarchies and Block Disabling Interaction
@A
@C
@B
@D
@I
@H
@F
@G
[@A]
[@C]
[@B]
[@D]
[@I]
[@H]
[@F]
[@G]
LLC Bank (Shared)
Tag Array Data Array
...[@C]@C
[@F]@F
Private Level
[@A]@A
[@I]@I
Private Level
x
x
A. Ferreron {[email protected]} - SBAC-PAD’14
Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages 15 / 27
SRAM Failures BD Fund. & Trade-offs BDOT & BDOT-C2C Results Conclusions
Outline
Impact of SRAM Failures at Ultra-low Voltages
Block Disabling (BD) Fundamentals and Trade-offs
Improve BD by protecting the cache tagsBD with operational tags (BDOT)BD with operational tags and C2C (BDOT-C2C)
Results
Conclusions
A. Ferreron {[email protected]} - SBAC-PAD’14
Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages 16 / 27
SRAM Failures BD Fund. & Trade-offs BDOT & BDOT-C2C Results Conclusions
BD with operational tags (BDOT)
BD with operational tags: BDOT
Allow blocks to be allocated as just tags: entries with faulty bitscan still be used to allocate tag-only blocks in LLC
@A
@C
@B
@D
@I
@H
@F
@G
[@A]
[@C]
[@B]
[@D]
[@I]
[@H]
[@F]
[@G]
LLC Bank (Shared)
Tag Array Data Array
...[@C]@C
[@F]@F
Private Level
[@A]@A
[@I]@I
Private Level
x
x
A. Ferreron {[email protected]} - SBAC-PAD’14
Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages 17 / 27
SRAM Failures BD Fund. & Trade-offs BDOT & BDOT-C2C Results Conclusions
BD with operational tags (BDOT)
BD with operational tags: BDOT
I Protect the tag arrayI Bigger/robust cells: bigger transistors/more transistors per cell
(assist circuitry)I More complex error correction codes (ECC)
I Why not protect the whole cache structure?I Area and power increase when using bigger/robust cellsI Complex ECC require extra storage and checking hardware:
might increase access latencyI Tag array roughly 10% of the cache area (LLC)
A. Ferreron {[email protected]} - SBAC-PAD’14
Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages 18 / 27
SRAM Failures BD Fund. & Trade-offs BDOT & BDOT-C2C Results Conclusions
BD with operational tags and C2C (BDOT-C2C)
BDOT with cache-to-cache trasnfers: BDOT-C2C
I Problem: requests to tag-only blocks → off-chip transactions
I Observation: shared blocks already on-chip (private levels)
@A
@C
@B
@D
@I
@H
@F
@G
[@A]
[@C]
[@B]
[@D]
[@I]
[@H]
[@F]
[@G]
LLC Bank (Shared)
Tag Array Data Array
...[@C]@C
[@F]@F
Private Level
[@A]@A
[@I]@I
Private Level
x
x
A. Ferreron {[email protected]} - SBAC-PAD’14
Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages 19 / 27
SRAM Failures BD Fund. & Trade-offs BDOT & BDOT-C2C Results Conclusions
BD with operational tags and C2C (BDOT-C2C)
BDOT with cache-to-cache trasnfers: BDOT-C2C
I Problem: requests to tag-only blocks → off-chip transactions
I Observation: shared blocks already on-chip (private levels)
@A
@C
@B
@D
@I
@H
@F
@G
[@A]
[@C]
[@B]
[@D]
[@I]
[@H]
[@F]
[@G]
LLC Bank (Shared)
Tag Array Data Array
...[@C]@C
[@F]@F
Private Level
[@A]@A
[@I]@I
Private Level
x
x
A. Ferreron {[email protected]} - SBAC-PAD’14
Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages 19 / 27
SRAM Failures BD Fund. & Trade-offs BDOT & BDOT-C2C Results Conclusions
BD with operational tags and C2C (BDOT-C2C)
BDOT with cache-to-cache trasnfers: BDOT-C2C
Provide cache-to-cache transfers of clean blocks: leveragecoherence protocol
I The protocol already does cache-to-cache transfers ofexclusively owned blocks
I Slight change in the coherence protocol behavior, but nohardware overhead
I Potential gain depends on the applications sharing degree
A. Ferreron {[email protected]} - SBAC-PAD’14
Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages 20 / 27
SRAM Failures BD Fund. & Trade-offs BDOT & BDOT-C2C Results Conclusions
Outline
Impact of SRAM Failures at Ultra-low Voltages
Block Disabling (BD) Fundamentals and Trade-offs
Improve BD by protecting the cache tagsBD with operational tags (BDOT)BD with operational tags and C2C (BDOT-C2C)
Results
Conclusions
A. Ferreron {[email protected]} - SBAC-PAD’14
Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages 21 / 27
SRAM Failures BD Fund. & Trade-offs BDOT & BDOT-C2C Results Conclusions
Methodology
Memory
channels
DRAM
Main memory
CORE
L1I L1D
L2tag&data
Dir
R
MC
CMP Tile
Register !les, branch
predictor, ALUs, control, ...
I Experimental set-up:Simics + GEMS + GARNET + DRAMSim2 + McPAT
I PARSEC benchmark suite
I Random faults + Monte Carlo simulations
A. Ferreron {[email protected]} - SBAC-PAD’14
Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages 22 / 27
SRAM Failures BD Fund. & Trade-offs BDOT & BDOT-C2C Results Conclusions
On-chip Energy Consumption
0
0.5
1
1.5
2
0.450V0.500V
0.550V0.600V
0.700V
Ener
gy c
onsu
mpt
ion
norm
aliz
ed to
0.8
V
vipsswaptionsfluidanimatecannealbodytrackblackscholes
Block Disabling (On-Chip)BDOT (On-Chip)
BDOT-C2C (On-Chip)
Minimum energy on-chip:voltages values between 0.45-0.5V more active cores → potentialhigher performance
A. Ferreron {[email protected]} - SBAC-PAD’14
Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages 23 / 27
SRAM Failures BD Fund. & Trade-offs BDOT & BDOT-C2C Results Conclusions
On-chip Energy Consumption
0
0.5
1
1.5
2
0.450V0.500V
0.550V0.600V
0.700V
Ener
gy c
onsu
mpt
ion
norm
aliz
ed to
0.8
V
blackscholes
Block Disabling (On-Chip)BDOT (On-Chip)
BDOT-C2C (On-Chip)
lower Vdd
the lowerthe better
Minimum energy on-chip:voltages values between 0.45-0.5V more active cores → potentialhigher performance
A. Ferreron {[email protected]} - SBAC-PAD’14
Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages 23 / 27
SRAM Failures BD Fund. & Trade-offs BDOT & BDOT-C2C Results Conclusions
On-chip Energy Consumption
0
0.5
1
1.5
2
0.450V0.500V
0.550V0.600V
0.700V
Ener
gy c
onsu
mpt
ion
norm
aliz
ed to
0.8
V
vipsswaptionsfluidanimatecannealbodytrackblackscholes
Block Disabling (On-Chip)BDOT (On-Chip)
BDOT-C2C (On-Chip)
Minimum energy on-chip:voltages values between 0.45-0.5V more active cores → potentialhigher performance
A. Ferreron {[email protected]} - SBAC-PAD’14
Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages 23 / 27
SRAM Failures BD Fund. & Trade-offs BDOT & BDOT-C2C Results Conclusions
Total Energy Consumption
0
0.5
1
1.5
2
0.450V0.500V
0.550V0.600V
0.700V
Ener
gy c
onsu
mpt
ion
norm
aliz
ed to
0.8
V
10.47 3.55 18.421.291.652.223.8 Block Disabling (On-Chip)
DRAM
BDOT (On-Chip)BDOT-C2C (On-Chip)
vipsswaptionsfluidanimatecannealbodytrackblackscholes
Minimum system energy:off-chip memory energy consumption main sourcehigher voltage values (0.55-0.6V)
A. Ferreron {[email protected]} - SBAC-PAD’14
Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages 24 / 27
SRAM Failures BD Fund. & Trade-offs BDOT & BDOT-C2C Results Conclusions
Outline
Impact of SRAM Failures at Ultra-low Voltages
Block Disabling (BD) Fundamentals and Trade-offs
Improve BD by protecting the cache tagsBD with operational tags (BDOT)BD with operational tags and C2C (BDOT-C2C)
Results
Conclusions
A. Ferreron {[email protected]} - SBAC-PAD’14
Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages 25 / 27
SRAM Failures BD Fund. & Trade-offs BDOT & BDOT-C2C Results Conclusions
Conclusions
I Operation near Vth for energy efficient operationI Switch on inactive coresI Reduce the overall energy consumption
I SRAM structures fail when lowering Vdd
BD: simple, low overhead, but not effective at ultra-low Vdd
Inclusive hierarchies: BD increases inclusion victimsI BDOT: allow blocks allocated as tag-only → protect inclusionI BDOT-C2C: provide cache-to-cache transfers of shared blocks
→ reduce off-chip transactionsI BDOT & BDOT-C2C: substantial reduction on-chip power
and energy consumption
A. Ferreron {[email protected]} - SBAC-PAD’14
Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages 26 / 27
SRAM Failures BD Fund. & Trade-offs BDOT & BDOT-C2C Results Conclusions
Block Disabling Characterization andImprovements in CMPs Operating
at Ultra-low Voltages
A. Ferreron1, D. Suarez-Gracia2, J. Alastruey-Benede1,T. Monreal3, V. Vinals1
1Universidad de Zaragoza, Spain
2Qualcomm Research Silicon Valley, USA
3Universidad Politecnica de Cataluna, Spain
SBAC-PAD Oct-2014
A. Ferreron {[email protected]} - SBAC-PAD’14
Block Disabling Characterization and Improvements in CMPs Operating at Ultra-low Voltages 27 / 27