StageWeb: Interweaving Pipeline Stages into a Wearout and Variation Tolerant CMP Fabric

University of Michigan, Advanced Computer Architecture Laboratory. StageWeb: Interweaving Pipeline Stages into a Wearout and Variation Tolerant CMP Fabric. Shantanu Gupta, Amin Ansari, Shuguang Feng, Scott Mahlke. University of Michigan - Ann Arbor. June 29, 2010.



Page 1

University of Michigan, Advanced Computer Architecture Laboratory

StageWeb: Interweaving Pipeline Stages into a Wearout and Variation Tolerant CMP Fabric

Shantanu Gupta Amin Ansari Shuguang Feng Scott Mahlke

University of Michigan - Ann Arbor

June 29, 2010

Page 2

Reliability Threats

• Transient faults due to cosmic rays and alpha particles (increase exponentially with the number of devices on chip)
• Silicon defects (manufacturing defects and device wear-out): Negative Bias Threshold Inversion, oxide breakdown, electromigration
• Process variation (random and systematic variations): intra-die ILD thickness, speed binning on a die

[Figures: transistor cross-section with source, gate, drain, and oxide labels; 3x3 grid of cores (C) with a frequency scale]

Page 3

Fault Tolerance Aspects

• Detect and Diagnose: Has anything gone wrong? Figure out the cause.
• Reconfigure: Isolate the broken components.
• Recover: Resume execution from a safe point.

Page 4

Reconfiguring a Multi-core

• At the coarsest level, cores can be disabled.
• Rumors that industry already uses this…
► IBM Cell w/ 7 SPEs, AMD Tri-Core
• Can’t scale to higher failure rates!

[Figure: 3x3 grids of cores (C) at Year 1, Year 3, Year 5, and Year 7]

Page 5

Reconfiguration Granularity

• CORE level: -- Poor MTTF gains, + Easy to implement; 100% MTTF ↑
• STAGE level: + Good MTTF gains, + Circuit / architectural boundary, + Full coverage; 170% MTTF ↑
• MODULE level: + Best MTTF gains, -- Complex implementation; 200% MTTF ↑

MTTF gains are for 100% area overhead (redundancy).

Prior work cited on the slide:
• ElastIC, DT’06
• Reunion, MICRO’06
• Configurable Isolation, ISCA’07
• Online Diagnosis of Hard Faults, MICRO’05
• Ultra Low-Cost Defect Protection, ASPLOS’06

[Figure: pipeline with FETCH, DEC, EXEC, MEM, and WB stages; axis labels "Lower complexity" and "Better resource utilization"]

Page 6

CMP Fabric

[Figure: four cores (Core 0 to Core 3), each a pipeline of Stage1, Stage2, Stage3, ..., StageN with latches between stages]

Page 7

The StageNet (SN) Fabric

• StageNet Slice (SNS)
• Crossbar switch (inputs, outputs)
• Configuration Manager
• Wearout sensors
► Delay
► Temperature
► Current

[Figure: four rows of Stage1, Stage2, Stage3, ..., StageN connected through crossbar switches]

Page 8

A 4-Slice SN chip

[Figure: four slices, each with Fetch, Decode, Issue, and Ex/Mem stages, plus a Configuration Manager]

Page 9

Performance Comparison: Pipeline vs. SN Slice

• > 5X slowdown

1. Control stall
2. Data forwarding
3. Transmission delays

[Figure: commit-time timelines for instructions 1 to 10 (including a branch, BR, and a register dependency) on the 5 stage pipeline and on the SN Slice. Block diagrams: the 5 stage pipeline (Fetch, Decode, Issue, Ex/Mem, WB separated by latches, with GenPC, Branch Predictor, Register File, register wb, and branch resolution bypass) and the SN Slice (Fetch, Decode, Issue, Ex/Mem with GenPC, Branch Predictor, Register File, and double buffers)]

Page 10

SN Slice Microarchitecture [MICRO’08]

1. Control Handling: Stream ID
• Control flow handling
• Eliminates flush signals

2. Data Forwarding: Bypass $
• Stores previous results
• Fully associative structure
• Emulates data forwarding

3. Transmission Delays: Macro-Ops
• Send instruction bundles
• Amortizes transfer delay
• Increases system utilization

[Figure: SN Slice block diagram (Fetch, Decode, Issue, Ex/Mem with GenPC, Branch Predictor, Register File, and double buffers) extended with stream ID (SID) fields, a Macro-op Generator, and a Bypass $]
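The three mechanisms named on this slide can be illustrated with a toy software model. The Python sketch below is only an illustration under stated assumptions, not the StageNet hardware: the names `BypassCache`, `filter_by_stream_id`, and `bundle_macro_ops` are hypothetical, and the cache capacity, the oldest-first eviction, and the rule of dropping instructions with a mismatched stream ID are assumptions not given on the slide.

```python
from collections import OrderedDict


class BypassCache:
    """Toy fully associative store of recent results, keyed by register name.

    Capacity and oldest-first eviction are assumptions for illustration.
    """

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()  # register name -> most recent value

    def record(self, reg, value):
        # A newer result for the same register replaces the older one.
        self.entries.pop(reg, None)
        self.entries[reg] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the oldest entry

    def lookup(self, reg, register_file):
        # Any entry may match (fully associative); otherwise read the register file.
        if reg in self.entries:
            return self.entries[reg]
        return register_file[reg]


def filter_by_stream_id(instructions, current_sid):
    """Keep only (sid, op) pairs whose stream ID matches the current one.

    Stale instructions are ignored on arrival, which stands in for a flush signal.
    """
    return [op for sid, op in instructions if sid == current_sid]


def bundle_macro_ops(ops, size):
    """Group instructions into bundles so one transfer carries several of them."""
    return [ops[i:i + size] for i in range(0, len(ops), size)]
```

In this model a consumer calls `lookup` before reading the register file, so a result that has not yet been written back is still visible, which is the effect data forwarding has in a conventional pipeline.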

Page 11

SN Slice Performance [MICRO’08]

[Figure: bar chart of normalized runtime (y-axis 0 to 6) for 3des, g721decode, g721encode, idct, rawcaudio, rawdaudio, rijndael, mcf, eqn, grep, wc, and the mean. Series: SNS + StreamID; SNS + StreamID + Bypass$; SNS + StreamID + Bypass$ + MOPs. Annotation: 10% slowdown]

Page 12

SN System - scaling to 100+ cores?

1. Crossbars don’t scale well due to wiring / layout complexity
- Area
- Delay
- Power

2. Interconnection prone to failures
- Single point of failure
- Links have no redundancy

[Figure: a column of SN slices, each with F, D, I, and E/M stages]

Page 13

StageWeb: Scaling to 100+ cores

• In a large many-core system, small groups of cores can form SN
• What’s the right size for a SN island?

[Figure: a traditional many-core (cores with F, D, I, and E/M stages and L2 $ banks) next to a StageWeb many-core (a grid of SN islands)]

Page 14

StageWeb: Scaling to 100+ cores

• In a large many-core system, small groups of cores can form SN
• What’s the right size for a SN island?
• Unfortunately, a single crossbar can’t scale to 8-10 pipelines!

[Figure labels: Good scaling, Poor scaling]

Page 15

Interconnection Alternatives

1. Connectivity
a) Single
b) Single + Front-Back
c) Overlap
d) Overlap + Front-Back

[Figure: eight slices (Fetch, Decode, Issue, Ex/Mem) grouped into Island 1 to Island 4, with front-end and back-end halves marked]

Page 16

Interconnection Alternatives

1. Connectivity
a) Single
b) Single + Front-Back
c) Overlap
d) Overlap + Front-Back

2. Reliability
a) crossbar
b) crossbar with spares
c) fault-tolerant crossbar

[Figure: eight slices (Fetch, Decode, Issue, Ex/Mem) grouped into Island 1 to Island 4; input/output diagrams of the three crossbar types]

Page 17

Interconnection Configuration

• Faults in stages, crossbar ports, or links force a reconfiguration…

[Figure: four slices (Fetch, Decode, Issue, Ex/Mem) in Island 1 and Island 2]

Page 18

Interconnection Configuration

• Single crossbar configuration
► Local to every island

[Figure: four slices (Fetch, Decode, Issue, Ex/Mem) in Island 1 and Island 2]
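Because the single crossbar configuration is local to every island, the number of logical pipelines can be counted island by island. The sketch below is a minimal illustration, assuming a four-stage slice, healthy crossbars and links, and the hypothetical names `pipelines_in_island` and `single_crossbar_pipelines`; none of these details are specified on the slide.

```python
STAGE_TYPES = ("fetch", "decode", "issue", "exmem")


def pipelines_in_island(healthy):
    """Number of logical pipelines one island can form on its own.

    `healthy` maps each stage type to the count of working stages in the
    island. A pipeline needs one stage of every type, so the scarcest
    type sets the limit.
    """
    return min(healthy[t] for t in STAGE_TYPES)


def single_crossbar_pipelines(islands):
    """Total pipelines when every island is configured locally."""
    return sum(pipelines_in_island(h) for h in islands)
```

For example, two islands that have each lost one stage of a different type still provide one pipeline apiece, even though together they hold enough working stages for more.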

Page 19

Interconnection Configuration

• Overlap crossbar configuration
► Sweep islands, forming pipelines opportunistically

[Figure: four slices (Fetch, Decode, Issue, Ex/Mem) covered by overlapping islands, Island 1 to Island 3]
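One way to picture the sweep over overlapping islands is a greedy pass that forms as many pipelines as each island allows and leaves unused stages in shared slices for the next island. This is a sketch under assumptions, not the configuration algorithm from the talk: the function name `sweep_overlapping_islands`, the left-to-right order, and the first-fit choice of stages are all hypothetical, and crossbar and link faults are ignored.

```python
STAGE_TYPES = ("fetch", "decode", "issue", "exmem")


def sweep_overlapping_islands(healthy, islands):
    """Greedy sweep over islands that may share slices.

    healthy: list of sets; healthy[s] holds the working stage types of slice s.
    islands: list of lists of slice indices; neighbours may overlap.
    Returns a list of pipelines, each a dict of stage type -> slice index.
    """
    free = [set(h) for h in healthy]  # stages not yet assigned to a pipeline
    pipelines = []
    for island in islands:
        while True:
            choice = {}
            for t in STAGE_TYPES:
                # First slice in this island with an unused stage of type t.
                donor = next((s for s in island if t in free[s]), None)
                if donor is None:
                    break
                choice[t] = donor
            if len(choice) < len(STAGE_TYPES):
                break  # this island cannot complete another pipeline
            for t, s in choice.items():
                free[s].discard(t)
            pipelines.append(choice)
    return pipelines
```

Passing non-overlapping islands to the same function reproduces the island-local case, so the two arrangements can be compared on the same fault pattern.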

Page 20

StageWeb Benefits

1. Scalability
► Scaling SN to benefit 100+ core systems

2. Interconnection Reliability
► Handling faults in crossbars and links

3. Process Variation
► Slower components can be isolated in a multi-core chip

Page 21

Mitigating Process Variation

Severe process variation and lifetime wearout can result in a disparity of health for various resources.
StageNet can effectively isolate strong/weak resources.

[Figure: four slices (Fetch, Decode, Issue, Ex/Mem) with stages marked Fast, Medium, or Slow on a frequency scale]

Page 22

Evaluation

• OpenRISC 1200 cores (4-stage in-order)
• 12 configurations compared, 64 cores each
► Interconnections: Single, Single + Front/Back, Overlapping, Overlapping + Front/Back
► Crossbar types: W/O spares, W/ spares, Fault-tolerant
• Experiments
► Lifetime evaluations - throughput and total work
► Process variation - speed binning on a die
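The count of 12 configurations is consistent with pairing each of the four interconnection options with each of the three crossbar types listed on this slide. A small enumeration, with the hypothetical helper name `configurations`, makes the arithmetic explicit:

```python
from itertools import product

INTERCONNECTIONS = (
    "Single",
    "Single + Front/Back",
    "Overlapping",
    "Overlapping + Front/Back",
)
CROSSBAR_TYPES = ("W/O spares", "W/ spares", "Fault-tolerant")


def configurations():
    """Every (interconnection, crossbar type) pair: 4 x 3 = 12."""
    return list(product(INTERCONNECTIONS, CROSSBAR_TYPES))
```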

Page 23

Lifetime Reliability Evaluations

• Monte Carlo simulation with 300+ lifetime experiments
• Each lifetime experiment involves:
► Assigning a time-to-failure to all stages
► Killing components at their failure times
► Reconfiguring the system to isolate broken components
► Repeating this until no logical pipeline can be formed
• Cumulative work and throughput are recorded
► Number of cores: 64
► Technology node: 90 nm
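The experiment loop described on this slide can be sketched in a few lines of Python. This is a simplified illustration, not the evaluation infrastructure used in the talk: it assumes a single fully connected group of four-stage slices, exponentially distributed times-to-failure, fault-free interconnect, and throughput measured as the number of working pipelines. The names `simulate_lifetime` and `monte_carlo` and all parameter values are hypothetical.

```python
import random

STAGE_TYPES = ("fetch", "decode", "issue", "exmem")


def simulate_lifetime(num_slices, mean_ttf_years, rng):
    """One lifetime experiment.

    Returns (cumulative_work, history), where history lists
    (failure_time, pipelines_remaining) pairs.
    """
    # Assign a time-to-failure to every stage (exponential is an assumption).
    failures = sorted(
        (rng.expovariate(1.0 / mean_ttf_years), t)
        for _ in range(num_slices)
        for t in STAGE_TYPES
    )
    alive = {t: num_slices for t in STAGE_TYPES}
    pipelines = num_slices
    now, work, history = 0.0, 0.0, []
    for when, stage_type in failures:
        # Work accumulates at the current throughput until the next failure.
        work += pipelines * (when - now)
        now = when
        # Kill the component, then reconfigure around it.
        alive[stage_type] -= 1
        pipelines = min(alive.values())
        history.append((when, pipelines))
        # Stop once no logical pipeline can be formed.
        if pipelines == 0:
            break
    return work, history


def monte_carlo(num_experiments, num_slices, mean_ttf_years, seed=0):
    """Average cumulative work (pipeline-years) over many lifetime experiments."""
    rng = random.Random(seed)
    total = sum(
        simulate_lifetime(num_slices, mean_ttf_years, rng)[0]
        for _ in range(num_experiments)
    )
    return total / num_experiments
```

A call such as `monte_carlo(300, 64, 10.0)` mirrors the slide's 300 experiments on 64 cores; the 10-year mean time-to-failure is a placeholder, not a value from the talk.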

Page 24

Cumulative Work

[Figure: bar chart of normalized cumulative work (y-axis 0.8 to 1.8) for Single Xbar, Single + F/B Xbar, Overlap Xbar, and Overlap + F/B Xbar, with crossbar types Xbar (w/o spare), Xbar (w/ spare), and Fault-Tolerant Xbar. Annotation: ~70% more work!]

Page 25

Cumulative Work (area neutral)

[Figure: bar chart of normalized cumulative work (y-axis 0.8 to 1.5) for Single Xbar, Single + F/B Xbar, Overlap Xbar, and Overlap + F/B Xbar, with crossbar types Xbar (w/o spare), Xbar (w/ spare), and Fault-Tolerant Xbar. Annotation: 52 cores]

Best StageWeb Configuration

• Overlapping interconnection network
• 52 cores
• 6 adjacent slices connected by each crossbar
• Fault-tolerant crossbars

Page 26

Throughput over time

[Figure: peak throughput (IPC, y-axis 0 to 60) versus time (in years) for CMP, StageWeb, and StageWeb (area neutral)]

Page 27

Mitigating Process Variation

[Figure: number of cores (y-axis 0 to 16) versus normalized frequency (x-axis 0.73 to 1.0) for Traditional CMP and StageWeb CMP. Chart annotations: 27, 45]

For a given frequency target, StageWeb can operate:
1. More cores, OR
2. Same # of cores at lower voltage
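The claim that more cores can meet a given frequency target follows from a pipeline running no faster than its slowest stage. The sketch below compares fixed cores with stages regrouped by speed. It is an idealized illustration: it assumes any stage can be connected to any stage of the next type, ignoring island and crossbar limits, and the function names and the frequency values in the usage example are made up for the illustration.

```python
def core_frequencies_traditional(stage_freqs):
    """Each fixed core is limited by its slowest stage.

    stage_freqs: list of per-core lists, one normalized frequency per stage.
    """
    return [min(core) for core in stage_freqs]


def core_frequencies_regrouped(stage_freqs):
    """Regroup stages across cores: fastest of each type together, and so on."""
    by_type = [sorted(column, reverse=True) for column in zip(*stage_freqs)]
    return [min(group) for group in zip(*by_type)]


def cores_meeting(freqs, target):
    """Count the cores whose frequency reaches the target."""
    return sum(f >= target for f in freqs)
```

With three cores whose stage frequencies are `[1.0, 0.8, 1.0, 0.95]`, `[0.8, 1.0, 0.9, 1.0]`, and `[0.95, 0.9, 0.8, 0.85]`, every fixed core is held to 0.8, while regrouping yields pipelines at 1.0, 0.9, and 0.8, so two of them meet a 0.9 target instead of none.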

Page 28

Conclusions

• Architectural innovations will be crucial in tackling technological uncertainties
• StageWeb is a potential solution
► Allows fine-grained isolation of failures
► Most reliability gains from grouping 8-10 pipelines
► Scalable to 100+ cores
• StageWeb can also mitigate process variation by grouping together faster and slower parts

Page 29

Thank You

http://cccp.eecs.umich.edu

Page 30

Back up slides

Page 31

Impact of Defects on CMP Yield

Page 32

Overlapping Network

Page 33

Simple + 2nd Level Crossbars

Page 34

Overlapping + 2nd Level Crossbar

Page 35

Interconnection Alternatives

1. Connectivity
a) Simple
b) Simple + Front-Back
c) Overlap
d) Overlap + Front-Back

2. Reliability
a) crossbar
b) crossbar with spares
c) fault-tolerant crossbar

[Figure: four slices (Fetch, Decode, Issue, Ex/Mem) in Island 1 and Island 2, with front-end and back-end halves marked; input/output diagrams of the three crossbar types]

Page 36

SN System Level Issues

1. Crossbars don’t scale well due to wiring / layout complexity
- Area
- Delay
- Power

2. Interconnection prone to failures
- Single point of failure
- Links have no redundancy

[Figure: a column of SN slices, each with F, D, I, and E/M stages]