asynchronous pipelines
Post on 13-Jan-2016
Asynchronous Pipelines
Author: Peter Yeh
Advisor: Professor Beerel
USC Asynchronous Group 2
Motivation
• Can we reduce the communication overhead of asynchronous pipelines while hiding precharge time?
• Can asynchronous pipelines achieve a cycle time as fast as, if not faster than, their best synchronous counterparts?
Motivation: System Performance
• Fixed stage pipeline
– Low pipeline usage: low latency is critical
– High pipeline usage: cycle time is the limiting factor in generating new outputs as fast as possible
• Flexible stage pipeline
– With zero forward overhead and a short cycle time, we can achieve a given desired throughput with fewer stages
Motivation: System Performance
• Pipelines with loop dependencies
– Optimal cycle time is the sum of the latencies around the loop
– Pipelining is required to ensure precharge/reset is not in the critical path
– Our scheme requires fewer pipeline stages to achieve the same performance
Introduction
• Asynchronous pipeline schemes using a Taken Detector (TD)
• Best used in coarse-grained pipelines
• Two schemes targeting different requirements (and possibly a third, SI, scheme as well)
Outline
• Background review
– Sutherland
– Ted Williams
– Renaudin
– Martin
• Taken pipeline
• Performance comparison
• Conclusion
Definitions
• Stage: a collection of logic that is precharged or evaluated at the same time
• Cycle time: the time from the start of one evaluation of a stage to the start of its next evaluation
• Forward latency: the time between the start of evaluation of the current stage and the start of evaluation of the next stage
Background Outline
• Sutherland’s Micropipeline scheme
• Ted Williams’ PS0 and PC0 pipeline schemes
• Renaudin’s DCVSL pipeline scheme
• Martin’s deep pipeline scheme
Sutherland’s Micropipeline
• Father of the asynchronous pipeline; presented in his Turing Award lecture
• Delay insensitive
[Figure: Micropipeline block diagram with REG stages (ports C, Cd, P, Pd), LOGIC blocks, C-elements, and signals R(in), A(in), D(in), R(out), A(out), D(out)]
Williams’ PC0
• Speed independent
• Cycle time: $P = 3\,tF_{\text{eval}} + 1\,tF_{\text{pre}} + 4\,tC + 4\,tD$
• Forward latency: $L_f = 1\,tF_{\text{eval}} + 1\,tD + 1\,tC$
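As a rough numerical illustration of the two PC0 expressions, the sketch below evaluates them for made-up delay values. It assumes that the two tF terms in the cycle time are, respectively, the evaluation and the precharge delay of a function block; the function names and the numbers are hypothetical and not taken from the slides.

```python
def pc0_cycle_time(t_f_eval, t_f_pre, t_c, t_d):
    """P = 3*tF(eval) + 1*tF(precharge) + 4*tC + 4*tD (assumed reading of the slide)."""
    return 3 * t_f_eval + 1 * t_f_pre + 4 * t_c + 4 * t_d


def pc0_forward_latency(t_f_eval, t_c, t_d):
    """Lf = 1*tF + 1*tD + 1*tC."""
    return t_f_eval + t_d + t_c


# Illustrative delays in arbitrary time units (not measured values).
p = pc0_cycle_time(t_f_eval=4.0, t_f_pre=2.0, t_c=1.0, t_d=1.0)
lf = pc0_forward_latency(t_f_eval=4.0, t_c=1.0, t_d=1.0)
print(f"PC0: P = {p}, Lf = {lf}")
```

With these numbers the control terms (4 tC + 4 tD) account for 8 of the 22 time units in a cycle, which is the kind of communication overhead the motivation slides refer to.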
[Figure: PC0 pipeline block diagram with precharged function blocks F1-F3, completion detectors D1-D3, C-elements C1-C3, and signals D(in), R(in), A(in), D(out), R(out), A(out)]
PC0 Timing Diagram
[Figure: PC0 timing diagram showing, over time, the evaluation and precharge phases of F1-F3 together with C1-C3 and the completed/precharged states of D1-D3]
• The cycle time is shown by the red arrows, while the blue arrows show the precharge phase
Dependency Graph
[Figure: flat dependency graph over the C, F and D nodes of successive stages, and the corresponding folded dependency graph over nodes C, F, D with edge weights 0, +1 and -1]
Williams’ PC1
• Cycle time: $P = 2\,tF + 4\,tC + 4\,tD$
• Forward latency: $L_f = 1\,tF + 2\,tC + 1\,tD$
[Figure: PC1 pipeline block diagram with precharged function blocks F1-F2, completion detectors DA, DB and D2, C-elements C1-C2, a CLatch, and signals D(in), R(in), A(in), D(out), R(out), A(out)]
Williams’ PS0
• Not speed independent
• Cycle time: $P = 3\,tF_{\text{eval}} + 1\,tF_{\text{pre}} + 2\,tD$
• Forward latency: $L_f = 1\,tF_{\text{eval}}$
[Figure: PS0 pipeline block diagram with precharged function blocks F1-F3, completion detectors D1-D3, and signals D(in), A(in), D(out), A(out)]
PS0 Timing Diagram
[Figure: PS0 timing diagram showing, over time, the evaluation and precharge phases of F1-F3 and the corresponding "complete evaluation" and "precharged" states of D1-D3]
PS0 Timing Assumption
• The pipeline has to meet the following timing assumption:
$tF_i < tF_{i+2} + tD_{i+2} + tF_{i+1} + tD_{i+1}$
[Figure: PS0 timing diagram for F1-F3 and D1-D3, annotated with the two sides of the inequality]
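A minimal sketch of how the PS0 timing assumption could be checked stage by stage. It assumes the constraint reads tF_i < tF_{i+2} + tD_{i+2} + tF_{i+1} + tD_{i+1}, uses a single delay per function block (precharge taken equal to evaluation), and uses hypothetical names and numbers throughout.

```python
def ps0_violations(t_f, t_d):
    """Return the indices i for which the assumed PS0 constraint
    tF[i] < tF[i+2] + tD[i+2] + tF[i+1] + tD[i+1] does not hold.

    t_f[i] and t_d[i] are the function-block and completion-detector
    delays of stage i; only stages with two successors are checked.
    """
    bad = []
    for i in range(len(t_f) - 2):
        lhs = t_f[i]
        rhs = t_f[i + 2] + t_d[i + 2] + t_f[i + 1] + t_d[i + 1]
        if not lhs < rhs:
            bad.append(i)
    return bad


# A balanced pipeline meets the assumption easily ...
print(ps0_violations([4.0, 4.0, 4.0], [1.0, 1.0, 1.0]))
# ... while one very slow stage followed by fast stages does not.
print(ps0_violations([20.0, 3.0, 3.0], [1.0, 1.0, 1.0]))
```

The second call illustrates why the scheme is not speed independent: correctness depends on relative delays, not only on the order of handshake events.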
Renaudin’s DCVSL Pipeline
• Compared with Ted Williams’ PC0 only
• Uses DCVSL exclusively
• Introduces latched DCVSL
• Improves cycle time but not forward latency
• Cycle time: $P = 1\,tF_{\text{eval}} + 1\,tF_{\text{pre}} + 4\,tC + 2\,tD$
• Forward latency: $L_f = 1\,tF_{\text{eval}} + 1\,tC + 1\,tD$
DCVS Logic Family
[Figure: circuit schematics of a DCVS logic gate and a latched DCVS logic gate, each built around a DCVSL tree with inputs In (a, b, c_in), outputs Out, and Req/Ack handshake signals]
More on DCVSL
• Advantages
– Fast, based on dynamic domino-type logic
– Built-in four-phase handshaking
– Robust completion sensing
– Storage element
• Disadvantages
– Higher complexity: more transistors and area
– Higher power dissipation
DCVS Pipeline
[Figure: DCVS pipeline block diagram with precharged function blocks F1-F3, completion detectors D1-D3, C-elements C1-C3, and signals D(in), R(in), A(in), D(out), R(out), A(out)]
• Cycle time: $P = 1\,tF_{\text{eval}} + 1\,tF_{\text{pre}} + 4\,tC + 2\,tD$ ($2\,tF + 4\,tC + 2\,tD$)
• Forward latency: $L_f = 1\,tF_{\text{eval}} + 1\,tC + 1\,tD$
DCVS Pipeline Timing Diagram
[Figure: DCVS pipeline timing diagram showing, over time, the evaluation and precharge phases of F1-F3 together with C1-C3 and the completed/precharged states of D1-D3]
DCVS Dependency Graph
[Figure: folded dependency graph over nodes C, F, D with edge weights 0, +1 and -1]
• Cycle time: $P = 1\,tF_{\text{eval}} + 1\,tF_{\text{pre}} + 4\,tC + 2\,tD$
• Forward latency: $L_f = 1\,tF_{\text{eval}} + 1\,tC + 1\,tD$
Martin’s Pipeline Schemes
• Deep pipelining
• Quasi delay-insensitive (QDI): no timing assumptions
• Based on different handshaking reshufflings
• The best scheme has high concurrency, which reduces control overhead
• Control logic is more complex
Basic Asynchronous Handshaking
[Figure: full-buffer (FB) handshaking expansion with an explicit state variable x, and a three-stage pipeline (stages 1-3) with signals L0, L1, Le, R0, R1, Re]
• Reshuffling eliminates the explicit variable x
• Large control overhead
Handshaking Reshuffling
[Figure: half-buffer (HB) handshaking expansion, and a three-stage pipeline (stages 1-3) with signals L0, L1, Le, R0, R1, Re]
• Still waits for the predecessor to reset before resetting itself, so the overhead grows with the number of inputs
Precharge-Logic Half-Buffer
• Does not wait for the predecessor to reset before it resets its outputs; the control logic waits for the reset of the predecessor only after the current stage has reset
[Figure: precharge-logic half-buffer (PCHB) handshaking expansion, and a three-stage pipeline (stages 1-3) with signals L0, L1, Le, R0, R1, Re]
Precharge-Logic Full-Buffer
• Allows the neutrality test of the output data to overlap with raising the left enables
• Complex control logic; requires an extra state variable
[Figure: precharge-logic full-buffer (PCFB) handshaking expansion with state variable en, and a three-stage pipeline (stages 1-3) with signals L0, L1, Le, R0, R1, Re, en]
Martin’s PCHB Full-adder
[Figure: transistor-level schematic of a PCHB full adder with dual-rail inputs A0/A1, B0/B1, C0/C1, dual-rail outputs S0/S1 and D0/D1, enables en, Se, De and Le, validity signals Av, Bv, Cv, ABCv and Dv, and C-elements]
Martin’s Pipeline in General
• The cycle time is limited by the properties of QDI
– The next stage has to finish precharging before the current stage can evaluate the next input
[Figure: pipeline block diagram with precharged function blocks F1-F3, completion detectors D1-D3, one Control block per stage, enables Le and Re, and signals D(in), D(out)]
Performance Analysis of PCFB
• Control logic can be seen as completion detection (D) plus a C-element (C)
• Reshuffling the handshaking only changes the degree of concurrency; it does not affect the best-case performance analysis
• Cycle time: $P = 3\,tF_{\text{eval}} + 1\,tF_{\text{pre}} + 2\,tC + 2\,tD$
• Forward latency: $L_f = 1\,tF_{\text{eval}}$
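To put the background schemes side by side, the following sketch tabulates the cycle-time and forward-latency expressions quoted on the PC0, PC1, PS0, DCVS and PCFB slides. It assumes precharge time equals evaluation time, so the separate tF terms are merged into one coefficient; the delay values and all names are illustrative only.

```python
# Coefficients (n_tF, n_tC, n_tD) of each expression, with the evaluation
# and precharge tF terms merged under the equal-delay assumption.
SCHEMES = {
    "PC0":  {"P": (4, 4, 4), "Lf": (1, 1, 1)},
    "PC1":  {"P": (2, 4, 4), "Lf": (1, 2, 1)},
    "PS0":  {"P": (4, 0, 2), "Lf": (1, 0, 0)},
    "DCVS": {"P": (2, 4, 2), "Lf": (1, 1, 1)},
    "PCFB": {"P": (4, 2, 2), "Lf": (1, 0, 0)},
}


def evaluate(coeffs, t_f, t_c, t_d):
    """Evaluate n_tF*tF + n_tC*tC + n_tD*tD."""
    n_f, n_c, n_d = coeffs
    return n_f * t_f + n_c * t_c + n_d * t_d


def compare(t_f, t_c, t_d):
    """Return {scheme: (cycle_time, forward_latency)} for the given delays."""
    return {
        name: (evaluate(e["P"], t_f, t_c, t_d), evaluate(e["Lf"], t_f, t_c, t_d))
        for name, e in SCHEMES.items()
    }


# Hypothetical delays in arbitrary units.
results = compare(t_f=4.0, t_c=1.0, t_d=1.0)
for name, (p, lf) in results.items():
    print(f"{name:5s} P = {p:5.1f}  Lf = {lf:4.1f}")
```

For these particular numbers DCVS has the shortest cycle time, while PS0 and PCFB have the shortest forward latency; different delay ratios can change the ranking.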
Outline
• Background review
– Sutherland
– Ted Williams
– Renaudin
– Martin
• Taken pipeline
• Performance comparison
• Conclusion
Taken Pipeline
• Uses a Taken Detector
• Two schemes to satisfy different requirements
• Neither is speed independent
Initial Idea
• Precharge: only when the next stage has taken the current result
• Evaluation: only when the next stage has precharged
• Similar in idea to Martin’s pipeline schemes
Further Observation
• Precharge
– We can precharge the current stage as soon as the first-level logic of the next stage has evaluated, i.e. the next stage has taken the result
• Evaluate
– Evaluation can start as soon as the guarded N-transistor in the first-level logic of the next stage has turned off
Relax Precharge (RP) Constraint
• The current stage can precharge as soon as the first-level logic of the next stage has evaluated, i.e. the next stage has taken the result
• The current stage can evaluate as soon as the first-level logic of the next stage has precharged, blocking the new result from passing through
• No extra control logic is needed except the TD, which is similar to a completion detector
RP Pipeline Scheme
[Figure: RP pipeline block diagram with precharged function blocks F1-F3, taken detectors TD1-TD3, and signals D(in), D(out)]
• Cycle time: $P = 2\,tF_{\text{eval}} + 1\,tF1_{\text{eval}} + 1\,tF1_{\text{pre}} + 2\,tTD$
• Forward latency: $L_f = 1\,tF_{\text{eval}}$
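The RP expressions can be evaluated numerically as in the sketch below. It assumes that a stage delay splits as tF = tF1 + tF2 (first-level logic plus the remaining logic) and that the two tF1 terms are the first-level evaluation and precharge delays; the function names, the optional `t_f1_pre` parameter, and the numbers are hypothetical.

```python
def rp_cycle_time(t_f1, t_f2, t_td, t_f1_pre=None):
    """P = 2*tF + tF1(eval) + tF1(precharge) + 2*tTD, with tF = tF1 + tF2.

    If no separate first-level precharge delay is given, it is taken
    to be equal to the first-level evaluation delay.
    """
    if t_f1_pre is None:
        t_f1_pre = t_f1
    t_f = t_f1 + t_f2
    return 2 * t_f + t_f1 + t_f1_pre + 2 * t_td


def rp_forward_latency(t_f1, t_f2):
    """Lf = tF: no control delay sits on the forward path."""
    return t_f1 + t_f2


# Illustrative delays: a 4-unit stage whose first logic level takes 1 unit.
print(rp_cycle_time(t_f1=1.0, t_f2=3.0, t_td=1.0))
print(rp_forward_latency(t_f1=1.0, t_f2=3.0))
```

Because only the first-level delay tF1 appears in the handshake part of the cycle, a coarse-grained stage with many logic levels keeps most of its precharge time off the critical cycle.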
RP Timing Diagram
[Figure: RP timing diagram showing, over time, the evaluation and precharge phases of F1-F3 and the corresponding "taken" and "precharged" states of T1-T3]
RP Timing Assumption
• Easy-to-meet timing assumption:
$tTD_{i+1} + tF_i < tF2_{i+1} + tF1_{i+2} + tTD_{i+2} + tF1_{i+1} + tTD_{i+1}$
[Figure: RP timing diagram for F1-F3 and T1-T3, annotated with the two sides of the inequality]
RP Timing Assumption Cont.
• $tF1_i$ is the delay of the first-level logic of stage $i$
• $tF2_i$ is the delay of the logic after the first level of stage $i$
• Assumes the rising and falling delays of the TD are the same
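A small sketch of a per-stage check of the RP timing assumption, using the tF1/tF2 split defined above. It assumes the constraint reads tTD_{i+1} + tF_i < tF2_{i+1} + tF1_{i+2} + tTD_{i+2} + tF1_{i+1} + tTD_{i+1}, with tF_i = tF1_i + tF2_i and equal evaluation and precharge delays; names and numbers are hypothetical.

```python
def rp_violations(t_f1, t_f2, t_td):
    """Return the stage indices i that break the assumed RP constraint.

    t_f1[i], t_f2[i]: first-level and remaining logic delay of stage i.
    t_td[i]: taken-detector delay of stage i (same for rising and falling).
    Only stages that have two successors are checked.
    """
    bad = []
    for i in range(len(t_f1) - 2):
        # Stage i must finish precharging (left) ...
        lhs = t_td[i + 1] + t_f1[i] + t_f2[i]
        # ... before it is told to evaluate again (right).
        rhs = (t_f2[i + 1] + t_f1[i + 2] + t_td[i + 2]
               + t_f1[i + 1] + t_td[i + 1])
        if not lhs < rhs:
            bad.append(i)
    return bad


balanced = rp_violations([1.0] * 3, [3.0] * 3, [1.0] * 3)
skewed = rp_violations([1.0] * 3, [9.0, 3.0, 3.0], [1.0] * 3)
print(balanced, skewed)
```

In the balanced case the left side is 5 units against 7 on the right, which matches the slide's remark that the assumption is easy to meet; it fails only when one stage is much slower than its two successors.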
Relax Evaluation (RE) Constraint
• The current stage can start evaluating at about the same time as the next stage turns off the guarded N-transistors in its first-level logic
• Requires a generalized C-element, but improves cycle time
RE Pipeline Scheme
• The TD can be skewed for fast evaluation detection
• Cycle time: $P = 2\,tF + 1\,tF1 + 1\,tTD + 1\,tC$
• Forward latency: $L_f = 1\,tF$
[Figure: RE pipeline block diagram with precharged function blocks F1-F3, taken detectors TD1-TD3, one generalized C-element (GC) per stage, and signals D(in), D(out)]
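A short worked comparison of the RE and RP cycle times, as a sketch rather than a result from the slides. It assumes the two tF1 terms of the RP cycle time are equal and that tF, tF1 and tTD take the same values in both schemes:

```latex
\begin{aligned}
P_{RP} - P_{RE} &= (2\,tF + 2\,tF1 + 2\,tTD) - (2\,tF + tF1 + tTD + tC) \\
                &= tF1 + tTD - tC
\end{aligned}
```

Under these assumptions RE has the shorter cycle whenever the generalized C-element is faster than one first-level logic delay plus one TD delay, i.e. when $tC < tF1 + tTD$.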
RE Timing Diagram
[Figure: RE timing diagram showing, over time, the evaluation and precharge phases of F1-F3, the C-elements C1-C3, and the "taken" and "precharged" states of T1-T3]
RE Timing Assumption 1
• Precharge constraint:
$tTD_{i+1} + tC_i + tF_i < tF2_{i+1} + tF1_{i+2} + tTD_{i+2} + tC_i$
[Figure: RE timing diagram for F1-F3, C1-C3 and T1-T3, annotated with the two sides of the inequality]
RE Timing Assumption 2
• Evaluation constraint (min delay):
$tC_i + tF_i > tFC_{i+1} + tF1_{i+1}$
[Figure: RE timing diagram for F1-F3, C1-C3 and T1-T3, annotated with the two sides of the inequality]
Issue in Fine-Grained Pipelines
• In a fine-grained pipeline, such as Martin’s single-gate pipeline, the RE scheme may require buffering because of process variation
– Buffering is necessary because of the second timing assumption: the next gate (stage) may not have turned off its N-stack before the result from the current stage reaches it
$tC_i + tF_i > tFC_{i+1} + tF1_{i+1}$
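The buffering requirement can be sketched as a minimum-delay calculation. The code below assumes the evaluation constraint means "C-element delay plus evaluation delay of the current stage must exceed the control delay plus first-level turn-off delay of the next stage"; treating the next stage's control term as a single delay `t_c_next` is an assumption, and all names and numbers are hypothetical.

```python
def required_buffer_delay(t_c_cur, t_f_cur, t_c_next, t_f1_next):
    """Smallest extra delay that must be exceeded on the current stage's
    output so that the assumed min-delay constraint

        t_c_cur + t_f_cur + buffer > t_c_next + t_f1_next

    can hold. Returns 0.0 when the constraint already holds without
    buffering.
    """
    slack = (t_c_cur + t_f_cur) - (t_c_next + t_f1_next)
    return max(0.0, -slack)


# Coarse-grained stage: the result arrives long after the N-stack is off.
print(required_buffer_delay(t_c_cur=1.0, t_f_cur=4.0, t_c_next=1.0, t_f1_next=1.0))
# Single-gate stage with a slow neighbour (e.g. process variation).
print(required_buffer_delay(t_c_cur=1.0, t_f_cur=1.0, t_c_next=1.5, t_f1_next=1.0))
```

In a single-gate stage tF is no larger than tF1, so there is little or no built-in slack, which is consistent with the slide's point that the scheme suits coarse-grained pipelines best.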
Taken Detector (TD)
• Similar to a completion detector
• Detects both evaluation and precharge
• Its inputs are the outputs of the first-level logic of each stage
Datapath Merging & Splitting
• Datapath merging and splitting can be done in a way similar to Williams’ style
[Figure: block diagram in which F1 (TD1) feeds two parallel blocks F2a (TD2a) and F2b (TD2b) that merge into F3 (TD3), with a C-element (C) and signals D(in), D(out)]
Outline
• Background review
– Sutherland
– Ted Williams
– Renaudin
– Martin
• Taken pipeline
• Performance comparison
• Conclusions
Comparison of RE and Synchronous Skew Tolerant
• Assume a 4-stage pipeline (stages 1-4) and 4-phase clocking
• Synchronous:
– Stage 1 starts its next evaluation after stage 4 starts evaluation
• Asynchronous:
– Stage 1 starts its next evaluation after we detect the completion of the first-level logic of stage 3
Comparison Assumptions
• The pipeline is balanced: all stages have equal evaluation time
• Precharge time is the same as evaluation time
Graphical Comparison
[Figure: graphical comparison of the timing of stages 1-4]
Optimum Number of Stages
• Optimum Number of Stages (ONS)
• Cycle time is not the only factor in system performance; forward latency is also a limiting factor
• A larger cycle time can be compensated for by increasing the number of stages
• However, a high $L_f$ means system throughput cannot be increased by adding more stages
$ONS = \frac{P}{L_f}$
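The ONS ratio can be computed directly from a scheme's cycle time and forward latency. The sketch below uses made-up cycle times for PS0, RP and RE, obtained from the expressions on their slides with tF = 4, tF1 = 1 and tD = tTD = tC = 1 in arbitrary units (precharge taken equal to evaluation); the function name is hypothetical.

```python
def optimum_number_of_stages(cycle_time, forward_latency):
    """ONS = P / Lf: cycle time divided by per-stage forward latency."""
    if forward_latency <= 0:
        raise ValueError("forward latency must be positive")
    return cycle_time / forward_latency


# Illustrative (P, Lf) pairs, not measured data.
examples = {
    "PS0": (18.0, 4.0),  # 4*tF + 2*tD
    "RP":  (12.0, 4.0),  # 2*tF + 2*tF1 + 2*tTD
    "RE":  (11.0, 4.0),  # 2*tF + tF1 + tTD + tC
}
ons = {name: optimum_number_of_stages(p, lf) for name, (p, lf) in examples.items()}
for name, value in ons.items():
    print(f"{name}: ONS = {value:.2f}")
```

With these numbers the Taken schemes reach their peak throughput with about three stages in flight instead of more than four, in line with the claim that fewer stages are needed for the same performance.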
Conclusion
• With Taken logic and some easy-to-meet timing requirements, we can achieve the best cycle time and forward latency
• The performance comparison with existing pipeline schemes is favorable
• Implementation is still required to prove the theory