Post on 19-Aug-2018
Worst-Case Execution Time Analysis
Andreas Ermedahl, PhD
Mälardalens Real-Time Research Center
andreas.ermedahl@mdh.se
Chalmers, 4 March 2004 Worst-Case Execution Time Analysis 2
Real-Time and Embedded

Embedded Systems
"A computer that doesn't look like a computer"
Interacts with world
Primitive or no user interface
Part of other products

Embedded Systems
The vast majority of processors!
  200 million PC and server
  8000 million embedded
[Pie chart: "Desktop" 2%, "Embedded" 98%]

Processor Market
Processors: 50% of total semiconductor revenue
  Explains why everyone wants to do processors
Simple processors dominate in units
32-bit dominant in CPU revenue
30% of total semiconductor revenue
PC processors: 50% of CPU revenue
  AMD and Intel share it
[Bar chart: share of units and share of money, 0% to 100%, split by processor class: 4-bit, 8-bit, 16-bit, 32-bit, DSP]

Embedded Systems
Single purpose products
  Not general purpose like desktop PCs
  Do one thing very efficiently
Software very important:
  Gives character to product
  Used to differentiate inside a "platform"
  Can be changed late
  Processor cheaper than special HW
  Today, dominates dev cost

Simple Embedded Systems
8-bit Hitachi H8/300
  32 kB ROM, 32 kB RAM
  Standard microcontroller chip
  Byte-code machine, sensor drivers, …
8-bit Intel 8051, standard microcontroller
  Behavior, talk, IR communications

Consumer Electronics
Heterogeneous multiprocessor
  8-bit Atmel AVR for UI, games, …
  16-bit fixed-point TI C54 DSP for GSM coding, radio interface, …
  32-bit ARM7 in Bluetooth module
  + maybe ARM7 in IrDA interface
All in custom chips
Software is large:
  16 MB of code in control part
  Plus signal processing code

Automotive
Multiple networks
  CAN for body electronics: 30+ nodes
  CAN for engine control: few nodes
  LIN for instruments
Many processors
  Up to 100
Large diversity in processor types:
  8-bit CPUs (PIC, HC08) for door locks, lights, etc.
  16-bit CPUs (C167, HC11, HC12) for most functions
  32-bit CPUs (PPC, V850) for engine control, airbags
Total amount of code: 40-50 MB

Automotive: Cost sensitive
"Software is now a major part of automotive development time"
  Design, programming, testing
Use cheapest possible HW
  (There are exceptions, of course)
  Fast enough is fast enough
  Too fast is a waste of money
Small savings add up
  Save 50¢ per unit
  Ship 1 million units
  Saving: $500,000! Pure profit!

Embedded programming
Programming languages used:
  C, C++, assembler, Java, Ada
  Best support is for C
  C++ quite common
Fragmented compiler market
  8-bit & 16-bit: IAR Systems, Keil, Cosmic, Tasking
  32-bit ARM: IAR, ARM, Keil
  32-bit MIPS, PPC, 68k, x86: WindRiver, GreenHills, MetroWerks
  DSP: almost solely in-house compilers (TI, Motorola, Lucent, Intel)

Real-Time System
Timing as important as result
Hard real-time:
  Hard deadlines
  Dead if missed deadline
  Worst-case
Soft real-time:
  Fuzzier deadlines
  Can miss some deadlines
  Average-case

Conceptual Confusion
Embedded and Real-Time: Synonymous?
  Most embedded systems are real-time
  Most real-time systems are embedded
[Venn diagram: overlapping sets "embedded" and "real-time", intersection labeled "embedded real-time"]

Embedded vs Real-Time
Real-time a system issue
  Holistic, timing, distributed
Embedded a programming issue
  Handle HW, resources, …
  But with timing in mind!
[Diagram labels: Hardware, Embedded, Real-Time]

Embedded systems: summary
Large variety of processors
Large variety of support tools, compilers, linkers etc.
Software a key component
Used in a large variety of applications
  Many Real-Time
Usage increases

Worst-Case Execution Time

Definition of WCET
WCET = Worst-Case Execution Time
Other measures:
  Best-case execution time (BCET)
  Average-case execution time (ACET)
[Figure: probability distribution of the possible execution times over a time axis starting at 0. The actual BCET and actual WCET bound the distribution; safe BCET estimates lie at or below the actual BCET, safe WCET estimates at or above the actual WCET.]
Are you sure to find the worst-case execution?

WCET Assumptions
WCET analysis assumes:
  One specific program run in isolation
  No interfering background activities
  No task switches or interrupts
  Running on a certain hardware
Task interference:
  Scheduling / analysis issue

    void foo(int j, int a[])
    {
        int i;
        for (i = 100; i > 0; i--) {
            if (j > 50)
                a[i] = 2 * i;
            else
                a[i] = i;
        }
    }

Uses of WCET
Hard real-time systems
  Guarantee behavior in all circumstances
Soft real-time systems
  No hard timing requirements
  Useful for understanding system
Scheduling
  Creating schedules
  Verifying schedules
Interrupt latency checking
  Does system always react quickly enough?
Program tuning
  Critical loops
  Critical paths

Obtaining WCET Estimates
Measurement
  Industrial practice
  Add safety margin
Static analysis
  Research front
  Theoretically safe

Measuring the WCET

How to measure WCET
Methodology:
  Determine "worst-case input"
  Run and measure
  Add a safety margin
Problems:
  Have you really found the worst case?
  Interaction with the rest of the system?
  How precise is the clock used?
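
The measurement methodology above can be sketched in C. This is only an illustration under stated assumptions: the function name `measured_wcet_estimate` and the choice to express the safety margin as a percentage are hypothetical and do not come from the slides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: take the largest measured execution time and add
 * a safety margin given in percent. The result is an estimate only; it
 * is safe only if the true worst case happens to lie within the margin. */
static unsigned long measured_wcet_estimate(const unsigned long *samples,
                                            size_t n,
                                            unsigned margin_percent)
{
    unsigned long max = 0;
    size_t i;

    assert(samples != NULL && n > 0);
    for (i = 0; i < n; i++) {
        if (samples[i] > max)
            max = samples[i];   /* highest time observed so far */
    }
    return max + (max * margin_percent) / 100;
}
```

Called with measured times of 90, 120 and 100 cycles and a 20% margin, the sketch takes 120 as the observed maximum and adds 24 cycles on top. Nothing in the code can tell whether the worst-case input was among the runs, which is the first problem listed above.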

Measured WCET: Problem
Never overestimate the WCET
Hope to find the worst case
[Figure: time axis starting at 0 with safe BCET estimates, actual BCET, possible execution times, actual WCET, safe WCET estimates. Measurements will result in numbers in the region between actual BCET and actual WCET. Will never measure a value in the safe area.]

Measuring Time
(In-circuit) Emulators (ICE)
  Special debug version of the CPU
  Dying breed: Modern processors being too fast and too complex to emulate
Processors with debug support
  Use a few dedicated pins, e.g. JTAG, BDM
Looking for signals on bus
  Using oscilloscope (flip bit in important loop)
  Logic Analyzer
Using simulators
  Correctness vs. hardware?

Measuring Time
Loop task and measure
High water-marking:
  Keep system running
  Record max execution time observed
  Common feature in RTOS
  Keep in shipping systems, read @ service intervals
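
High water-marking as described above could look roughly like the following C sketch. The type `hwm_t`, the function `hwm_record` and the use of a free-running 32-bit cycle counter are assumptions made for illustration; a real RTOS exposes this feature through its own interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical high-water-mark record for one task. */
typedef struct {
    uint32_t max_cycles;  /* longest execution time observed so far */
    uint32_t samples;     /* number of task runs recorded */
} hwm_t;

/* Record one task run from counter readings taken at task start and end.
 * Unsigned subtraction stays correct across one counter wrap-around. */
static void hwm_record(hwm_t *h, uint32_t start, uint32_t end)
{
    uint32_t elapsed;

    assert(h != 0);
    elapsed = end - start;
    if (elapsed > h->max_cycles)
        h->max_cycles = elapsed;   /* new high-water mark */
    h->samples++;
}
```

The record costs a few bytes of RAM and a compare per task run, so it can plausibly stay in a shipping system and be read out at service intervals. The stored value is still only the largest time observed, not a guaranteed bound.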

Static WCET Analysis

Static WCET Analysis
Do not run the program: analyze it
Guaranteed safe WCET
Trying to be as tight as possible
Provided all input is correct
[Figure: time axis starting at 0 with safe BCET estimates, actual BCET, possible execution times, actual WCET, safe WCET estimates. All estimates will be in the safe area. Will never give a result in the region between actual BCET and actual WCET.]

Causes of Execution Time Variation
Execution characteristics of the program
  A program can often execute in several different ways
  Input data dependencies
  Application characteristics
Timing characteristics of the hardware
  Cache memories
  Pipelines
  ...

    int foo(int max)
    {
        int i, j, total;
        i = 0;
        j = 1;
        total = 0;
        while (i <= max) {
            if (j < 5)
                j++;
            if (j > max)
                break;
            total = total + j - 2;
            i++;
        }
        return total;
    }

Static WCET Analysis
Flow analysis
  Determine the dynamic behavior of program
Low-level analysis
  Determine execution time for program parts on the hardware
Calculation
  Combine flow and low-level times to give a WCET estimate
[Diagram: in reality, the program passes through the compiler to object code, which runs on the target hardware and yields the actual WCET. In the analysis, flow analysis and low-level analysis feed the calculation, which yields the WCET estimate.]

Flow Analysis
Dynamic behaviour of program
Example of needed info:
  Number of loop iterations
  Recursion depth
  Input dependencies
  Infeasible paths
  Function instances
Provided by static analysis or manual annotations
[Diagram: Program → Flow analysis and Low-level analysis → Calculation → WCET estimate]

The basic block graph
C source code:

    int foo(int max)
    {
        int i, j, total;
        i = 0;
        j = 1;
        total = 0;
        while (i <= max) {
            if (j < 5)
                j++;
            if (j > max)
                break;
            total = total + j - 2;
            i++;
        }
        return total;
    }

Assembler code:

    foo:    mov  r0,r6
            movi #1,r7
            mov  r0,r5
            br   foo_1
    foo_0:  add  r7,r5
            addi #-2,r5
            addi #1,r6
    foo_1:  cmp  r6,r1
            blt  foo_5
    foo_2:  cmpi #5,r6
            bge  foo_4
    foo_3:  addi #1,r7
    foo_4:  cmp  r7,r1
            bge  foo_0
    foo_5:  mov  r5,r1
            jmp  [31]

Basic block graph: nodes foo, foo_0, foo_1, foo_2, foo_3, foo_4, foo_5
  Flows as edges
  Each block will run as a unit

Flow Info Characteristics
Relation between possible executions and flow info
[Figure: nested sets of program flows:]
  Structurally possible flows (infinite)
  Basic finiteness
  Statically allowed
  Actual feasible paths
Flow information shown:
  loopbound = 10
  #foo_3 <= 5
Figure labels:
  WCET found here = desired result
  WCET found here = overestimation
Example program: foo(max) and its basic block graph (foo, foo_0 to foo_5), as on the basic block graph slide

Example: Loop Bounds
Loop bounds:
  100 in this example
  In general, a very difficult problem
  Solvable for most loops, however
  Stick to writing simple loops
Basic finiteness

    foo(x):
    A:  loop(i=1..100)
    B:    if (x > 5) then
    C:      x = x*2
          else
    D:      x = x+2
          end
    E:    if (x < 0) then
    F:      b[i] = a[i];
          end
    G:    bar(i)
        end loop

Example: Infeasible Path
Infeasible path:
  A, B, C, E, F, G
  Since C ⇒ ¬F
Due to data:
  if (x > 5) then it is not possible that (x*2) < 0

    foo(x):
    A:  loop(i=1..100)
    B:    if (x > 5) then
    C:      x = x*2
          else
    D:      x = x+2
          end
    E:    if (x < 0) then
    F:      b[i] = a[i];
          end
    G:    bar(i)
        end loop

Example: Triangular Loop
Loops:
  Loop A bound: 100
  Local B bound: 100
Block C:
  By loop bounds: 100 * 100 = 10 000
  But actually: 100 + ... + 1 = 5 050

    triangle(a,b):
    A:  loop(i=1..100)
    B:    loop(j=i..100)
    C:      a[i,j] = ...
          end loop
        end loop
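
The two counts for block C can be checked with a small C sketch. The function names `triangle_count` and `product_bound` are hypothetical and chosen only for this illustration.

```c
#include <assert.h>

/* Count how often block C runs in the triangular loop nest:
 * outer loop i = 1..n, inner loop j = i..n. */
static long triangle_count(int n)
{
    long count = 0;
    int i, j;

    assert(n >= 0);
    for (i = 1; i <= n; i++)
        for (j = i; j <= n; j++)
            count++;            /* one execution of block C */
    return count;
}

/* Bound obtained by multiplying the two local loop bounds. */
static long product_bound(int n)
{
    return (long)n * n;
}
```

For n = 100, `triangle_count` gives 5050 while `product_bound` gives 10000, so an analysis that only knows the local bounds overestimates the executions of C by almost a factor of two.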

The mapping problem
Flow analysis easier on source code
  Semantic of code clearer
  Easier for programmer to give flow information
Low-level analysis made on object code
  The code that the processor really executes
Compiler optimizations can change code structure
  For example, loops can be removed or added
  Hard to identify where flow information should be given
C source code (Loopbound: 101):

    ...
    for (i = 0; i <= 100; i++) {
        if (a[i] > 10)
            ...
        else
            ...
    }
    ...

Object code: [binary image of the compiled program]
  Where is the loop?

Flow Analysis
State of the art in research:
  Often using programmer annotations
  Analysis at assembler code level
  Bounds for simple loops can be found
  Not too much pointer magic allowed
  Well-structured code often assumed
Active research project at Mdh!
  Goal: Automatic derivation of flow information for C programs

Low-level analysis

Low-Level Analysis
Determine execution time for program parts
Account for hardware effects
  Using a model of the target CPU
Work on object code
  The program that really executes
Two main issues:
  Cache analysis (global)
  Pipeline analysis (local)
Analysis complexity depends on target CPU complexity
  Safe approximations sometimes needed
[Diagram: Program → Flow analysis and Low-level analysis → Calculation → WCET estimate]

Pipeline Analysis
Pipelines
  Overlap instructions
Variants:
  None: Traditional CPUs (68HC11, 8051)
  Scalar: Single pipeline (ARM, SH3, V850, 68040)
  Superscalar: Multiple pipelines, dynamic (PowerPC 7xx, Pentium, UltraSPARC, MIPS20k)
  VLIW: Multiple pipelines, compiler scheduling (DSP, Itanium, Crusoe)
[Pipeline diagram: stages IF, ID, EX, MEM, WB over cycles 1 to 11]

Pipeline Analysis
Local analysis
  Interaction only with neighbor instructions
  Contrast with cache analysis
Cache analysis results as input
  Cache behavior can affect pipelining
Or integrated cache/pipeline
  Required for more complex hardware
Analysis for non-pipelined CPU:
  Assign each instruction a fixed time
  Sum across basic blocks
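
For a non-pipelined CPU, the analysis above amounts to a plain sum. A minimal C sketch, assuming the per-instruction cycle counts are already available in an array; the function name `block_time` is hypothetical, and real cycle counts would come from the CPU manual.

```c
#include <assert.h>
#include <stddef.h>

/* Time of one basic block on a non-pipelined CPU: the sum of the fixed
 * per-instruction cycle counts of the instructions in the block. */
static unsigned block_time(const unsigned *instr_cycles, size_t n)
{
    unsigned total = 0;
    size_t i;

    assert(instr_cycles != NULL || n == 0);
    for (i = 0; i < n; i++)
        total += instr_cycles[i];
    return total;
}
```

A block of three instructions costing 2, 1 and 2 cycles would get a block time of 5 cycles. Because no instruction overlaps with its neighbours, the time of a block does not depend on which block ran before it.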

Example: No Pipeline

    foo(x):
    A:  loop(i=1..100)        (7 cycles)
    B:    if (x > 5) then     (5 c)
    C:      x = x*2           (12 c)
          else
    D:      x = x+2           (2 c)
          end
    E:    if (x < 0) then     (4 c)
    F:      b[i] = a[i];      (8 c)
          end
    G:    bar(i)              (20 c)
        end loop

Constant time for each block in the code
Object code is not shown

Example: No Pipeline
Program: foo(x) with blocks A to G, as in the loop bounds example
Basic block graph: Foo() entry, blocks A, B, C, D, E, F, G, end
Block times:
  tA = 7
  tB = 5
  tC = 12
  tD = 2
  tE = 4
  tF = 8
  tG = 20

Example: Simple Pipeline
Program: foo(x) with blocks A to G, as in the loop bounds example
[Pipeline diagrams over stages IF, EX, M, F: block A alone occupies cycles 1 to 7; block B alone occupies cycles 1 to 5; A followed by B occupies cycles 1 to 10]
tA = 7
tB = 5
tAB = 10
δAB = tAB - (tA + tB) = 10 - (7 + 5) = -2
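
The timing effect δ shown above is simple arithmetic. A minimal C sketch, with the hypothetical function name `pipeline_delta`:

```c
#include <assert.h>

/* Timing effect of running block B directly after block A:
 * delta = t_ab - (t_a + t_b), where t_ab is the time of the two blocks
 * executed in sequence. A negative delta means the blocks overlap in
 * the pipeline, that is, a speed-up. */
static int pipeline_delta(int t_a, int t_b, int t_ab)
{
    assert(t_a >= 0 && t_b >= 0 && t_ab >= 0);
    return t_ab - (t_a + t_b);
}
```

With the slide's values, `pipeline_delta(7, 5, 10)` yields -2: the sequence A, B finishes two cycles earlier than the sum of the isolated block times suggests.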

Example: Pipeline result
Program: foo(x) with blocks A to G, as in the loop bounds example
Basic block graph: Foo() entry, blocks A, B, C, D, E, F, G, end
Block times:
  tA = 7, tB = 5, tC = 12, tD = 2, tE = 4, tF = 8, tG = 20
Timing effects on the edges:
  δAB = -2
  δBC = -2
  δBD = -1
  δCE = -1
  δDE = -2
  δEF = -2
  δEG = -1
  δFG = -2
  δGA = -1
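
One way to use these numbers is to add the block times along a path and then add the (negative) timing effects of the edges taken. The C sketch below does this for one loop iteration, including the back edge from the last block to the first. The names `BB_A` to `BB_G`, `block_t`, `edge_delta` and `iteration_time` are hypothetical, and the resulting per-iteration figure is computed here, not taken from the slides.

```c
#include <assert.h>
#include <stddef.h>

enum { BB_A, BB_B, BB_C, BB_D, BB_E, BB_F, BB_G, NBLOCKS };

/* Block times from the slide, in cycles. */
static const int block_t[NBLOCKS] = { 7, 5, 12, 2, 4, 8, 20 };

/* Timing effects edge_delta[from][to]; entries without an edge stay 0. */
static const int edge_delta[NBLOCKS][NBLOCKS] = {
    [BB_A][BB_B] = -2, [BB_B][BB_C] = -2, [BB_B][BB_D] = -1,
    [BB_C][BB_E] = -1, [BB_D][BB_E] = -2, [BB_E][BB_F] = -2,
    [BB_E][BB_G] = -1, [BB_F][BB_G] = -2, [BB_G][BB_A] = -1
};

/* Time of one loop iteration along `path`, treating the path as a
 * cycle: the edge from the last block back to the first is included. */
static int iteration_time(const int *path, size_t n)
{
    int total = 0;
    size_t i;

    assert(path != NULL && n > 0);
    for (i = 0; i < n; i++) {
        int from = path[i];
        int to = path[(i + 1) % n];

        assert(from >= 0 && from < NBLOCKS && to >= 0 && to < NBLOCKS);
        total += block_t[from] + edge_delta[from][to];
    }
    return total;
}
```

For the path A, B, C, E, F, G the block times sum to 56 cycles and the six edge effects to -10, so the sketch returns 46 cycles per iteration.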

Pipeline Interactions
[Pipeline diagrams over stages IF, EX, M, F for sequences of basic blocks]
Pairwise overlap: speed-up that we want to account for
Interaction across three blocks!

Pipeline Analysis
Out-of-order pipelines
  Very difficult analysis problem
  Track all possible pipeline states, iterate until fixed point
Interaction with icache & dcache
  Integrated cache/pipeline analysis necessary
  Branch prediction affects icache state
Been done for PowerPC 755
  Up to 1000 states per instruction!

Cache Analysis
Cache memories:
  Increase speed on average
  More variable execution times
  Common on high-speed CPUs
Unified caches
  Instructions & data in one
Split caches
  Separate instructions & data
[Diagram: cache memory placed in front of main memory]

Cache Analysis
Performed globally
  Cannot be analyzed locally
Instruction caches
  Predictable from instruction flow
Data caches
  No simple way to predict accesses
  Very difficult analysis problem
Unified caches
  High degree of pessimism

Example: Cache Analysis

    fib:    mov  #1, r5
            mov  #0, r6
            mov  #2, r7
            br   fib_0
    fib_1:  mov  r5,r8
            add  r6,r5
            mov  r8,r6
            add  #1,r7
    fib_0:  cmp  r7,r1
            bge  fib_1
    fib_2:  mov  r5,r1
            jmp  [r31]

Performed on object code
Only instruction cache in this example

Example: Cache Analysis
Information needed for instruction cache analysis: size of instruction and starting address

                             size  address
    fib:    mov  #1, r5       2    1000
            mov  #0, r6       2    1002
            mov  #2, r7       2    1004
            br   fib_0        2    1006
    fib_1:  mov  r5,r8        2    1008
            add  r6,r5        2    1010
            mov  r8,r6        2    1012
            add  #1,r7        2    1014
    fib_0:  cmp  r7,r1        2    1016
            bge  fib_1        2    1018
    fib_2:  mov  r5,r1        2    1020
            jmp  [r31]        2    1022

Example: Cache Analysis
Program: fib with instruction sizes (2) and starting addresses (1000 to 1022), as on the previous slide
Mapping to instruction cache

Example: Cache Analysis
Program: fib (blocks fib, fib_1, fib_0, fib_2), as on the previous slides
[Figure: each instruction annotated as a cache hit or miss. Annotation groups: miss hit hit hit / miss hit / miss hit hit hit]

Example: Cache Analysis
Program: fib (blocks fib, fib_1, fib_0, fib_2), as on the previous slides
[Figure: each instruction annotated as a cache hit or miss]
First iteration of the loop: miss hit hit hit / miss hit / miss hit hit hit
Remaining iterations: hit hit / hit hit hit hit
Further annotation group: hit hit
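
A hit/miss classification of this kind can be reproduced by simulating the instruction cache along a fetch trace. The C sketch below assumes a direct-mapped cache with 8-byte lines and 4 lines, and it reads the slide's addresses as decimal; the slides state neither the cache geometry nor the number base, and the names `icache_t`, `icache_fetch` and `count_misses` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define LINE_BYTES 8u   /* assumed cache line size in bytes */
#define NUM_LINES  4u   /* assumed number of lines, direct-mapped */

typedef struct {
    unsigned tag[NUM_LINES];
    int valid[NUM_LINES];
} icache_t;

static void icache_reset(icache_t *c)
{
    size_t i;

    assert(c != NULL);
    for (i = 0; i < NUM_LINES; i++) {
        c->valid[i] = 0;
        c->tag[i] = 0;
    }
}

/* Fetch one instruction: returns 1 on a hit, 0 on a miss. On a miss
 * the line holding the address is loaded into the cache. */
static int icache_fetch(icache_t *c, unsigned addr)
{
    unsigned block = addr / LINE_BYTES;
    unsigned index = block % NUM_LINES;
    unsigned tag = block / NUM_LINES;

    if (c->valid[index] && c->tag[index] == tag)
        return 1;
    c->valid[index] = 1;
    c->tag[index] = tag;
    return 0;
}

/* Count the misses along a trace of instruction addresses. */
static unsigned count_misses(icache_t *c, const unsigned *addrs, size_t n)
{
    unsigned misses = 0;
    size_t i;

    for (i = 0; i < n; i++)
        if (!icache_fetch(c, addrs[i]))
            misses++;
    return misses;
}
```

Under these assumptions, a trace that runs fib, fib_0, fib_1, fib_0 and fib_2 (addresses 1000 to 1006, 1016, 1018, 1008 to 1014, 1016, 1018, 1020, 1022) touches three cache lines and misses once per line. Every later pass through fib_1 and fib_0 hits, which agrees with the all-hit annotation for the remaining iterations.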

Calculation

Calculation
Find the path through the program that gives the longest execution time
Several approaches used:
  Tree-based
  Path-based
  Constraint-based (IPET)
Properties of approaches
  Program flow allowed
  Object code structure (optimizations)
  Pipeline effect modeling
  Solution complexity
[Diagram: Program → Flow analysis and Low-level analysis → Calculation → WCET estimate]

Tree-Based Calculation
Use syntax tree of program
Traverse tree bottom-up
Program: foo(x) with blocks A to G, as in the loop bounds example
Syntax tree:

    foo
      Loop
        Header
        if(x>5)
          x=x*2
          x=x+2
        if(x<0)
          b[i]=a[i]
        bar(i)

Tree-Based Calculation
Use constant time for nodes
Leaf nodes have definite time
Rules for internals
Program: foo(x) with block times A (7 c), B (5 c), C (12 c), D (2 c), E (4 c), F (8 c), G (20 c)
Syntax tree with node times:

    foo                ()
      Loop : 100       ()
        Header         (7)
        if(x>5)        (5)
          x=x*2        (12)
          x=x+2        (2)
        if(x<0)        (4)
          b[i]=a[i]    (8)
        bar(i)         (20)

Tree-Based: IF statement
For a decision statement: max of children
Add time for decision itself
Program: foo(x) with blocks A to G, as in the loop bounds example

    foo                ()
      Loop : 100       ()
        Header         (7)
        if(x>5)        (5)   ∑ 17
          x=x*2        (12)
          x=x+2        (2)
        if(x<0)        (4)   ∑ 12
          b[i]=a[i]    (8)
        bar(i)         (20)

Tree-Based: LOOP
Loop: sum the children
Multiply by loop bound
Program: foo(x) with blocks A to G, as in the loop bounds example

    foo                ()
      Loop : 100       ∑ 56 * 100
        Header         (7)
        if(x>5)        (5)   ∑ 17
          x=x*2        (12)
          x=x+2        (2)
        if(x<0)        (4)   ∑ 12
          b[i]=a[i]    (8)
        bar(i)         (20)

Tree-Based: Final result
The function foo() will take 5600 cycles in the worst case
Program: foo(x) with blocks A to G, as in the loop bounds example

    foo                ∑ 5600
      Loop : 100       ∑ 56 * 100
        Header         (7)
        if(x>5)        (5)   ∑ 17
          x=x*2        (12)
          x=x+2        (2)
        if(x<0)        (4)   ∑ 12
          b[i]=a[i]    (8)
        bar(i)         (20)
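
The tree-based rules (leaf: constant time; decision: own time plus the maximum of the children; loop: sum of the children multiplied by the loop bound) can be sketched as a recursive traversal in C. The node layout, the limit of four children and all names (`node`, `tree_wcet`, `foo_tree_wcet`) are assumptions made for this illustration, not part of any tool described in the slides.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { NODE_LEAF, NODE_IF, NODE_LOOP } node_kind;

typedef struct node {
    node_kind kind;
    unsigned cost;               /* leaf time, or time of the decision */
    unsigned bound;              /* loop bound, used by NODE_LOOP only */
    const struct node *child[4];
    size_t nchild;
} node;

/* Bottom-up traversal of the syntax tree. */
static unsigned tree_wcet(const node *n)
{
    unsigned sum = 0, worst = 0, t;
    size_t i;

    assert(n != NULL && n->nchild <= 4);
    for (i = 0; i < n->nchild; i++) {
        t = tree_wcet(n->child[i]);
        sum += t;
        if (t > worst)
            worst = t;
    }
    switch (n->kind) {
    case NODE_IF:   return n->cost + worst;            /* max of children */
    case NODE_LOOP: return n->bound * (n->cost + sum); /* sum times bound */
    case NODE_LEAF:
    default:        return n->cost;
    }
}

/* The syntax tree of foo(x) with the node times from the slides. */
static unsigned foo_tree_wcet(void)
{
    static const node header = { NODE_LEAF, 7, 0, { 0 }, 0 };
    static const node blk_c  = { NODE_LEAF, 12, 0, { 0 }, 0 };
    static const node blk_d  = { NODE_LEAF, 2, 0, { 0 }, 0 };
    static const node blk_f  = { NODE_LEAF, 8, 0, { 0 }, 0 };
    static const node blk_g  = { NODE_LEAF, 20, 0, { 0 }, 0 };
    static const node if_b   = { NODE_IF, 5, 0, { &blk_c, &blk_d }, 2 };
    static const node if_e   = { NODE_IF, 4, 0, { &blk_f }, 1 };
    static const node loop   = { NODE_LOOP, 0, 100,
                                 { &header, &if_b, &if_e, &blk_g }, 4 };

    return tree_wcet(&loop);
}
```

`foo_tree_wcet()` reproduces the intermediate sums on the slides: 17 for if(x>5), 12 for if(x<0), 56 per iteration and 5600 in total. The traversal has no way to express that C and F exclude each other, so it cannot improve on this figure.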
Chalmers, 4 March 2004 Worst-Case Execution Time Analysis 62
Path-Based Calculation

Find longest path
  One loop at a time
Prepare the loop
  Remove back edges
  Redirect to special continue nodes

[Figure: control-flow graph of foo() with nodes A, B, C, D, E, F, G, a continue node, and end]
Node times (cycles): t_A=7, t_B=5, t_C=12, t_D=2, t_E=4, t_F=8, t_G=20

foo(x):
  A: loop(i=1..100)
  B:   if (x > 5) then
  C:     x = x*2
       else
  D:     x = x+2
       end
  E:   if (x < 0) then
  F:     b[i] = a[i];
       end
  G:   bar (i)
     end loop

Path-Based Calculation

Longest path: A, B, C, E, F, G
  7+5+12+4+8+20 = 56 cycles
Total time:
  100 iterations
  56 cycles per iteration
  Total: 5600 cycles

[Figure: control-flow graph of foo() with nodes A, B, C, D, E, F, G, a continue node, and end]
Node times (cycles): t_A=7, t_B=5, t_C=12, t_D=2, t_E=4, t_F=8, t_G=20

Path-Based Calculation

Infeasible path: A, B, C, E, F, G
  Ignore, look for next
New longest path: A, B, C, E, G
  48 cycles
Total time: 4800 cycles

C and F can never execute together

[Figure: control-flow graph of foo() with nodes A, B, C, D, E, F, G, a continue node, and end]
Node times (cycles): t_A=7, t_B=5, t_C=12, t_D=2, t_E=4, t_F=8, t_G=20

foo(x):
  A: loop(i=1..100)
  B:   if (x > 5) then
  C:     x = x*2
       else
  D:     x = x+2
       end
  E:   if (x < 0) then
  F:     b[i] = a[i];
       end
  G:   bar (i)
     end loop

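The path-based steps above (remove the back edge, find the longest path through the loop body, skip paths known to be infeasible, multiply by the loop bound) can be sketched as follows. This is a minimal illustration, not how any particular tool implements it: it enumerates all paths explicitly, which only works for small graphs, and the names `SUCC`, `paths` and `longest_feasible_path` are hypothetical. The node times are those from the slides.

```python
TIMES = {"A": 7, "B": 5, "C": 12, "D": 2, "E": 4, "F": 8, "G": 20}

# Loop body as an acyclic graph: the back edge G -> A has been removed,
# so G simply has no successors (it plays the role of the continue node).
SUCC = {
    "A": ["B"],
    "B": ["C", "D"],
    "C": ["E"],
    "D": ["E"],
    "E": ["F", "G"],
    "F": ["G"],
    "G": [],
}


def paths(node, succ):
    """Yield every path from node to a node without successors."""
    if not succ[node]:
        yield [node]
        return
    for nxt in succ[node]:
        for rest in paths(nxt, succ):
            yield [node] + rest


def longest_feasible_path(start, succ, times, excluded_pairs=()):
    """Return (path, cycles) for the longest path that contains no excluded pair."""
    ranked = sorted(
        paths(start, succ),
        key=lambda p: sum(times[n] for n in p),
        reverse=True,
    )
    for path in ranked:
        if not any(a in path and b in path for a, b in excluded_pairs):
            return path, sum(times[n] for n in path)
    raise ValueError("no feasible path")


LOOP_BOUND = 100

path, cycles = longest_feasible_path("A", SUCC, TIMES)
print(path, cycles, LOOP_BOUND * cycles)

# Flow information: C and F can never execute together.
path, cycles = longest_feasible_path("A", SUCC, TIMES, excluded_pairs=[("C", "F")])
print(path, cycles, LOOP_BOUND * cycles)
```

Without the exclusion the sketch picks A, B, C, E, F, G (56 cycles per iteration); with it, the next-longest path A, B, C, E, G (48 cycles) is chosen, matching the 5600 and 4800 cycle totals on the slides.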
Path-Based Calculation

Multiple levels of loops
  Work bottom-up
Replace analyzed loops with blocks
  Perform analysis on next level

[Figure, built up over four slides: control-flow graph with nodes A to J and node times t_A to t_J, with an inner loop nested in an outer loop. The inner loop is analyzed first and replaced by a single block "Inner loop" with time t_inner. The outer loop, then consisting of A, I, J and the inner-loop block, is analyzed next and replaced by a block "Outer loop" with time t_outer.]

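The bottom-up treatment of nested loops can be sketched like this. The slides give no node times or loop bounds for this figure, so every number, the graph shape and all names (`Loop`, `longest_path`, `loop_wcet`) below are hypothetical and serve only to show the idea: an analyzed inner loop is replaced by one block whose time is then used when analyzing the enclosing loop.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Loop:
    """Hypothetical loop description: acyclic body (back edge removed) plus a bound."""
    bound: int
    entry: str
    succ: Dict[str, List[str]]
    nodes: Dict[str, Union[int, "Loop"]]  # node name -> cycles, or a nested loop


def longest_path(entry: str, succ: Dict[str, List[str]], cost: Dict[str, int]) -> int:
    """Longest path in a DAG, computed by memoized depth-first search."""
    memo: Dict[str, int] = {}

    def go(node: str) -> int:
        if node not in memo:
            memo[node] = cost[node] + max((go(s) for s in succ.get(node, [])), default=0)
        return memo[node]

    return go(entry)


def loop_wcet(loop: Loop) -> int:
    """Analyze nested loops first, treat each as a single block, then this loop."""
    cost = {
        name: loop_wcet(value) if isinstance(value, Loop) else value
        for name, value in loop.nodes.items()
    }
    return loop.bound * longest_path(loop.entry, loop.succ, cost)


# Made-up example: an inner loop with a branch, nested in an outer loop.
inner = Loop(
    bound=10,
    entry="B",
    succ={"B": ["C", "D"], "C": ["E"], "D": ["E"]},
    nodes={"B": 3, "C": 4, "D": 6, "E": 2},
)
outer = Loop(
    bound=4,
    entry="A",
    succ={"A": ["inner"], "inner": ["J"]},
    nodes={"A": 5, "inner": inner, "J": 1},
)

t_inner = loop_wcet(inner)   # 10 * (3 + 6 + 2)
t_outer = loop_wcet(outer)   # 4 * (5 + t_inner + 1)
print(t_inner, t_outer)
```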
Example: IPET Calculation

IPET
  "Implicit path enumeration technique"
  Execution paths not explicitly represented
Program model:
  Nodes and edges
  Execution count (x_entity)
  Timing info (t_entity)
    Node times: basic blocks
    Edge times: overlap

[Figure: control-flow graph of foo() with nodes A, B, C, D, E, F, G and end. Nodes carry execution counts x_A to x_G; the edges foo->A, A->B, B->C, B->D, C->E, D->E, E->F, E->G, F->G and G->A carry count or time labels such as x_GA, x_DE, t_AB and t_fooA.]
Node times (cycles): t_A=7, t_B=5, t_C=12, t_D=2, t_E=4, t_F=8, t_G=20

Basic IPET Calculation

WCET = max Σ(x_entity * t_entity)
  Where each x_entity satisfies constraints
Constraints:
  Start condition
  Program structure
  Loop bounds
  Other flow information

[Figure: control-flow graph of foo() with node counts x_A to x_G and edge counts x_fooA, x_AB, x_BC, x_BD, x_CE, x_DE, x_EF, x_EG, x_FG, x_GA]

Constraints shown in the figure:
  x_foo = 1
  x_A = x_fooA + x_GA
  x_AB = x_A
  x_BC + x_BD = x_B
  x_E = x_CE + x_DE
  x_A <= 100
  x_C + x_F <= x_A

IPET Calculation

Solution methods:
  Integer linear programming
  Constraint satisfaction
Solution:
  Counts for nodes and edges
  The value of the WCET
Global analysis

[Figure: control-flow graph of foo() annotated with the solution]
  x_foo = 1
  x_A = 100
  x_B = 100
  x_C = 100, x_D = 0
  x_E = 100
  x_F = 0
  x_G = 100
  x_end = 1
  WCET = 4800

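To make the IPET formulation concrete, the sketch below maximizes Σ(x_entity * t_entity) for foo() under the constraints from the previous slide. It is deliberately naive: instead of an integer linear programming or constraint solver it enumerates all integer count assignments, which is only practical for a toy example. It also assumes that the structural constraints have already been used to eliminate the edge counts (every iteration visits B, E and G, and x_C + x_D = x_B), and it ignores edge times. The function name `ipet_bruteforce` is hypothetical.

```python
T = {"A": 7, "B": 5, "C": 12, "D": 2, "E": 4, "F": 8, "G": 20}
LOOP_BOUND = 100


def ipet_bruteforce(times, loop_bound):
    """Maximize sum(x_n * t_n) over all counts satisfying the constraints."""
    best_cost, best_counts = -1, None
    # Start condition x_foo = 1 forces at least one visit of A;
    # loop bound: x_A <= loop_bound.
    for x_a in range(1, loop_bound + 1):
        # Program structure: each iteration executes B, E and G once.
        x_b = x_e = x_g = x_a
        for x_c in range(x_b + 1):
            x_d = x_b - x_c                      # x_BC + x_BD = x_B
            # Other flow information: x_C + x_F <= x_A
            # (this also guarantees x_F <= x_E).
            for x_f in range(x_a - x_c + 1):
                counts = {"A": x_a, "B": x_b, "C": x_c, "D": x_d,
                          "E": x_e, "F": x_f, "G": x_g}
                cost = sum(counts[n] * times[n] for n in counts)
                if cost > best_cost:
                    best_cost, best_counts = cost, counts
    return best_cost, best_counts


wcet_cycles, counts = ipet_bruteforce(T, LOOP_BOUND)
print(wcet_cycles, counts)
```

The maximum found is 4800 cycles with x_C = 100 and x_D = x_F = 0, the same counts as in the solution on the slide. A real tool would hand the same objective and constraints to a solver rather than enumerate.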
Pros and Cons

Tree-based:
  Simple and efficient
  Cannot handle infeasible paths
Path-based:
  Efficient if implemented right
  Can handle some flow information
IPET:
  Powerful and complex (efficiency=?)
  Can handle very complex flows

Final Notes

Correctness?

Flow Analysis:
  Part of "program proof" techniques
  Sound theoretical techniques
Low-Level Analysis:
  Modern hardware is difficult to model
    Combinations of performance features
    Bugs relative to hardware specs are common
      UltraSPARC cache, V850E pipeline, errata, …
  How to prove correctness vs. hardware?
  How to capture all effects?

WCET analysis tools

WCET Tools

Several more or less complete tools
Two commercial:
  AiT from AbsInt (demo tool)
  Bound-T from TidoRum
Several research prototypes:
  Sweet: Swedish Worst-Case Execution Time tool
  Heptane from Irisa
  pWCET from York

WCET Tools

Tool differences:
  Supported CPUs
  Flow analysis performed
  Calculation method used
  How mapping problem is solved
    Decoding binaries
    Integrated with compiler
Examples of supported processors:
  ARM7TDMI, ARM9, HC(S)12, NEC V850E, PPC755, Motorola ColdFire 5307, SPARC V7, Intel 8051, ADSP 210202, TriCore 1796

WCET Tool Demo

The End!