worst-case execution time analysis...worst-case execution time analysis 2009-12-03 1 worst-case...
TRANSCRIPT
-
Worst-Case Execution Time analysis 2009-12-03
1
Worst-CaseExecution Time
AnalysisAnalysisAndreas Ermedahl, Docent
Mälardalen Real-Time Research Center (MRTC)Västerås, Sweden
C2
What C are we talking about?
A key component in the analysis of real-time systemsYou have seen it in formulas such as:
Worst-Case Worst-Case
3
Ri = Ci + ∑⎡Ri / Tj⎤ Cjj∈hp(i)
Worst CaseResponse Time Period
Where do these C values come from?
Worst CaseExecution Time
Program timing is not trivial!
Simpler questions
int f(int x) {
return 2 * x;
}
Harder questions
What is the program doing?
Will it always do the same thing?
How important is the result?
4
What is the executiontime of the program?
Will it always take the same time to execute?
How important is execution time?
Program timing basicsMost computer programs have varyingexecution time
Due to input valuesDue to software characteristicsDue to hardware characteristics
Example: some timed program runs
5
0 execution time
prog
ram
runs Most runs have
similar execution time
Some take much longer time (why?)
Is this the longest execution time...
... or can we get even longer ones?
WCET and WCET analysisWorst-Case Execution Time = WCET
The longest calculation time possibleFor one program/task when run in isolationOther interesting measures: BCET, ACET
The goal of a WCET analysis is to derive f b d ’ WCET
safe uppertiming
bounds
possible execution
times
a safe upper bound on a program’s WCET
time
safe lowertiming
bounds
0
BCET WCET
-
Worst-Case Execution Time analysis 2009-12-03
2
Presentation outlineEmbedded and Real-TimeWCET analysis
Measurements Static analysis Fl l i l l l l i d l l tiFlow analysis, low-level analysis, and calculationHybrid approaches
WCET analysis toolsThe SWEET approach to WCET analysisUpcoming challengesWCET tool demo
Embedded and
Real-Time
Embedded computersAn integrated part of a larger system
Example: Each microwave oven holds at least one embedded processor Example: A modern car can contain more than 100 embedded processors
Interacts with the use, the environment, and with other computers
Often limited or no user interfaceOften with some timing constraints inputinput resultresult
Embedded systems everywhere
Today, all advanced products contain embedded computers!
Our society is dependant on that they function correctlythat they function correctly
Embedded systems software
Amount of software can vary from extremely small to very large
Gives characteristics to the productOften developed with target hardware in mind
11
Often limited resources (memory / speed)Often direct accesses to different HW devicesNot always easily portable to other HW
Many different programming languagesC still dominates, but often special purpose languages
Many different software development tools Not just GCC and/or Microsoft Visual Studio
Embedded system hardwareHuge variety of embedded system processors
Not just one main processor type as for PCs Additionally, same CPU can be used with various hardware configurations (memories, devices, …) g ( )
The hardware is often tailored specifically to the application
For example using a DSP processor for signal processing
Cross-platform developmentE.g., develop on PC and download final application to target HW
-
Worst-Case Execution Time analysis 2009-12-03
3
A numerical comparisionEmbedded systems processors clearly dominate yearly production
100 million PC processors6000 million embedded
June 10, 2005Timely Software - a Challenge 13
"Desktop"2%
"Embedded"98%
Real-time systemsComputer systems where the timely behavior is a central part of the function
Containing one or more embedded computersBoth soft- and hard real-time, or a mixture…
Timing of radio
Timing of radio communication,
speech recognition,…
Timing of musicplaying from MP3 file
Timing of radio communication, motor
control, rudder and flaps control,…
Timing of network communication, motor control, ABS brakes, anti-slip control,…
Uses of reliable WCET bounds
Hard real-time systems WCET needed to guarantee behavior
Real-time schedulingCreating and verifying schedulesLarge part of RT research assumeLarge part of RT research assume the existence of reliable WCET bounds
Soft real-time systemsWCET useful for system understanding
Program tuningCritical loops and paths
Interrupt latency checking
WCET analysisanalysis
Obtaining WCET bounds
MeasurementIndustrial practice
Static analysis Research front
Timing measurementmeasurement
-
Worst-Case Execution Time analysis 2009-12-03
4
Measuring for the WCET
Methodology:Determine potential”worst-case input”pRun and measureAdd a safety margin
19
Measurement issuesLarge number of potential worst-case inputs
Program state might be part of input
Has the worst-case path really been taken?Often many possible paths through a program
20
Hardware features may interact in unexpected ways
How to monitor the execution? The instrumentation may affect the timingHow much instrumentionoutput can be handled?
LEDs
Buzzer
SW measurement methodsOperating system facilities
Commands such as time, date and clockNote that all OS-based solutions require precise HW timing facilities (and an OS)
Cycle-level simulatorsS f i l i CPU
21
Software simulating CPUCorrectness vs. hardware?
High-water markingKeep system runningRecord maximum time observed for taskKeep in shipping systems, read at service intervals
Using an oscilloscopeCommon equipment for HW debugging
Used to examine electrical output signals of HW
Mainly for observing the voltage or signal waveform on a particular pin
Usually only two to four inputs
22
Usually only two to four inputs
To measure time spent in a routine: 1. Set I/O pin high when entering routine 2. Set the same I/O pin low before exiting 3. Oscilloscope measures the amount of
time that the I/O pin is high 4. This is the time spent in the routine
Using a logic analyzerEquipment designed for troubleshooting digital hardwareHave dozens or even hundreds of inputs
23
hundreds of inputsEach one keeping track on whether the electrical signal it is attached to is currently at logic level 1 or 0Result can be displayed against a timeline Can be programmed to start capturing data at particular input patterns
Target board
HW Debugger
HW measurement toolsIn-circuit emulators (ICE)
Special CPU version revealing internalsHigh visibility & bandwidthSupportive hardware required
Processors with debug support
24
g ppDesigned into processor
Use a few dedicated processor pinsUsing standardized interfaces
Nexus debug interfaces, JTAG, Embedded Trace Macrocell, …
Supportive HW requiredCommon on modern chip
-
Worst-Case Execution Time analysis 2009-12-03
5
Problem of using measurement
Measured value never larger than WCET!
safe upperti i
possible execution
ti
safe lowerti i
BCET WCET
A safety margin must be added!How much is enough?
timingbounds
times
time
timingbounds
0
You can never measure a value > WCET
Measurement will result in a value ≤ WCET
StaticWCET
analysis
Static WCET analysisDo not run the program – analyze it!
Using models based on the static properties of the software and the hardware
Guaranteed reliable WCET boundsProvided all models, input data and analysis methods are correct
Trying to be as tight as possible
safe uppertiming
bounds
possible execution
times
time
safe lowertiming
bounds
0
BCET WCET
All derived bounds will be ≥ WCET
foo(x,i):
while(i < 100)
if (x > 5) then
x = x*2;
else
x = x+2;
d
Again: Causes of Execution Time Variation
Execution characteristics of the software
A program can often execute in many different waysInput data dependencies
end
if (x < 0) then
b[i] = a[i];
end
i = i+1;
end
Application characteristics
Timing characteristics of the hardware
Clock frequencyCPU characteristicsMemories used…
WCET analysis phases
Compiler
program
L l l
Flow analysis
1. Flow analysis Bound the number of times different program parts may be executed (SW analysis)
2. Low-level analysis
AnalysisReality
ObjectCode
Target Hardware
Low levelanalysis
Calculation
Bound the execution time of different program parts(HW analysis)
3. CalculationCombine flow- and low-level analysis results to derive an upper WCET bound
ActualWCET
WCETbound
Flow analysis
30
analysis
-
Worst-Case Execution Time analysis 2009-12-03
6
Flow AnalysisProvides bounds on the number of times different program parts may be executed
Valid for all possible executionsE l f id d i f
Flow analysis
Program
31
Examples of provided info:Bounds of loop iterationsBounds on recursion depthInfeasible paths
Info provided by: Static program analysisManual annotations
Low levelanalysis
Calculation
WCETEstimate
The control-flow graphfoo(x,i):
A: while(i < 100)
B: if (x > 5) then
C: x = x*2;
else
foo()
A
B
Each block will run as a
unitelse
D: x = x+2;
end
E: if (x < 0) then
F: b[i] = a[i];
end
G: i = i+1;
end
Flows as edges
C D
E
F
G
end
Flow info characteristics
Basic finiteness
Statically allowed
Loop bound: 100Structurally possibleexecutions (infinite)
foo(x,i):
A: while(i < 100)
B: if (x > 5) then
C: x = x*2;
else
foo()
C
B
D
A
Example program
Actual feasiblepaths
#F < 10
Control flow graph Relation between possible executions and flow info
WCET found here = desired result
D: x = x+2;
end
E: if (x < 0) then
F: b[i] = a[i];
end
G: i = i+1;
end
E
F
G
end
WCET found here =overestimation
Example: Loop boundsLoop bound:
Depends on possible values of input variable i
E.g. if 1 ≤ i ≤ 10 holds for input value i then loop bound is 100
foo(x,i):
A: while(i < 100)
B: if (x > 5) then
C: x = x*2;
else In general, a very difficult problemHowever, solvable for many types of loops
Requirement for basic finiteness
All loops must be upper bound
else
D: x = x+2;
end
E: if (x < 0) then
F: b[i] = a[i];
end
G: i = i+1;
end
Example: Infeasible path
Infeasible path:Path A-B-C-E-F-G can not be executedSince C implies ¬F
foo(x,i):
A: while(i < 100)
B: if (x > 5) then
C: x = x*2;
elseIf (x > 5) then it is not possible that (x*2) < 0
Limits statically allowed executions
Might tighten the WCET estimate
else
D: x = x+2;
end
E: if (x < 0) then
F: b[i] = a[i];
end
G: i=i+1;
end
Example: Triangular Loop
Two loops:Loop A bound: 100 Local B bound: 100
Block C: B l b d
triangle(a,b):
A: loop(i=1..100)
B l (j i 100)
36
By loop bounds:100 * 100 = 10 000But actually:100+...+1 = 5 050
Limits statically allowed executions
Might tighten the WCET estimate
B: loop(j=i..100)
C: a[i,j]=...
end loop
end loop
-
Worst-Case Execution Time analysis 2009-12-03
7
Object File
Object File
Obj t Fil
Compiler
Embedded SW Tool Chain
C Source
C Source
C Library
C Runtime
OSC Source
int twice(int a) {int temp;temp = 2 * a;return temp;
}
Affects timing
Affects timingAffects
timingAffects timingAffects
timing
Executable
100011000001111010001100001000001010110000100000101011000001111010101100001000111010111100011001
LinkerObject File
twice:mov ip, spstmfd sp!, {fp,ip,lr,pc}sub fp, ip, #4sub sp, sp, #8str r0, [fp, #-16]ldr r3, [fp, #-16]mov r3, r3, asl #1str r3, [fp, #-20]ldr r3, [fp, #-20]mov r0, r3ldmea fp, {fp,sp,pc}
Start-upC Source
WCET
Affects timing
Affects timing
The SW building toolsThe compiler:
Translates an source code file to an object code fileOnly translates one source code file at the time
Often makes some type of code optimizations Increase execution speed, reduce memory size, …Different optimizations give different object code layouts
38
Different optimizations give different object code layouts
The linker:Combines several object code files into one executable
Places code, global data, stack, etc in different memory partsResolves function calls and jumps between object files
Can also perform some code transformations
Both tools may affect the program timing!
Example: compiling & linking/******************
* File: main.c
*****************/
int foo();
int main() {
return 1 + foo();
Contains object code for main.c
Object code contains an unresolved call to foo
Compiler main.o
39
;
}
/******************
* File: foo.c
*****************/
int foo() {
return 1;
} Contains object code for foo.c
Compiler foo.o
The main.o and foo.o object code files are combined
The call to fooin main has
been resolved
Linker a.exe
Common additional filesC Runtime code:
Whatever needed but not supported by the HW32-bit arithmetic on a 16-bit machineFloating-point arithmetic Complex operations (e.g., modulo, variable-length shifts)
C ith th il
40
Comes with the compiler May have a large footprint
Bigger for simpler machinesTens of bytes of data and tens of kilobytes of code
OS code: In many ES the OS code is linked together with the rest of the object code files to form a single binary image
Common additional files
Startup code: A small piece of assembly code that prepares the way for the execution of software written in a high-level language
For example, setting up the system stackMany ES compilers provide a file named
0 h ldi t t d
41
startup.asm, crt0.s, … holding startup code
C Library code:A full ANSI-C compiler must provide code that implements all ANSI-C functionality
E.g., functions such as printf, memmove, strcpyMany ES compilers only support subset of ANSI-CComes with the compiler (often non-standard)
The mapping problemFlow analysis easier on source code level
Semantics of code clearerEasier for programmer/tool to give flow information
Low-level analysis requires binary codeThe code executed by the processorAll code parts are not available in source code formatp
Compiler optimizations may change structureFor example, loops can be removed or added
...for(i=0; i 10)...
else...
}...
…0111111010010111011001010010100110010111010100101001010100111010101001010101010000011010011010101001010101010101....
Loopbound: 101
Where is the loop?
Source code Executable code
-
Worst-Case Execution Time analysis 2009-12-03
8
Low-level analysisanalysis
Low-Level AnalysisDetermine execution time bounds for program parts
Focus of most WCET-related researchUsing a model of the target HW
Flow analysis
Program
The model does not need to model all HW detailsHowever, it should safely account for all possible HW timing effects
Works on the binary, linked codeThe executable program
Low levelanalysis
Calculation
WCETEstimate
Some HW model detailsMuch effort required to safely model CPU internals
Pipelines, branch predictors, superscalar, out-of-order, …Much effort to safely model memories
Cache memories must be modelled in detailOther types of memories may also affect timingOther types of memories may also affect timing
For complex CPUs many features must be analyzed together
Timing of instructions get very history dependantDeveloping a safe HW timing model troublesome
May take many months (or even years)All things affecting timing must be accounted for
Hardware time variability Simpler 4-, 8- & 16-bit processors (H8300, 8051, …):
Instructions might have varying execution time due to argument valuesVarying data access time due to different memory areasAnalysis rather simple, timing fetched from HW manual
S & ( )Simpler 16- & 32-bit processors, with a (scalar) pipe-line and maybe a cache (ARM7, ARM9, V850E, …):
Instruction timing dependent on previouslyexecuted instructions and accessed data:
State of pipeline and cacheVarying access times due to cache hits and missesVarying pipeline overlap between instructionsHardware features can be analyzed in isolation
Hardware time variabilityAdvanced 32- & 64-bit processors (PowerPC 7xx, Pentium, UltraSPARC, ARM11, …):
Many performance enhancing features affect timingPipelines, out-of-order exec, branch pred., caches, speculative exec. I t ti ti i t hi t d d tInstruction timing gets very history dependent
Some processors suffer from timing anomaliesE.g., a cache miss might give shorter overall program execution time than a cache hit
Features and their timing interact Most features must be analyzed together
Hard to create a correct and safe hardware timing model!
The memory hierarchy
Cache memory
Caches store frequently used
i t ti d d t
Fasteraccess
time
CPUCaches increase
average speed, but give more variable
execution time
The CPU execute instructions. It also need to access data to perform
operations upon
Larger storage capacity
Main memory
memoryinstructions and data (for faster access)
Main memory has larger storage capacity but much longer access
time than caches
Many variants exists: instruction caches,
data caches, unified caches,
cache hierarchies, …
-
Worst-Case Execution Time analysis 2009-12-03
9
Example: Cache analysisfib:
mov #1, r5mov #0, r6mov #2, r7br fib_0
fib_1:
What instructions will cause cache misses?
Cache misses takes much more time than cache hits!
Main memor
Cache memory
CPU
49
mov r5,r8add r6,r5mov r8,r6add #1,r7
fib_0:cmp r7,r1bge fib_1
fib_2:mov r5,r1jmp [r31]
Performed on the object codeOnly direct-mapped instruction cache in this example
Main memory
Example: Cache analysis
fib:mov #1, r5 2 1000mov #0, r6 2 1002mov #2, r7 2 1004br fib_0 2 1006
fib_1:
Starting address
Size of instruction
Information needed for i t ti
50
mov r5,r8 2 1008add r6,r5 2 1010mov r8,r6 2 1012add #1,r7 2 1014
fib_0:cmp r7,r1 2 1016bge fib_1 2 1018
fib_2: mov r5,r1 2 1020jmp [r31] 2 1022
instruction cache analysis
Example: Cache analysisfib:
mov #1, r5 2 1000mov #0, r6 2 1002mov #2, r7 2 1004br fib_0 2 1006
fib_1:
51
mov r5,r8 2 1008add r6,r5 2 1010mov r8,r6 2 1012add #1,r7 2 1014
fib_0:cmp r7,r1 2 1016bge fib_1 2 1018
fib_2: mov r5,r1 2 1020jmp [r31] 2 1022
Mapping to instruction cache
Example: Cache analysisfib:
mov #1, r5mov #0, r6mov #2, r7br fib_0
fib_1:
misshithithit
52
mov r5,r8add r6,r5mov r8,r6add #1,r7
fib_0:cmp r7,r1bge fib_1
fib_2: mov r5,r1jmp [r31]
misshit
misshithithit
Example: Cache analysisfib:
mov #1, r5mov #0, r6mov #2, r7br fib_0
fib_1:
misshithithitFirst
iteration
53
mov r5,r8add r6,r5mov r8,r6add #1,r7
fib_0:cmp r7,r1bge fib_1
fib_2: mov r5,r1jmp [r31]
misshit
misshithithit
hithit
hithithithit
Remaining iterations
of the loop
hithit
Example: CPU pipelinesObservation: Most instructions go through same stages in the CPUExample: Classic RISC 5-stage pipeline
IF ID EX MEM WB
54
IF ID EX MEM WB
Instruction fetch (IF)Get the next instruction
from memory to process (its address is held by PC)
Instruction decodeDetermine operation to be
performed (i.e., extract opcode and arguments)
ExecutePerform the actual
operation (e.g, an add)
Memory accessLoad/store values from/to memory if
needed
Write backWrite the result into the target register
-
Worst-Case Execution Time analysis 2009-12-03
10
CPU pipelinesIdea: Overlap the CPU stages of the instructions to achieve speed-upNo pipelining:
Next instruction cannot start before
IFIDEX
1 2 3 5 6 74 8 9 10
55
cannot start before previous one has finished all its stages
Pipelining:In principle: Speed-up = length of pipelineHowever, often dependenciesbetween instructions
IFIDEX
MEMWB
1 2 3 5 64
EXMEMWB
Pipeline VariantsNone: Simple CPUs (68HC11, 8051, …)Scalar: Single pipeline (ARM7,ARM9,V850, …)VLIW: Multiple pipelines, static, compiler scheduled (DSPs, Itanium, Crusoe, …)
56
( , , , )Superscalar: Multiple pipelines, out-of-order (PowerPC 7xx, Pentium, UltraSPARC, ...)
IFIDEX
MEMWB
1 2 3 5 6 74 8 9 10 11 Blue instruction occupies EX stage for 2 extra cycles
This stalls both red and green instructions
Example: No Pipelinefoo(x,i):
A: while(i < 100)
B: if (x > 5) then
C: x = x*2;
else
((77 cyclescycles))((55 cc))(1(122 cc)) Constant time
for each block
57
else
D: x = x+2;
end
E: if (x < 0) then
F: b[i] = a[i];
end
G: i = i+1;
end
(2 (2 cc))
(4 (4 cc))(8 (8 cc))
(2 (2 cc))
in the code
Object code not shown
Example: No pipeline
foo()
A
B
ttAA=7=7
ttBB=5=5
foo(x,i):
A: while(i < 100)
B: if (x > 5) then
C: x = x*2;
else
C D
E
F
G
end
ttDD=2=2ttCC=12=12
ttEE=4=4
ttFF=8=8
ttGG=2=2
else
D: x = x+2;
end
E: if (x < 0) then
F: b[i] = a[i];
end
G: i = i+1;
end
foo(x,i):
A: while(i < 100)
B: if (x > 5) then
C: x = x*2;
else
A
B
Example: Simple Pipeline
ttA A = 7= 7
B
IF
1 2 3 4 5ttB B = 5= 5
IF
EXEX
M
F
1 2 3 4 5 6 7
A IFEXEX
M
F
1 2 3 4 5 6 7
δδABAB = = --22
else
D: x = x+2;
end
E: if (x < 0) then
F: b[i] = a[i];
end
G: i = i+1;
end
EXEX
M
F
1 2 3 4 5 6 7 8 9
IF
EXEX
M
F
10
ttABAB = 10= 10
δδAB AB = 10 = 10 -- (7 + 5) = (7 + 5) = --22
Example: Pipeline result
ttAA=7=7
ttBB=5=5δδABAB==--22
δδBCBC==--22 δδBDBD==--11
δδGAGA==--11
foo()
A
B
foo(x,i):
A: while(i < 100)
B: if (x > 5) then
C: x = x*2;
else
60
ttDD=2=2ttCC=12=12
ttEE=4=4
ttFF=8=8
ttGG=2=2
BCBC BDBD
δδDEDE==--22δδCECE==--11
δδEFEF==--22
δδFGFG==--11
δδEGEG==--11
C D
E
F
G
end
else
D: x = x+2;
end
E: if (x < 0) then
F: b[i] = a[i];
end
G: i = i+1;
end
-
Worst-Case Execution Time analysis 2009-12-03
11
IFEXMF
Pipeline Interactions
IFEXMFIF IF
IFEXMF
Pairwise overlap: speed-up
61
IFEXMF
IFEXMF
FIFEXMF
IFEXMF
F
IFEXMF
Interaction across more than two blocks also possible!Can be both speed-up or slow-down
Cache & Pipeline analysis
foo_0:
foo_1:
foo:
info
info
Pipeline analysis might take cache analysis results as input
Instructions gets annotated ith h hit/ i
62
1032: cmp r6,r1
1034: blt foo_5foo_2:
foo_3:
foo_5:
foo_4:
info
info
info
info
info
with cache hit/missThese misses/hits affect pipeline timing
Complex HW requireintegrated cache & pipeline analysis
1020:icache miss
1022:icache hit
Analysis of complex CPUsExample: Out-of-order pipelines
Several instructions executes in parallel in unitsFunctional units often replicatedDynamic scheduling of instructionsDo not need to follow issuing order g
Very difficult analysis problemTrack all possible pipeline states, iterate until fixed pointRequires integrated pipeline/icache/dcache/branch-prediction analysis
Been done for PowerPC 755Up to 1000 states per instruction!
Low-level analysis correctness?
Abstract model of the hardware is usedModern hardware often very complex
Combines many features Pipelining, caches, branch prediction, out-of-order
64
out-of-order...Have all effects been accounted for?
Manufactures keep hardware internals secretBugs in hardware manuals Bugs relative hardware specifications
Calculation
CalculationDerive an upper bound on the program’s WCET
Given flow and timing informationSeveral approaches used:
Tree based
Flow analysis
Program
Tree-basedPath-basedConstraint-based (IPET)
Properties of approaches:Flow information handledObject code structure allowedModeling of hardware timingSolution complexity
Low levelanalysis
Estimatecalculation
WCETEstimate
-
Worst-Case Execution Time analysis 2009-12-03
12
Example: Combined flow analysis and low-level analysis result
foo(x,i):
A: while(i < 100)
B: if (x > 5) then
C: x = x*2;
else
foo()
A
B
ttAA=7=7
ttBB=5=5
”Loop bound: 100”
D: x = x+2;
end
E: if (x < 0) then
F: b[i] = a[i];
end
G: i = i+1;
end
C D
E
F
G
end
ttDD=2=2ttCC=12=12
ttEE=4=4
ttFF=8=8
ttGG=2=2
”C and F can’t be taken together”
Path-Based Calcfoo()
B
tA=7
tB=5
A
foo(x,i):
A: while(i < 100)
B: if (x > 5) then
C: x = x*2;
else
D: x = x+2;
end
E: if (x < 0) then
F: b[i] = a[i];
end
G: i = i+1;
end
C D
E
F
end
tC=12
tG=2
Find longest pathOne loop at a time
Prepare the loopRemove back edgesRedirect to special continue nodes
continue
G
tD=2
tF=8
end
tE=4
Path-Based Calculationfoo()
B
tA=7
tB=5
Longest path:A-B-C-E-F-G7+5+12+4+8+2= 38 cycles
A
C D
E
F
end
tC=12
tE=4
tG=2
Total time:100 iterations38 cycles per iterationTotal: 3800 cycles
continue
G
tD=2
tF=8
Path-Based Calcfoo(x,i):
A: while(i < 100)
B: if (x > 5) then
C: x = x*2;
else
D: x = x+2;
end
E: if (x < 0) then
F: b[i] = a[i];
end
G: i = i+1;
end
C and F can never execute
together
foo()
B
tA=7
tB=5
A
Infeasible path:A-B-C-E-F-GIgnore, look for next
end
C D
E
F
end
tC=12
tE=4
tG=2
continue
G
tD=2
tF=8
Path-Based Calcfoo()
B
tA=7
tB=5
A
foo(x,i):
A: while(i < 100)
B: if (x > 5) then
C: x = x*2;
else
D: x = x+2;
end
E: if (x < 0) then
F: b[i] = a[i];
end
G: i = i+1;
end
C and F can never execute
together
C D
E
F
end
tC=12
tE=4
tG=2
Infeasible path:A-B-C-E-F-GIgnore, look for next
New longest path:A-B-C-E-G30 cycles
Total time:Total: 3000 cycles
continue
G
tD=2
tF=8
end
IPET = Implicit path enumeration technique
Execution paths not explicitly represented
foo()
A
B
ttAA=7=7
tt 22
ttBB=5=5
tt 1212
Example: IPET Calculation
XXGAGA
XXABAB
XXBCBC XXBDBD
XXfooAfooA
XXAA
XXBB
XXfoofoo
p y p
Program model:Nodes and edges Timing info (ttentityentity)
Node times: basic blocks Edge times: overlap
Execution count (xxentityentity)
C D
E
F
G
end
ttDD=2=2ttCC=12=12
ttEE=4=4
ttFF=8=8
ttGG=2=2
BDBD
XXDEDE
XXEGEG
XXCECE
XXEFEF
XXFGFG
XXCC XXDD
XXEE
XXFF
XXGGXXendend
-
Worst-Case Execution Time analysis 2009-12-03
13
WCET=max Σ(xxentityentity * ttentityentity)
Where each xxentityentitysatisfies constraints
foo()
A
B
IPET Calculation
XXAA
XXBB
XXfoofoo=1=1
XXendend=1=1XXGAGA
XXABAB
XXBCBC XXBDBD
XXfooAfooA
XXABAB+X+XAendAend=X=XAA
XXAA=X=XfooAfooA+X+XGAGA
XXAA
XXBB
XXfoofoo
satisfies constraints
Constraints:Start & end conditionProgram structureLoop boundsOther flow information
C D
E
F
G
end
XXCC XXDD
XXEE
XXFF
XXGG
BDBD
XXDEDE
XXEGEG
XXCECE
XXEFEF
XXFGFG
ABAB AendAend AA
XXEE=X=XCECE+X+XDEDE
XXBCBC+X+XBDBD=X=XBB
XXAA
-
Worst-Case Execution Time analysis 2009-12-03
14
WCET analysis y
tools
WCET Analysis ToolsSeveral more or less complete toolsCommercial tools:
aiT from AbsIntBound-T from TidoRumRapiTime from
80
pRapita Systems
Research tools:SWEET – Swedish Execution Time tool Heptane from IrisaFlorida state universitySymTA/P from
TU Braunschweig
WCET tool differencesUsed static and/or hybrid methodsUser interface
Graphical and/or textualFlow analysis performed
Manual annotations supported
81
ppHow the mapping problem is solved
Decoding binariesIntegrated with compiler
Supported processors and compilersLow-level analysis performed
Type of hardware features handledCalculation method used
Supported CPUs (2008)Tool Hardware platformsaiT Motorola PowerPC MPC 555, 565, and 755, Motorola ColdFire MCF
5307, ARM7 TDMI, HCS12/STAR12, TMS320C33, C166/ST10, Renesas M32C/85, Infineon TriCore 1.3
Bound-T Intel-8051, ADSP-21020, ATMEL ERC32, Renesas H8/300, ATMEL AVR and ATmega ARM7
82
ATMEL AVR and ATmega, ARM7RapiTime Motorola PowerPC family, HCS12 family, ARM, NECV850, MIPS3000SWEET ARM9, NECV850EHeptane Pentium1, StrongARM 1110, Renesas H8/300Vienna M68000, M68360, Infineon C167, PowerPC, PentiumFlorida MicroSPARC I, Intel Pentium, StarCore SC100, Atmel Atmega,
PISA/MIPSChalmers PowerPC
Industrial usageStatic/hybrid WCET analysis are today used in real industrial settingsExamples of industrial usage:
Avionics – Airbus, aiTAutomotive – Ford, aiT
83
Automotive Ford, aiTAvionics – BAE Systems, RapiTimeAutomotive – BMW, RapiTimeSpace systems – SSF, Bound-T
However, most companies are still highlyunaware of the concepts of “WCET analysis” and/or “schedulability analysis”
The SWEET approach to
WCET l iWCET analysis
-
Worst-Case Execution Time analysis 2009-12-03
15
The MDH WCET projectResearching on static WCET analysis
Developing the SWEET (SWEdish Execution Time) analysis tool
Research focus:Flow analysisTechnology transfer to industryTechnology transfer to industryInternational collaborationParametrical WCET analysis*Early stage WCET analysis*WCET analysis for multi-core*
Previous research focus:Low-level analysisCalculation
* = new project activities
Project membersProfessorBjörnLisperDocentAndreas
LecturerChristerSandberg
PhD studentM lAndreas
ErmedahlDocentJan Gustafsson
MarceloSantos
PhD studentStefanBygde
86
+ 2 post-docs, 1 PhD student, and 2 programmers
Technology transfer to industry (and academia)
Evaluation of WCET analysis in industrial settings Targeting both WCET tool providers and industrial usersUsing state-of-the-art WCET analysis tools
Applied as MSc thesis works:Enea OSE, using SWEET & aiT Volcano Communications, using aiTBound-T adaption to Lego Mindstorms and Renesas H8/300. Used in MDH RT courses CC-Systems, using aiT & measurement toolsVolvo CE using aiT & SWEET ….
Articles and MSc thesis reports available on the MRTC web
Available MSc thesis work
“Creating open-source embedded real-time systems benchmarks in cooperation with companies”
Work closely with CC-System company and (if time allows) other companies Result of high importance for RT & WCET research communities (and industries)
http://www.idt.mdh.se/examensarbete/index.php?choice=show&id=0904Or email: [email protected]
88
Flow analysisMain focus of the MDH WCET analysis group
Motivated by our industrial case studiesWe perform many types of advanced program analyses:
Program slicing (dependency analysis)x > 5
x = 1..10
x = 1 4
89
Value analysis (abstract interpretation)Abstract execution...
Both loop bounds and infeasible paths are derivedAnalysis made on ALF intermediate code
~ “high level assembler”
A
C
x > 5
B
x < 3
D
EPath A-C is infeasible!
x = 6..10
x = 1..4
x = 1..2x = 3..8
Where SWEET comes in…
C Source Object File
Object File
Compiler
C Source
Compiler
Linker
Object File C Library
C Runtime
OS
Object File
Hardware WCETLow-levelanalysis Calculation
SWEET
C Source Object File
WCETEstimate
Flow analysis
Input valueconstraints
ALFLinker
Executable
Other Lib
LOW-SWEET
ALF
Object File
ExecutableBinary reader
-
Worst-Case Execution Time analysis 2009-12-03
16
Slicing for flow analysisObservation: some variables and statements do not affect the execution flow of the program= they will never be used to determine the outcome of conditions
Idea: remove variables and statements which are guaranteed to not affect execution flow
Subsequent flow analyses should provide same resultSubsequent flow analyses should provide same resultbut with shorter analysis time
Based on well-known program slicing techniquesReduces up to 94% of total program size for some of our benchmarks
1. a[0] = 42;2. i = 1;3. j = 5;4. n = 2 * j;5. while (i
-
Worst-Case Execution Time analysis 2009-12-03
17
Upcoming challengesfor WCETfor WCET analysis
Trends in Embedded SW
Traditionally: wmbedded SW written in C and assembler, close to hardware Trend: size of embedded SW increases
SW now clearly dominates ES development costHardware used to dominate
Trend: more ES development by high-level programming languages and tools
Object-oriented programming languagesModel-based toolsComponent-based tools
Increase in embedded SW sizeMore and more functionality required
Most easily realized in software
Software gets more and more complexHarder to identify the timing critical part of the codeSource code not always available for all parts of the system, e.g. for SW developed by subcontractors
Challenges for WCET analysis:Scaling of WCET analysis methods to larger code sizes
Better visualization of results (where is the time spent?)Better adaptation to the SW development process
Today’s WCET analysis works on the final executableChallenge: how to provide reasonable precise WCET estimates at early development stages
Higher-level prog. languages
Typically object-oriented: C++, Java, C#, …Challenges for WCET analysis:
Higher use of dynamic data structuresIn traditional ES programming all data is statically
ll t d d i il tiallocated during compile time Dynamic code, e.g., calls to virtual methods
Hard to analyze statically (actual method called may not be known until run-time)
Dynamic middleware: Run-time system with GCVirtual machines with JIT compilation
Model-based designMore embedded system code generated by higher-level modeling and design tools
Esterel, Ascet, Targetlink, Scade, ...
The resulting code structure depends on the code generator
model
p gOften simpler than handwritten code
Possible to integrate such toolswith WCET analysis tools
The analysis can be automatedLess user interaction requiredE.g., loop bounds can be provided directly by the modeling tool
...label rerun: if(flag1 || flag2) ...else
goto rerun;...
generatedcode
….´10010101001110101100101001100101010011101011001010011010010101010100101001010010010101010101010100101010....
executable
Component-based designVery trendy within software engineeringGeneral idea:
Package software into reusable componentsBuild systems out of prefabricatedBuild systems out of prefabricated components, which are “glued together”
WCET analysis challenges:How to reuse WCET analysis results when some settings have changed? How to analyze SW components when not all information is available? Are WCET analysis results composable?
-
Worst-Case Execution Time analysis 2009-12-03
18
Compiler interactionToday – commercial WCET analysis tools analyses binariesAnother possibility – interaction with the compiler
Easier to identify data objects and to understand what the program is intended to do
There exists many compilers for embedded systems
Very fragmented market Each specialized on a few particular targetsTargeting code size and execution speed
Integration with WCET analysis tools opens new possibilities:
Compile for timing predictabilityCompile for small WCET
Trends in Embedded HWTrend: Large variety of ES HW platforms
Not just one main processor type as for PCs Many different HW configurations (memories, devices, …) Challenge: How to make WCET analysis portable between platforms?
Trend: Increasingly complex HW features to boost performance
Taken from the high-performance CPUsPipelines, caches, branch predictors, superscalar, out-of-order, …Challenge: How to create safe and tight HW timing models?
Trend: Multi-core architectures
Multi-core architecturesSeveral (simple) CPUs on one chip
Increased performance & lower power “SoC”: System-on-a-Chip
Explicit parallelismNot hidden as in superscalar architectures
Likely that CPUs will be less complex th t hi h d
CPU
L1 cache
CPU
L1 cache
CPU
L1 cache
than current high-end processors Good for WCET analysis!
However, risk for more shared resources: buses, memories, …
Bad for WCET analysis!Unrelated threads on other cores might use shared resources
Multi-core ok if predictable sharing of common resources is enforced
Multicore chip
L2 cache
RAM
Devices
etc.Network
Timer Serial
The End!
106
For more information:www.mrtc.mdh.se/projects/wcet
WCET tool demotool demo