Design & Co-design of Embedded Systems
Introduction to Co-synthesis Algorithms
+ HW/SW Partitioning Algorithms
Maziar Goudarzi
Fall 2005 Design & Co-design of Embedded Systems
Today's Program
• Introduction
• Preliminaries
• Hardware/Software Partitioning
• Distributed System Co-Synthesis (next session)
Reference:
Wayne Wolf, “Hardware/Software Co-Synthesis Algorithms,” Chapter 2, Hardware/Software Co-Design: Principles and Practice, Eds: J. Staunstrup, W. Wolf, Kluwer Academic Publishers, 1997.
Introduction to HW/SW Co-Synthesis Algorithms
Introduction
Introduction
Implementing a system?
Why use a CPU?
• Easier implementation
• Easier (and cheaper) to change and debug
Why use hardware modules?
• To meet other constraints: performance, power consumption, etc.
Found a CPU meeting all non-functional constraints?
• Yes! What could be better? Use the CPU.
• No! Design custom logic, or a combination of both.
Introduction (cont’d)
Why more than one CPU or custom logic?
Why not use the fastest available CPU?
Introduction (cont’d)
Reason 1: Exponential cost per CPU performance
[Figure: Late-1996 retail prices of Pentium processors. x-axis: clock speed (MHz); y-axis: cost (US $)]
Introduction (cont’d)
Exponential price/performance implies
• Paying for performance in a uni-processor is very expensive
• Using multiple small CPUs is cheaper
• Communication overhead is added, but it is still an economic choice
• Processors need not be CPUs; they can be special-function units
• Special-purpose PEs can be even cheaper than a dedicated CPU!
  – Measured in system manufacturing cost, not necessarily in design cost
Introduction (cont’d)
Reason 2: Scheduling overhead
More than 31% overhead, under reasonable assumptions, when executing multiple processes
• Reason: uncertainty in the times at which the processes will need to execute
• Result: we have to reserve extra CPU horsepower, which comes at exponential cost
Introduction (cont’d)
Definition
HW/SW co-synthesis: the process of simultaneously designing the SW architecture of an application and the HW architecture on which that SW is executed.
Introduction (cont’d)
[Figure: Co-synthesis overview. Labels: Problem Specification; Co-Synthesis; SW (app.) Architecture; HW Engine (PE, PE, PE, Mem); Communication Channels]
Introduction (cont’d)
Problem specification includes
• Functionality
• Non-functional requirements: performance goals, physical constraints, etc.
Introduction (cont’d)
Hardware Architecture
• One or more Processing Elements (PEs)
Software (Application) Architecture includes
• Process structure
  – Each process executes sequentially
  – Determines the amount of parallelism and the amount of communication
  – Proper process structure is crucial for cost-effective implementation
• Allocation of the processes onto PEs in the HW engine
Communication channels
• Hardware elements
• Software primitives
Introduction (cont’d)
HW/SW Co-synthesis
• Allows trade-offs between the SW architecture and the HW on which it executes
• Where is such a trade-off important?
  – Everyday processing applications vs. embedded applications
  – Embedded computing: computing with limited resources
Different co-synthesis styles depending on
• The specification
• The system components
• The system elements to synthesize
Introduction (cont’d)
Two broad implementation styles
• HW/SW partitioning
  – Target HW architecture: a CPU and multiple ASICs
• Distributed system co-synthesis
  – Target HW architecture: arbitrary hardware topologies
Introduction to HW/SW Co-Synthesis Algorithms
Preliminaries
Preliminaries
Rate (execution rate)
• Maximum frequency at which a processing task must be done
Single-rate vs. multi-rate
• Example of a multi-rate system: audio/video decoder
Preliminaries (cont’d)
Latency
• Required maximum time between starting and finishing a processing task
Behavior Models
DFG: Data Flow Graph
• Suitable for data-processing algorithms
CFG: Control Flow Graph
• Suitable for process control algorithms
CDFG: Control Data Flow Graph
• Combination of the two above
Behavior Models (cont’d)
Single-rate systems
• Standard model: Control-Data Flow Graph (CDFG)
  – Implies a program counter or system state
  – Not suitable for modeling multi-rate tasks, due to the unified system state
Behavior Models (cont’d)
Multi-rate systems
• Common model: Task Graph
Task Graph
• Each node: a process
• Each edge: communication
• Each set of connected nodes: a sub-task
[Figure: Task graph with processes P1 to P6]
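As a rough illustration of the task-graph model, the sketch below stores processes as nodes and communication as edges, then groups connected nodes into sub-tasks. The edge list is hypothetical: the figure's arrows did not survive, so the connectivity of P1 to P6 is assumed here, and the names `NODES`, `EDGES`, and `sub_tasks` are invented for the example.

```python
from collections import defaultdict

# Hypothetical connectivity: the slide names P1..P6 but its arrows are lost,
# so these edges are assumed purely for illustration.
NODES = ["P1", "P2", "P3", "P4", "P5", "P6"]
EDGES = [("P1", "P2"), ("P1", "P3"), ("P4", "P5"), ("P5", "P6")]


def sub_tasks(nodes, edges):
    """Group processes into sub-tasks, i.e. sets of connected nodes."""
    neighbors = defaultdict(set)
    for src, dst in edges:
        # Connectivity ignores edge direction.
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    seen, groups = set(), []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from `start`.
        stack, group = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            group.append(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(group))
    return groups


print(sub_tasks(NODES, EDGES))
```

With the assumed edges, the six processes fall into two sub-tasks, each of which could run at its own rate.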
Behavior Models (cont’d)
SDFG: Synchronous Data Flow Graph
• Suitable for signal processing applications
• = DFG + may be cyclic
• Lee and Messerschmitt: algorithm to check the feasibility of an SDFG and schedule it on a uni-processor or multiprocessor
[Figure: SDFG with nodes a, b, c; edges annotated with token counts of 1 and 2]
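One part of an SDFG feasibility check is rate consistency: for every edge, tokens produced per iteration must equal tokens consumed. The sketch below solves those balance equations for a connected graph and returns the smallest integer firing counts. It is a simplified illustration, not the full Lee and Messerschmitt algorithm (it ignores deadlock caused by missing initial tokens on cycles), and the edge rates in the example are assumed because the figure's annotations are not recoverable.

```python
from fractions import Fraction
from math import lcm


def repetition_vector(actors, edges):
    """Smallest positive integer firing counts, or None if rates are inconsistent.

    edges: list of (src, dst, tokens_produced, tokens_consumed).
    Assumes the graph is connected.
    """
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:
        changed = False
        for src, dst, prod, cons in edges:
            # Balance equation: rate[src] * prod == rate[dst] * cons.
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
                changed = True

    # Every edge must satisfy its balance equation, including edges that
    # closed a loop and were not used to derive a rate.
    for src, dst, prod, cons in edges:
        if rate[src] * prod != rate[dst] * cons:
            return None

    scale = lcm(*(r.denominator for r in rate.values()))
    return {actor: int(r * scale) for actor, r in rate.items()}


# Hypothetical rates for a three-node graph a, b, c.
consistent = [("a", "b", 2, 1), ("b", "c", 1, 2), ("a", "c", 1, 1)]
inconsistent = [("a", "b", 2, 1), ("b", "c", 1, 2), ("a", "c", 1, 2)]
print(repetition_vector(["a", "b", "c"], consistent))
print(repetition_vector(["a", "b", "c"], inconsistent))
```

A result of `None` means no periodic schedule can keep buffer sizes bounded; otherwise the counts say how often each node fires per iteration.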
Behavior Models (cont’d)
Co-design Finite-State Machine (CFSM)
• POLIS project at UC Berkeley
• Used for control-dominated systems
  – e.g., ECU (Engine Control Unit)
• Event-driven FSM
  – Transitions occur on events (instead of a periodic clock signal)
[Figure: CFSM with states idle, test, error; transitions labeled Go/start_timer, Done/stop_time, Timeout/alarm=ON, Reset/alarm=OFF]
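To make the event-driven idea concrete, here is a minimal sketch of a machine that changes state only when an event arrives. The state and label names come from the figure, but which transition connects which pair of states is an assumption, since the arrows were lost. The function `run` is hypothetical and is not POLIS code.

```python
# Assumed transition structure: (state, event) -> (next_state, action).
# Labels are taken from the slide; source and target states are a plausible
# reading only.
TRANSITIONS = {
    ("idle", "Go"): ("test", "start_timer"),
    ("test", "Done"): ("idle", "stop_time"),
    ("test", "Timeout"): ("error", "alarm=ON"),
    ("error", "Reset"): ("idle", "alarm=OFF"),
}


def run(events, state="idle"):
    """Feed events one by one; return the final state and emitted actions."""
    actions = []
    for event in events:
        key = (state, event)
        if key in TRANSITIONS:
            state, action = TRANSITIONS[key]
            actions.append(action)
        # Events with no matching transition are ignored in this sketch.
    return state, actions


print(run(["Go", "Timeout", "Reset"]))
```

Nothing happens between events, which is the contrast with a clock-driven FSM that evaluates its transitions on every tick.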
Architectural Models
The hardware engine also needs a description
Here, only basic models for cost estimation
Architectural Models (cont’d)
HW engine is another graph
• Generally: Processing Elements (PEs) as nodes + communication channels as edges
• Problem: How to model buses?
• Solution:
  – Nodes are also used for channels
  – Edges represent nets connecting PEs and channels
  – Nodes are labeled with their type
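The bus-as-node idea can be sketched as a small typed graph. The engine below (one CPU, two ASICs, one shared bus) and all names in it are hypothetical. The point is only that a bus connecting more than two PEs fits naturally once channels are nodes and edges are nets.

```python
# Hypothetical HW engine: every node carries a type label.
NODE_TYPE = {"cpu0": "PE", "asic0": "PE", "asic1": "PE", "bus0": "channel"}

# Edges are nets joining a PE to a channel node.
NETS = [("cpu0", "bus0"), ("asic0", "bus0"), ("asic1", "bus0")]


def pes_on_channel(channel):
    """Return the PEs wired to the given channel node, sorted by name."""
    attached = set()
    for a, b in NETS:
        if a == channel:
            attached.add(b)
        elif b == channel:
            attached.add(a)
    return sorted(n for n in attached if NODE_TYPE[n] == "PE")


def can_communicate(pe_a, pe_b):
    """True if at least one channel node is wired to both PEs."""
    return any(
        {pe_a, pe_b} <= set(pes_on_channel(node))
        for node, kind in NODE_TYPE.items()
        if kind == "channel"
    )


print(pes_on_channel("bus0"))
print(can_communicate("cpu0", "asic1"))
```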
Architectural Models (cont’d)
Component Technology Library
• Used when pre-designed components constitute the HW engine
• Includes
  – General parameters, e.g., manufacturing cost, average power consumption, clock rate
  – Information regarding functional elements (behaviors): a table giving the execution time of each behavior on that PE
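A technology library of this kind might be represented as one record per PE type, holding the general parameters plus a table of execution times per behavior. Everything below is invented for illustration: the PE names, the numbers, and the helper `cheapest_pe`.

```python
from dataclasses import dataclass, field


@dataclass
class PEType:
    name: str
    cost: float           # manufacturing cost, arbitrary units
    avg_power_mw: float   # average power consumption
    clock_mhz: float      # clock rate
    # Execution-time table: behavior name -> time in microseconds.
    exec_time_us: dict = field(default_factory=dict)


# Hypothetical library entries.
LIBRARY = [
    PEType("small_cpu", 10.0, 150.0, 25.0, {"fir": 400.0, "ctrl": 20.0}),
    PEType("fast_cpu", 60.0, 900.0, 100.0, {"fir": 100.0, "ctrl": 5.0}),
    PEType("fir_asic", 25.0, 80.0, 50.0, {"fir": 10.0}),
]


def cheapest_pe(behavior, deadline_us):
    """Cheapest PE type that runs `behavior` within the deadline, else None."""
    candidates = [
        pe for pe in LIBRARY
        if pe.exec_time_us.get(behavior, float("inf")) <= deadline_us
    ]
    return min(candidates, key=lambda pe: pe.cost, default=None)


print(cheapest_pe("fir", 50.0))
```

A PE that cannot execute a behavior at all has no table entry for it, which the lookup treats as an infinite execution time.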
Architectural Models (cont’d)
CPU scheduling
• Process vs. thread (light-weight process)
  – We use these terms interchangeably
• Scheduling policies to run multiple processes on a single CPU
  – Non-preemptive vs. preemptive (prioritized)
  – Time-slicing is not normally used in embedded systems
Architectural Models (cont’d)
Scheduling policies (cont’d)
• Priority can be static or dynamic
• A well-known static priority scheme: RMS (Rate Monotonic Scheduling)
  – Best static-priority schedule
  – Guarantees all deadlines, provided CPU utilization stays below a bound (about 69% for many processes)
  – Hence needs about 31% extra CPU horsepower
• A well-known dynamic priority scheme: EDF (Earliest Deadline First)
  – Can reach 100% CPU utilization
  – May miss deadlines
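The 31% figure follows from the classic utilization bound for rate-monotonic scheduling, n(2^(1/n) - 1), which approaches ln 2 (about 0.693) as the number of processes n grows. The sketch below compares that bound with the utilization limit of 1.0 for EDF. It assumes independent periodic processes, deadlines equal to periods, and negligible context-switch overhead. The task set is made up, and the RMS test is sufficient but not necessary.

```python
def rms_bound(n):
    """Utilization bound below which RMS meets all deadlines for n processes."""
    return n * (2 ** (1.0 / n) - 1)


def utilization(tasks):
    """tasks: list of (execution_time, period) pairs in the same time unit."""
    return sum(exec_time / period for exec_time, period in tasks)


def rms_guaranteed(tasks):
    """Sufficient (not necessary) schedulability test for RMS."""
    return utilization(tasks) <= rms_bound(len(tasks))


def edf_feasible(tasks):
    """Utilization test for EDF under the assumptions stated in the lead-in."""
    return utilization(tasks) <= 1.0


# Hypothetical task set: (execution time, period).
tasks = [(1, 4), (2, 5), (3, 10)]
print(round(rms_bound(len(tasks)), 3), round(utilization(tasks), 3))
print(rms_guaranteed(tasks), edf_feasible(tasks))
```

Here the utilization exceeds the three-process RMS bound but stays under 1.0, so the simple RMS test cannot guarantee the set while the EDF test accepts it.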
Topics
• Introduction
• Preliminaries
• Hardware/Software Partitioning
• Distributed System Co-Synthesis
Topics
• Introduction
• A Classification
• Examples
  – Vulcan
  – Cosyma
Introduction to HW/SW Partitioning
The first variety of co-synthesis applications
Definition
• A HW/SW partitioning algorithm implements a specification on some sort of multiprocessor architecture
Usually
• Multiprocessor architecture = one CPU + some ASICs on the CPU bus
Introduction to HW/SW Partitioning (cont’d)
Terminology
• Allocation
  – Synthesis methods which design the multiprocessor topology along with the PEs and SW architecture
• Scheduling
  – The process of assigning PE (CPU and/or ASIC) time to processes so that they get executed
Introduction to HW/SW Partitioning (cont’d)
In most partitioning algorithms
• The type of CPU is fixed and given
• ASICs must be synthesized
  – What function should be implemented on each ASIC?
  – What characteristics should the implementation have?
• These are single-rate synthesis problems
  – CDFG is the starting model
HW/SW Partitioning (cont’d)
Normal use of architectural components
• CPU performs less computationally intensive functions
• ASICs are used to accelerate core functions
Where to use?
• High-performance applications
  – No CPU is fast enough for the operations
• Low-cost applications
  – ASIC accelerators allow use of a much smaller, cheaper CPU
A Classification
Criterion: Optimization Strategy
• Trade-off between performance and cost
• Primal approach
  – Performance is the primary goal
  – First, all functionality in ASICs. Progressively move more to the CPU to reduce cost.
• Dual approach
  – Cost is the primary goal
  – First, all functions in the CPU. Move operations to the ASIC to meet the performance goal.
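The two strategies can be contrasted with a toy greedy partitioner. This is only a sketch of the primal and dual directions of movement, not the algorithm of any particular tool. The function names, the per-function estimates, and the cost model (sequential execution, no communication overhead, fixed CPU cost ignored) are all assumptions.

```python
# Hypothetical estimates per function: (sw_time, hw_time, hw_cost).
FUNCS = {
    "filter": (50, 5, 30),
    "fft": (80, 10, 45),
    "ui": (10, 8, 20),
    "log": (5, 4, 15),
}


def total_time(in_hw):
    # Simplification: functions run one after another, no communication cost.
    return sum(hw if f in in_hw else sw for f, (sw, hw, _) in FUNCS.items())


def total_cost(in_hw):
    # Only ASIC cost is counted; the CPU is assumed to be given.
    return sum(FUNCS[f][2] for f in in_hw)


def dual(deadline):
    """Start all-SW; move functions to HW until the deadline is met."""
    in_hw = set()
    while total_time(in_hw) > deadline:
        candidates = [f for f in FUNCS if f not in in_hw]
        if not candidates:
            return None  # even an all-HW design misses the deadline
        # Greedy choice: best time saved per unit of hardware cost.
        best = max(
            candidates,
            key=lambda f: (FUNCS[f][0] - FUNCS[f][1]) / FUNCS[f][2],
        )
        in_hw.add(best)
    return in_hw


def primal(deadline):
    """Start all-HW; move functions to SW while the deadline still holds."""
    in_hw = set(FUNCS)
    if total_time(in_hw) > deadline:
        return None
    # Try the most expensive hardware first, since removing it saves the most.
    for f in sorted(FUNCS, key=lambda f: FUNCS[f][2], reverse=True):
        if total_time(in_hw - {f}) <= deadline:
            in_hw.remove(f)
    return in_hw


print(sorted(dual(60)), sorted(primal(60)), total_cost(primal(60)))
```

On this small example both directions end at the same partition, but in general greedy primal and dual searches can stop at different points of the performance/cost trade-off.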
A Classification (cont’d)
Classification by optimization strategy (cont’d)
• Example co-synthesis systems
  – Vulcan (Stanford): primal strategy
  – Cosyma (Braunschweig, Germany): dual strategy
Co-Synthesis Algorithms: HW/SW Partitioning
HW/SW Partitioning Examples: Vulcan
Partitioning Examples: Vulcan
• Gupta, De Micheli, Stanford University
• Primal approach
  1. All-HW initial implementation.
  2. Iteratively move functionality to the CPU to reduce cost.
• System specification language: HardwareC
  – Is compiled into a flow graph
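Partitioning Examples: Vulcan (cont’d)
[Figure: Two HardwareC fragments and their flow graphs]
• HardwareC: x=a; y=b;
  – Flow graph: a nop node with two outgoing edges, each labeled 1, to nodes x=a and y=b
• HardwareC: if (c>d) x=e; else y=f;
  – Flow graph: a cond node with an edge labeled c>d to node x=e and an edge labeled c<=d to node y=f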
Partitioning Examples: Vulcan (cont’d)
Flow Graph Definition
• A variation of a (single-rate) task graph
• Nodes
  – Represent operations
  – Typically low-level operations: mult, add
• Edges
  – Represent data dependencies
  – Each contains a Boolean condition under which the edge is traversed
Partitioning Examples: Vulcan (cont’d)
Flow Graph
• Is executed repeatedly at some rate
• Can have initiation-time constraints for each node:
  $t(v_i) + l_{ij} \le t(v_j) \le t(v_i) + u_{ij}$
• Can have rate constraints on each node:
  $m_i \le R_i \le M_i$
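Both kinds of constraint are simple interval checks. The sketch below tests a candidate schedule against initiation-time windows (lower bound l_ij and upper bound u_ij relative to node v_i) and against per-node rate bounds (m_i and M_i). The function names and example numbers are hypothetical and do not come from Vulcan.

```python
def timing_ok(t, constraints):
    """Check t(v_i) + l_ij <= t(v_j) <= t(v_i) + u_ij for every constraint.

    t: dict mapping node name -> initiation time.
    constraints: list of (i, j, l_ij, u_ij) tuples.
    """
    return all(t[i] + l <= t[j] <= t[i] + u for i, j, l, u in constraints)


def rate_ok(rates, bounds):
    """Check m_i <= R_i <= M_i for every node listed in bounds.

    rates: dict mapping node name -> execution rate R_i.
    bounds: dict mapping node name -> (m_i, M_i).
    """
    return all(lo <= rates[i] <= hi for i, (lo, hi) in bounds.items())


# Hypothetical schedule: v2 must start 2 to 5 time units after v1.
times = {"v1": 0, "v2": 3}
print(timing_ok(times, [("v1", "v2", 2, 5)]))
print(rate_ok({"v1": 100}, {"v1": (50, 200)}))
```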