Page 1: Design & Co-design of Embedded Systems

Design & Co-design of Embedded Systems

Introduction to Co-synthesis Algorithms

+ HW/SW Partitioning Algorithms

Maziar Goudarzi

Page 2: Design & Co-design of Embedded Systems

Fall 2005 Design & Co-design of Embedded Systems

Today's Program

Introduction
Preliminaries
Hardware/Software Partitioning
Distributed System Co-Synthesis (next session)

Reference:

Wayne Wolf, “Hardware/Software Co-Synthesis Algorithms,” Chapter 2, Hardware/Software Co-Design: Principles and Practice, Eds: J. Staunstrup, W. Wolf, Kluwer Academic Publishers, 1997.

Page 3: Design & Co-design of Embedded Systems

Introduction to HW/SW Co-Synthesis Algorithms

Introduction

Page 4: Design & Co-design of Embedded Systems

Introduction

Implementing a system?
  Why use a CPU?
    Easier implementation
    Easier (and cheaper) to change and debug
  Why use hardware modules?
    Meeting other constraints: performance, power consumption, etc.
Found a CPU meeting all non-functional constraints?
  Yes! What could be better? Use the CPU.
  No! Design custom logic, or a combination of both.

Page 5: Design & Co-design of Embedded Systems

Introduction (cont’d)

Why more than one CPU or custom logic?

Why not use the fastest available CPU?

Page 6: Design & Co-design of Embedded Systems

Introduction (cont’d)

Reason 1: Exponential cost per CPU performance

[Figure: late-1996 retail prices of Pentium processors. x-axis: clock speed (MHz), with ticks at 75, 120 and 150; y-axis: cost (US $), from 0 to 400.]

Page 7: Design & Co-design of Embedded Systems

Introduction (cont’d)

Exponential price/performance implies:
  Paying for performance in a uniprocessor is very expensive
  Using multiple small CPUs is cheaper
    Communication overhead is added, but it is still an economical choice
  Processors need not be CPUs; they can be special-function units
    Special-purpose PEs can be even cheaper than a dedicated CPU!
      • Measured in system manufacturing cost, not necessarily in design cost

Page 8: Design & Co-design of Embedded Systems

Introduction (cont’d)

Reason 2: Scheduling overhead

More than 31% overhead, under reasonable assumptions, when executing multiple processes

• Reason: uncertainty in the times at which the processes will need to execute

• Result: we have to reserve extra CPU horsepower, which comes at exponential cost
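
The 31% figure matches the classical worst-case utilization bound for rate-monotonic scheduling (Liu and Layland), which the deck lists later under RMS. Assuming that is its source, the arithmetic for n periodic processes is:

```latex
U(n) = n\left(2^{1/n} - 1\right), \qquad
\lim_{n \to \infty} U(n) = \ln 2 \approx 0.693, \qquad
1 - 0.693 \approx 0.31
```

Under this reading, roughly 31% of the CPU capacity may have to stay unused to guarantee every deadline when the number of processes is large.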

Page 9: Design & Co-design of Embedded Systems

Introduction (cont’d)

Definition
  HW/SW co-synthesis: the process of simultaneously designing the SW architecture of an application and the HW architecture on which that SW is executed.

Page 10: Design & Co-design of Embedded Systems

Introduction (cont’d)

[Figure: Problem Specification → Co-Synthesis → SW (app.) Architecture, HW Engine (PEs, Mem), Communication Channels]

Page 11: Design & Co-design of Embedded Systems

Introduction (cont’d)

Problem specification includes
  Functionality
  Non-functional requirements
    Performance goals, physical constraints, etc.

Page 12: Design & Co-design of Embedded Systems

Introduction (cont’d)

Hardware Architecture
  One or more Processing Elements (PEs)
Software (Application) Architecture includes
  Process structure
    Each process executes sequentially
    Determines
      • The amount of parallelism
      • The amount of communication
    Proper process structure is crucial for cost-effective implementation
  Allocation of the processes onto PEs in the HW engine
Communication channels
  Hardware elements
  Software primitives

Page 13: Design & Co-design of Embedded Systems

Introduction (cont’d)

HW/SW Co-synthesis
  Allows trade-offs between the SW architecture and the HW on which it executes
  Where is such a trade-off important?
    Everyday processing applications vs. embedded applications
    Embedded computing: computing with limited resources
Different co-synthesis styles, depending on
  The specification
  The system components
  The system elements to synthesize

Page 14: Design & Co-design of Embedded Systems

Introduction (cont’d)

Two broad implementation styles
  HW/SW partitioning
    Target HW architecture: a CPU and multiple ASICs
  Distributed system co-synthesis
    Target HW architecture: arbitrary hardware topologies

Page 15: Design & Co-design of Embedded Systems

Introduction to HW/SW Co-Synthesis Algorithms

Preliminaries

Page 16: Design & Co-design of Embedded Systems

Preliminaries

Rate (execution rate)
  Maximum frequency at which a processing task must be performed
Single-rate vs. multi-rate
  Example of a multi-rate system: audio/video decoder

Page 17: Design & Co-design of Embedded Systems

Preliminaries (cont’d)

Latency
  Required maximum time between starting and finishing a processing task

Page 18: Design & Co-design of Embedded Systems

Behavior Models

DFG: Data Flow Graph
  Suitable for data-processing algorithms
CFG: Control Flow Graph
  Suitable for process control algorithms
CDFG: Control Data Flow Graph
  Combination of the two above

Page 19: Design & Co-design of Embedded Systems

Behavior Models (cont’d)

Single-rate systems
  Standard model: Control-Data Flow Graph (CDFG)
    Implies a program counter or system state
    Not suitable for modeling multi-rate tasks
      • Due to the unified system state

Page 20: Design & Co-design of Embedded Systems

Behavior Models (cont’d)

Multi-rate systems
  Common model: Task Graph
Task Graph
  Each node: a process
  Each edge: communication
  Each set of connected nodes: a sub-task

[Figure: example task graph with processes P1 to P6]
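
As a minimal sketch of the task-graph model, the snippet below stores processes as nodes and communication as edges, then recovers the sub-tasks as sets of connected nodes. The edge list is hypothetical: the extracted text keeps the process names P1 to P6 but not which of them are connected.

```python
from collections import defaultdict

# Hypothetical edges (assumption): the slide's figure is not recoverable.
nodes = ["P1", "P2", "P3", "P4", "P5", "P6"]
edges = [("P1", "P2"), ("P1", "P3"), ("P4", "P6"), ("P5", "P6")]

def sub_tasks(nodes, edges):
    """Return the sets of connected nodes (sub-tasks) of a task graph."""
    neighbours = defaultdict(set)
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)  # connectivity ignores edge direction
    seen, result = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first walk over one connected component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        result.append(sorted(component))
    return result

print(sub_tasks(nodes, edges))
# [['P1', 'P2', 'P3'], ['P4', 'P5', 'P6']]
```

With these assumed edges the graph holds two sub-tasks, each of which could run at its own rate.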

Page 21: Design & Co-design of Embedded Systems

Behavior Models (cont’d)

SDFG: Synchronous Data Flow Graph
  Suitable for signal processing applications
  = DFG + may be cyclic
  Lee and Messerschmitt: algorithm to check the feasibility of an SDFG and schedule it on a uniprocessor or multiprocessor

[Figure: example SDFG with actors a, b and c; edges are annotated with token rates of 1 and 2]
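
One part of such a feasibility check is commonly formulated as balance equations: for every edge, the tokens produced per period must equal the tokens consumed. The sketch below solves them for the smallest integer repetition counts. The function name `repetitions` and the example rates are assumptions (the figure's rates are not fully recoverable), and the deadlock check that cyclic graphs also need (enough initial tokens on each cycle) is left out.

```python
from fractions import Fraction
from math import lcm  # multi-argument lcm needs Python 3.9+

def repetitions(actors, edges):
    """Solve q[src] * produced == q[dst] * consumed for every edge.

    edges: list of (src, dst, produced, consumed).
    Returns the smallest positive integer solution, or None if the rates
    are inconsistent. Assumes the graph is connected.
    """
    q = {actors[0]: Fraction(1)}
    changed = True
    while changed:  # propagate firing ratios along edges
        changed = False
        for src, dst, produced, consumed in edges:
            if src in q and dst not in q:
                q[dst] = q[src] * produced / consumed
                changed = True
            elif dst in q and src not in q:
                q[src] = q[dst] * consumed / produced
                changed = True
    if any(q[s] * p != q[d] * c for s, d, p, c in edges):
        return None  # no periodic schedule with bounded buffers
    scale = lcm(*(f.denominator for f in q.values()))
    return {a: int(f * scale) for a, f in q.items()}

# Hypothetical rates for a three-actor cycle a -> b -> c -> a.
sdf_edges = [("a", "b", 2, 1), ("b", "c", 1, 2), ("c", "a", 1, 1)]
print(repetitions(["a", "b", "c"], sdf_edges))
# {'a': 1, 'b': 2, 'c': 1}
```

Here one period fires a once, b twice and c once. Changing the last edge to produce 2 tokens per firing makes the equations unsolvable, and the function returns None.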

Page 22: Design & Co-design of Embedded Systems

Behavior Models (cont’d)

Co-design Finite-State Machine (CFSM)
  POLIS project at UC Berkeley
  Used for control-dominated systems
    e.g., ECU (Engine Control Unit)
  Event-driven FSM
    Transitions occur on events (instead of a periodic clock signal)

[Figure: example CFSM with states idle, test and error; transitions labeled Go/start_timer, Done/stop_time, Timeout/alarm=ON and Reset/alarm=OFF]
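
The event-driven behavior can be sketched as a transition table keyed by (state, event). This is only a plain event-driven FSM, not the full CFSM semantics of POLIS. The state names and event/action labels come from the slide's figure; which transition connects which pair of states is an assumption, since the extracted text does not preserve the arrows.

```python
# (state, event) -> (next state, action).
# Assumption: source/target states of each transition are guessed.
TRANSITIONS = {
    ("idle", "Go"): ("test", "start_timer"),
    ("test", "Done"): ("idle", "stop_time"),
    ("test", "Timeout"): ("error", "alarm=ON"),
    ("error", "Reset"): ("idle", "alarm=OFF"),
}

def run(events, state="idle"):
    """Feed events to the machine; return the final state and emitted actions."""
    actions = []
    for event in events:
        # An event with no matching transition leaves the state unchanged.
        next_state, action = TRANSITIONS.get((state, event), (state, None))
        if action is not None:
            actions.append(action)
        state = next_state
    return state, actions

print(run(["Go", "Timeout", "Reset"]))
# ('idle', ['start_timer', 'alarm=ON', 'alarm=OFF'])
```

Nothing happens between events: the machine advances only when an event arrives, in contrast to a clocked FSM that evaluates its inputs on every clock edge.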

Page 23: Design & Co-design of Embedded Systems

Architectural Models

The hardware engine also needs a description

Here, only basic models for cost estimation

Page 24: Design & Co-design of Embedded Systems

Architectural Models (cont’d)

HW engine is another graph
  Generally: Processing Elements (PEs) as nodes + communication channels as edges
  Problem: how to model busses?
  Solution:
    • Nodes are also used for channels
    • Edges represent nets connecting PEs and channels
    • Nodes are labeled with their type
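
A small sketch of this representation: a bus connects more than two PEs, so it cannot be a single edge, and it becomes a node of type "channel" instead. The component names (`cpu0`, `asic0`, `bus0`) and the helper functions are hypothetical.

```python
# Hypothetical hardware engine: two CPUs and one ASIC sharing one bus.
node_type = {"cpu0": "PE", "cpu1": "PE", "asic0": "PE", "bus0": "channel"}
# Each net connects a PE to a channel node.
nets = [("cpu0", "bus0"), ("cpu1", "bus0"), ("asic0", "bus0")]

def pes_on_channel(channel):
    """List the PEs attached to a channel node."""
    attached = []
    for a, b in nets:
        if channel in (a, b):
            other = b if a == channel else a
            if node_type[other] == "PE":
                attached.append(other)
    return sorted(attached)

def can_communicate(pe_a, pe_b):
    """True if the two PEs share at least one channel node."""
    channels = [n for n, t in node_type.items() if t == "channel"]
    return any(pe_a in pes_on_channel(c) and pe_b in pes_on_channel(c)
               for c in channels)

print(pes_on_channel("bus0"))           # ['asic0', 'cpu0', 'cpu1']
print(can_communicate("cpu0", "asic0"))  # True
```

Labeling nodes with their type is what lets one graph hold both PEs and channels without ambiguity.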

Page 25: Design & Co-design of Embedded Systems

Architectural Models (cont’d)

Component Technology Library
  Used when pre-designed components constitute the HW engine
  Includes
    General parameters
      • e.g., manufacturing cost, average power consumption, clock rate
    Information regarding functional elements (behaviors)
      • A table giving the execution time of each behavior on that PE
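
One possible shape for such a library, as a sketch: each PE type carries its general parameters plus a table of execution times per behavior. All names and numbers below are invented for illustration and are not taken from any real component data.

```python
from dataclasses import dataclass, field

@dataclass
class PEType:
    name: str
    cost: float          # manufacturing cost, arbitrary units
    avg_power_mw: float  # average power consumption
    clock_mhz: float     # clock rate
    exec_time_us: dict = field(default_factory=dict)  # behavior -> time on this PE

# Hypothetical library entries (assumption).
library = [
    PEType("small_cpu", 5.0, 50.0, 20.0, {"fir": 400.0, "fft": 1800.0}),
    PEType("fast_cpu", 40.0, 900.0, 150.0, {"fir": 60.0, "fft": 250.0}),
    PEType("fft_asic", 15.0, 120.0, 50.0, {"fft": 20.0}),
]

def cheapest_pe(behavior, deadline_us):
    """Cheapest PE type that runs `behavior` within `deadline_us`, or None."""
    candidates = [pe for pe in library
                  if pe.exec_time_us.get(behavior, float("inf")) <= deadline_us]
    return min(candidates, key=lambda pe: pe.cost, default=None)

print(cheapest_pe("fft", 300.0).name)  # fft_asic
print(cheapest_pe("fir", 100.0).name)  # fast_cpu
```

A behavior missing from a PE's table is treated as not implementable on that PE, which is how the special-purpose `fft_asic` is excluded for `fir`.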

Page 26: Design & Co-design of Embedded Systems

Architectural Models (cont’d)

CPU scheduling
  Process vs. thread (light-weight process)
    We use these terms interchangeably
  Scheduling policies to run multiple processes on a single CPU
    Non-preemptive vs. preemptive (prioritized)
    Time-slicing is not normally used in embedded systems

Page 27: Design & Co-design of Embedded Systems

Architectural Models (cont’d)

Scheduling policies (cont’d)
  Priority can be static or dynamic
    • A well-known static priority scheme:
      – RMS (Rate-Monotonic Scheduling)
      – Optimal among static-priority schemes
      – Guarantees all deadlines when CPU utilization stays below a bound
      – Needs up to about 31% spare CPU capacity in the worst case
    • A well-known dynamic priority scheme:
      – EDF (Earliest Deadline First)
      – Can reach 100% CPU utilization
      – May miss deadlines

More on this later
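
As a rough sketch of the two policies, the code below orders tasks by rate-monotonic priority and applies the standard utilization tests. It assumes independent, preemptable periodic tasks whose deadlines equal their periods; the task set and function names are hypothetical.

```python
def rms_priorities(tasks):
    """Rate-monotonic order: the shorter the period, the higher the priority."""
    return sorted(tasks, key=lambda name: tasks[name][1])

def utilization(tasks):
    """Sum of execution time / period over all tasks."""
    return sum(c / t for c, t in tasks.values())

def rms_guaranteed(tasks):
    """Sufficient (not necessary) utilization test for RMS."""
    n = len(tasks)
    return utilization(tasks) <= n * (2 ** (1 / n) - 1)

def edf_feasible(tasks):
    """Utilization test for EDF under the assumptions stated above."""
    return utilization(tasks) <= 1.0

# Hypothetical task set: name -> (worst-case execution time, period).
tasks = {"audio": (1, 4), "video": (2, 6), "ui": (3, 8)}

print(rms_priorities(tasks))   # ['audio', 'video', 'ui']
print(rms_guaranteed(tasks))   # False
print(edf_feasible(tasks))     # True
```

This set uses about 96% of the CPU: it is above the RMS bound for three tasks (about 78%), so RMS gives no guarantee, while the EDF test still passes.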

Page 28: Design & Co-design of Embedded Systems

Topics

Introduction
Preliminaries
Hardware/Software Partitioning
Distributed System Co-Synthesis

Page 29: Design & Co-design of Embedded Systems

Topics

Introduction
A Classification
Examples
  Vulcan
  Cosyma

Page 30: Design & Co-design of Embedded Systems

Introduction to HW/SW Partitioning

The first variety of co-synthesis applications
Definition
  A HW/SW partitioning algorithm implements a specification on some sort of multiprocessor architecture
Usually
  Multiprocessor architecture = one CPU + some ASICs on the CPU bus

Page 31: Design & Co-design of Embedded Systems

Introduction to HW/SW Partitioning (cont’d)

Terminology
  Allocation
    Synthesis methods that design the multiprocessor topology along with the PEs and the SW architecture
  Scheduling
    The process of assigning PE (CPU and/or ASIC) time to processes so that they get executed

Page 32: Design & Co-design of Embedded Systems

Introduction to HW/SW Partitioning (cont’d)

In most partitioning algorithms
  The type of CPU is fixed and given
  ASICs must be synthesized
    What function to implement on each ASIC?
    What characteristics should the implementation have?
  These are single-rate synthesis problems
    CDFG is the starting model

Page 33: Design & Co-design of Embedded Systems

HW/SW Partitioning (cont’d)

Normal use of architectural components
  CPU performs less computationally intensive functions
  ASICs are used to accelerate core functions
Where to use?
  High-performance applications
    No CPU is fast enough for the operations
  Low-cost applications
    ASIC accelerators allow the use of a much smaller, cheaper CPU

Page 34: Design & Co-design of Embedded Systems

A Classification

Criterion: Optimization Strategy
  Trade-off between performance and cost
  Primal approach
    Performance is the primary goal
    First, all functionality in ASICs. Progressively move more to the CPU to reduce cost.
  Dual approach
    Cost is the primary goal
    First, all functions in the CPU. Move operations to the ASIC to meet the performance goal.
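
The two strategies can be contrasted with a toy greedy sketch. It is not the actual Vulcan or Cosyma algorithm. The per-function estimates, the additive timing model and the greedy selection rules are all assumptions made for illustration.

```python
# Hypothetical estimates: function -> (software time, hardware time, ASIC cost).
FUNCS = {
    "filter": (40, 5, 30),
    "decode": (30, 10, 20),
    "control": (10, 8, 25),
    "log": (5, 4, 15),
}

def total_time(in_hw):
    # Simplification: functions run one after another, so times add up.
    return sum(hw if f in in_hw else sw for f, (sw, hw, _) in FUNCS.items())

def total_cost(in_hw):
    return sum(FUNCS[f][2] for f in in_hw)

def primal(deadline):
    """Start all-hardware; move functions to the CPU while the deadline holds."""
    in_hw = set(FUNCS)
    while True:
        movable = [f for f in in_hw if total_time(in_hw - {f}) <= deadline]
        if not movable:
            return in_hw
        # Greedy choice: free the most expensive piece of hardware first.
        in_hw.remove(max(movable, key=lambda f: FUNCS[f][2]))

def dual(deadline):
    """Start all-software; move functions to the ASIC until the deadline is met."""
    in_hw = set()
    while total_time(in_hw) > deadline and in_hw != set(FUNCS):
        rest = set(FUNCS) - in_hw
        # Greedy choice: most time saved per unit of hardware cost.
        in_hw.add(max(rest, key=lambda f: (FUNCS[f][0] - FUNCS[f][1]) / FUNCS[f][2]))
    return in_hw

print(primal(50), dual(50))  # {'filter'} {'filter'}
```

On this small example both directions end with only `filter` in hardware. In general they can stop at different partitions, because each explores the design space from the opposite corner.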

Page 35: Design & Co-design of Embedded Systems

A Classification (cont’d)

Classification by optimization strategy (cont’d)
  Example co-synthesis systems
    Vulcan (Stanford): primal strategy
    Cosyma (Braunschweig, Germany): dual strategy

Page 36: Design & Co-design of Embedded Systems

Co-Synthesis Algorithms: HW/SW Partitioning

HW/SW Partitioning Examples: Vulcan

Page 37: Design & Co-design of Embedded Systems

Partitioning Examples: Vulcan

Gupta, De Micheli, Stanford University
Primal approach
  1. All-HW initial implementation.
  2. Iteratively move functionality to the CPU to reduce cost.
System specification language
  HardwareC
    Is compiled into a flow graph

Page 38: Design & Co-design of Embedded Systems

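Partitioning Examples: Vulcan (cont’d)

HardwareC:
  x=a; y=b;
Flow graph: a nop node with two edges, each labeled 1, to the operations x=a and y=b

HardwareC:
  if (c>d) x=e;
  else y=f;
Flow graph: a cond node with an edge labeled c>d to x=e and an edge labeled c<=d to y=f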

Page 39: Design & Co-design of Embedded Systems

Partitioning Examples: Vulcan (cont’d)

Flow Graph Definition
  A variation of a (single-rate) task graph
  Nodes
    Represent operations
    Typically low-level operations: mult, add
  Edges
    Represent data dependencies
    Each contains a Boolean condition under which the edge is traversed
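
A minimal sketch of such a flow graph, using the HardwareC fragment `if (c>d) x=e; else y=f;` from the earlier slide: nodes are operations, and each edge carries the Boolean condition under which it is traversed. The Python representation (tuples with lambda conditions) is an assumption, not Vulcan's internal format.

```python
# Edges of the flow graph: (source, destination, traversal condition).
EDGES = [
    ("cond", "x=e", lambda env: env["c"] > env["d"]),
    ("cond", "y=f", lambda env: env["c"] <= env["d"]),
]

def enabled_successors(node, env):
    """Operations reachable from `node` whose edge condition holds in `env`."""
    return [dst for src, dst, cond in EDGES if src == node and cond(env)]

print(enabled_successors("cond", {"c": 3, "d": 1}))  # ['x=e']
print(enabled_successors("cond", {"c": 1, "d": 1}))  # ['y=f']
```

For unconditional code such as `x=a; y=b;`, every edge condition would be the constant true, which corresponds to the edges labeled 1 in the figure.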

Page 40: Design & Co-design of Embedded Systems

Partitioning Examples: Vulcan (cont’d)

Flow Graph
  is executed repeatedly at some rate
  can have initiation-time constraints for each node: $t(v_i) + l_{ij} \le t(v_j) \le t(v_i) + u_{ij}$
  can have rate constraints on each node: $m_i \le R_i \le M_i$
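
Reading the two constraints as lower and upper bounds, that is l_ij ≤ t(v_j) − t(v_i) ≤ u_ij for initiation times and m_i ≤ R_i ≤ M_i for rates, a checker can be sketched as follows. The function names and the example values are hypothetical.

```python
def check_initiation(t, constraints):
    """t: node -> initiation time; constraints: (i, j, l_ij, u_ij) tuples.

    Returns the constraints that violate l_ij <= t[j] - t[i] <= u_ij.
    """
    return [(i, j, lo, hi) for i, j, lo, hi in constraints
            if not lo <= t[j] - t[i] <= hi]

def check_rates(rates, bounds):
    """rates: node -> execution rate R_i; bounds: node -> (m_i, M_i).

    Returns the nodes whose rate falls outside its bounds.
    """
    return [v for v, r in rates.items()
            if not bounds[v][0] <= r <= bounds[v][1]]

# Hypothetical schedule and constraints.
t = {"v1": 0, "v2": 3, "v3": 10}
constraints = [("v1", "v2", 1, 4), ("v2", "v3", 2, 5)]
print(check_initiation(t, constraints))  # [('v2', 'v3', 2, 5)]
print(check_rates({"v1": 50}, {"v1": (10, 40)}))  # ['v1']
```

In the example, v3 starts 7 time units after v2, which exceeds the upper bound of 5, so that constraint is reported.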