modeling, synthesis, verification - cecs · 2009. 7. 8. · modeling, synthesis, verification...

30
Embedded System Design Embedded System Design Modeling, Synthesis, Verification Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, Gunar Schirner 7/8/2009 Chapter 6: Hardware Synthesis 7/8/2009 2 Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner Chapter 6: Hardware Synthesis Hardware Synthesis Design flow RTL architecture Input specification Specification profiling RTL synthesis Variable merging (Storage sharing) Operation Merging (FU sharing) Connection Merging (Bus sharing) Chaining and multi-cycling Data and control pipelining Scheduling Component interfacing Conclusions

Upload: others

Post on 22-Dec-2020

22 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Modeling, Synthesis, Verification - CECS · 2009. 7. 8. · Modeling, Synthesis, Verification Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, Gunar Schirner ... RTL Model Model

Embedded System DesignEmbedded System Design Modeling, Synthesis, Verification

Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, Gunar Schirner

7/8/2009

Chapter 6: Hardware Synthesis

7/8/2009 2Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Hardware Synthesis

• Design flow• RTL architecture• Input specification• Specification profiling• RTL synthesis

• Variable merging (Storage sharing)• Operation Merging (FU sharing)• Connection Merging (Bus sharing)

• Chaining and multi-cycling• Data and control pipelining• Scheduling• Component interfacing• Conclusions

Page 2: Modeling, Synthesis, Verification - CECS · 2009. 7. 8. · Modeling, Synthesis, Verification Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, Gunar Schirner ... RTL Model Model

7/8/2009 3Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

HW Synthesis Design Flow

Tool Model

RTLComponent

Library

Specification

RTL Model

Model Generation

RTL Tools

Compilation

Estimation

HLS

Allocation Binding Scheduling

...

• Compilation

• Estimation

• HLS

• Model generation

• RTL synthesis

• Logic synthesis

• Layout

7/8/2009 4Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Hardware Synthesis

Design flow• RTL architecture• Input specification• Specification profiling• RTL synthesis

• Variable merging (Storage sharing)• Operation Merging (FU sharing)• Connection Merging (Bus sharing)

• Chaining and multi-cycling• Data and control pipelining• Scheduling• Component interfacing• Conclusions

Page 3: Modeling, Synthesis, Verification - CECS · 2009. 7. 8. · Modeling, Synthesis, Verification Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, Gunar Schirner ... RTL Model Model

7/8/2009 5Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

RTL Architecture

•Controller•FSM controller•Programmable controller

•Datapath components•Storage components•Functional units•Connection components

•Pipelining•Functional unit •Datapath•Control

•Structure•Chaining•Multicycling•Forwarding•Branch prediction•Caching

7/8/2009 6Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

RTL Architecture with FSM Controller

Output Logic

B1B2

ALU Memory

RF

MUL

B3FSM Controller

Input Logic

Datapath

ControlInputs

ControlOutputs

ControlSignals

StatusSignals

DataInputs

DataOutputs

•Simple architecture

•Small number of states

Page 4: Modeling, Synthesis, Verification - CECS · 2009. 7. 8. · Modeling, Synthesis, Verification Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, Gunar Schirner ... RTL Model Model

7/8/2009 7Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Status

Address

IR or C

WR

Cmemor

PMem

AG

SR

PC

B1B2

ALU Memory

RF

MUL

B3Programmable ControllerDatapath

Offset

ControlInputs

ControlOutputs

ControlSignals

DataInputs

DataInputs

RTL Architecture with Programmable Controller

•Complex architecture•Control and datapath pipelining•Advanced structural features

•Large number of states (CW or IS)

7/8/2009 8Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Hardware Synthesis

Design flow RTL architecture• Input specification• Specification profiling• RTL synthesis

• Variable merging (Storage sharing)• Operation Merging (FU sharing)• Connection Merging (Bus sharing)

• Chaining and multi-cycling• Data and control pipelining• Scheduling• Component interfacing• Conclusions

Page 5: Modeling, Synthesis, Verification - CECS · 2009. 7. 8. · Modeling, Synthesis, Verification Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, Gunar Schirner ... RTL Model Model

7/8/2009 9Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Input Specification

• Programming language (C/C++, …)• Programming semantics requires pre-synthesis optimization

• System description language (SystemC, …)• Simulation semantics requires pre-synthesis optimization

• Control/Data flow graph (CDFG)• CDFG generation requires dependence analysis

• Finite state machine with data (FSMD)• State interpretation requires some kind of scheduling

• RTL netlist• RTL design that requires only input and output logic synthesis

• Hardware description language (Verilog / VHDL)• HDL description requires RTL library and logic synthesis

7/8/2009 10Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

C Code for Ones Counter

Function-based C code RTL-based C code

01: int OnesCounter(int Data){02: int Ocount = 0;03: int Temp, Mask = 1;04: while (Data > 0) {05: Temp = Data & Mask;06 Ocount = Data + Temp;07: Data >>= 1;08: }09: return Ocount;10: }

01: while(1) {02: while (Start == 0);03: Done = 0;04: Data = Input;05: Ocount = 0;06: Mask = 1; 07: while (Data>0) {08: Temp = Data & Mask;09: Ocount = Ocount + Temp;10: Data >>= 1;11: }12: Output = Ocount;13: Done = 1;14: }

•Programming language semantics

• Sequential execution,

• Coding style to minimize coding

•HW design

• Parallel execution,

• Communication through signals

Page 6: Modeling, Synthesis, Verification - CECS · 2009. 7. 8. · Modeling, Synthesis, Verification Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, Gunar Schirner ... RTL Model Model

7/8/2009 11Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

CDFG for Ones Counter

0

1

>0

0

DoneOutput

Input

0

Done

0

Ocount

1

MaskData

&

>>1 +

DoneData

DoneOcountData

Start

Data

1

Mask Ocount

•Control/Data flow graph

•Resembles programming language

•Loops, ifs, basic blocks (BBs)

•Explicit dependencies

•Control dependences between BBs

•Data dependences inside BBs

•Missing dependencies between BBs

7/8/2009 12Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

FSMD for Ones Counter

•FSMD more detailed then CDFG

•States may represent clock cycles

•Conditionals and statements executed concurrently

• All statement in each state executed concurrently

•Control signal and variable assignments executed concurrently

•FSMD includes scheduling

•FSMD doesn't specify binding or connectivity

Page 7: Modeling, Synthesis, Verification - CECS · 2009. 7. 8. · Modeling, Synthesis, Verification Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, Gunar Schirner ... RTL Model Model

7/8/2009 13Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

CDFG and FSMD for Ones Counter

7/8/2009 14Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

RTL Specification for Ones Counter

S0

S7

S4

S6

S5

S4

S3

S2

S1

S0

StateNext

1XXS7

01XS6

X

X

X

X

X

X

1

0

StartInputs: Output:Present

0XS5

00S6

0XS4

0XS3

0XS2

0XS1

XXS0

XXS0

DoneData = 0State

RF[0] = Data, RF[1] = Mask, RF[2] = Ocount, RF[3] = Temp

RF[2]

RF[0]

RF[2]

RF[0]

RF[2]

RF[2]

X

X

RF Read Port A

X

pass

add

AND

increment

subtract

X

X

ALU

X

X

RF[3]

RF[1]

X

RF[2]

X

X

RF Read Port B

X

B3

B3

B3

B3

B3

Inport

X

RF selector

X

shift right

pass

pass

pass

pass

X

X

Shifter

disable

RF[0]

RF[2]

RF[3]

RF[1]

RF[2]

RF[0]

X

RF Write

enableS7

ZS6

ZS5

ZS4

ZS3

ZS2

ZS1

ZS0

OutportState

Output logic table

status

Output Logic

B1

ALU

Shifter

B3FSM Controller

Input Logic

Datapath

Done

Start

B2

Selector

Outport

ControlSignals

Inport

RF

Input logic table

•RTL Specification

•Controller and datapath netlist

•Input and output tables for logic synthesis

•RTL library needed for netlist

Page 8: Modeling, Synthesis, Verification - CECS · 2009. 7. 8. · Modeling, Synthesis, Verification Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, Gunar Schirner ... RTL Model Model

7/8/2009 15Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

HDL description of Ones Counter

01: // …02: always@(posedge clk) 03: begin : output_logic04: case (state)05: // …06: S4: begin 07: B1 = RF[0];08: B2 = RF[1]; 09: B3 = alu(B1, B2, l_and);10: RF[3] = B3;11: next_state = S5;12: end13: // …14: S7: begin 15: B1 = RF[2];16: Outport <= B1;17: done <= 1; 18: next_state = S0;19: end20: endcase21: end 22: endmodule

•HDL description

•Same as RTL description

•Several levels of abstraction

•Variable binding to storage

•Operation binding to FUs

•Transfer binding to connections

•Netlist must be synthesized

•Partial HLS may be needed

7/8/2009 16Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Hardware Synthesis

Design flow RTL architecture Input specification• Specification profiling• RTL synthesis

• Variable merging (Storage sharing)• Operation Merging (FU sharing)• Connection Merging (Bus sharing)

• Chaining and multi-cycling• Data and control pipelining• Scheduling• Component interfacing• Conclusions

Page 9: Modeling, Synthesis, Verification - CECS · 2009. 7. 8. · Modeling, Synthesis, Verification Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, Gunar Schirner ... RTL Model Model

7/8/2009 17Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Profiling and Estimation

• Pre-synthesis optimization

• Preliminary scheduling• Simple scheduling algorithm

• Profiling• Operation usage

• Variable life-times

• Connection usage

• Estimation• Performance

• Cost

• Power

7/8/2009 18Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Square-Root Algorithm (SRA)

S0

a = In1b = In2

0

1

Start

S1

S2

S3

S4

S5

S6

S7

t1 = |a|t2 = |b|

t5 = x – t3

x = max( t1 , t2 )y = min ( t1 , t2 )

t3 = x >> 3t4 = y >>1

t6 = t4 + t5

t7 = max ( t6 , x )

Done = 1Out = t7

• SQR = max ((0.875x + 0.5y), x)

• x = max (|a|, |b|)

• y= min (|a|, |b|)

Page 10: Modeling, Synthesis, Verification - CECS · 2009. 7. 8. · Modeling, Synthesis, Verification Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, Gunar Schirner ... RTL Model Model

7/8/2009 19Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Variable and Operation Usage

S0

a = In1b = In2

0

1

Start

S1

S2

S3

S4

S5

S6

S7

t1 = |a|t2 = |b|

t5 = x – t3

x = max( t1 , t2 )y = min ( t1 , t2 )

t3 = x >> 3t4 = y >>1

t6 = t4 + t5

t7 = max ( t6 , x )

Done = 1Out = t7

Variable usage

1233222No. of live variables

Xt7

Xt6

Xt5

X

X

S1

X

X

S3

XX

S2

X

X

S5

X

X

X

S4

X

S6

t4

t3

y

x

t2t1

b

a

S7

S7

111212No. of

operations

2

S1

2

S3

11

S2

1

S5

1

S4

1

S6

1+1-2>>1max1min2abs

Max. no. of units

Operation usage

7/8/2009 20Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Connectivity usage

I

O

t5

O

I

t6

O

t7

I

O

t3

I

O

t4

I

a

I

I

O

t1

I

b

I

I

I

O

x

I

I

O

t2

I

O

y

+

-

>>1

>>3

max

min

abs2

abs1

S7

111212No. of

operations

2

S1

2

S3

1

1

S2

1

S5

1

S4

1

S6

1+

1-

2>>

1max

1min

2abs

Max. no. of units

Operation usage

Connectivity usage

S0

a = In1b = In2

0

1

Start

S1

S2

S3

S4

S5

S6

S7

t1 = |a|t2 = |b|

t5 = x – t3

x = max( t1 , t2 )y = min ( t1 , t2 )

t3 = x >> 3t4 = y >>1

t6 = t4 + t5

t7 = max ( t6 , x )

Done = 1Out = t7

Page 11: Modeling, Synthesis, Verification - CECS · 2009. 7. 8. · Modeling, Synthesis, Verification Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, Gunar Schirner ... RTL Model Model

7/8/2009 21Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Hardware Synthesis

Design flow RTL architecture Input specification Specification profiling• RTL synthesis

• Variable merging (Storage sharing)• Operation Merging (FU sharing)• Connection Merging (Bus sharing)

• Chaining and multi-cycling• Data and control pipelining• Scheduling• Component interfacing• Conclusions

7/8/2009 22Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Datapath Synthesis

• Variable Merging (Storage Sharing)

• Operation Merging (FU Sharing)

• Connection Merging (Bus Sharing)

• Register merging (RF sharing)

• Chaining and Multi-Cycling

• Data and Control Pipelining

Page 12: Modeling, Synthesis, Verification - CECS · 2009. 7. 8. · Modeling, Synthesis, Verification Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, Gunar Schirner ... RTL Model Model

7/8/2009 23Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Gain in register sharing

a

Selector Selector

Selector Selector

c b d

x y

+

Selector Selector

a , c b , d

x , y

+

Selector

Partial FSMD Datapath without register sharing Datapath with register sharing

•Register sharing

•Grouping variables with non-overlapping lifetimes

•Sharing reduces connectivity cost

7/8/2009 24Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

General partitioning algorithm

Stop

no yes

Create compatibility graph

Merge highest priority nodes

Upgrade compatibility graph

All nodes incompatible

Start•Compatibility graph

•Compatibility:

•Non-overlapping in time

•Not using the same resource

•Non-compatible:

•Overlapping in time

•Using the same resource

• Priority

•Critical path

•Same source, same destination

Page 13: Modeling, Synthesis, Verification - CECS · 2009. 7. 8. · Modeling, Synthesis, Verification Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, Gunar Schirner ... RTL Model Model

7/8/2009 25Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Variable Merging for SRA

1/0 0/1

1/0

0/1

R1 = [ a, t1, x, t7 ]R2 = [ b, t2, y, t3, t5, t6 ]R3 = [ t4 ]

(a) Initial compatibility graph (b) Compatibility graph after merging t3, t5, and t6

(c) Compatibility graph after merging t1, x, and t7 (d) Compatibility graph after merging t2 and y

(e) Final compatibility graph (f) Final register assignments

7/8/2009 26Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Datapath with Shared Registers

Selector Selector

R1 R2 R3

| a | | b | min max + - >>1 >>3

•Variables combined into registers

•One functional unit for each operation

Page 14: Modeling, Synthesis, Verification - CECS · 2009. 7. 8. · Modeling, Synthesis, Verification Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, Gunar Schirner ... RTL Model Model

7/8/2009 27Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Gain in Functional Unit Sharing

x = a + b

y = c + d

Sj

Si

a

Selector Selector

c b d

x y

+/-

Partial FSMD Non-shared design Shared design

•Functional unit sharing

•Smaller number of FUs

•Larger connectivity cost

7/8/2009 28Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Operation Merging for SRA

1/0

|a| |b|

min max

+ -

1/0

1/01/0

1/1 1/1

0/0 1/1

1/1

1/1

1/1 2/0

2/1

|a| |b|

min max

+ -

1/0

1/1 1/1

1/1

2/1 2/0

1/1

1/0

Initial compatibility graph Compatibility graph after merging of + and -

Compatibility graph after merging of min, +, and - Final graph partitions

Page 15: Modeling, Synthesis, Verification - CECS · 2009. 7. 8. · Modeling, Synthesis, Verification Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, Gunar Schirner ... RTL Model Model

7/8/2009 29Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Selector Selector

R1 R2 R3

abs/max>>1Selector

abs/min/+/-

>>3

Datapath with Shared Registers and FUs

•Variables combined into registers

•Operations combined into functional units

7/8/2009 30Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Connection usage for SRA

XN

XM

XL

XK

XXXXJ

XXXI

X

X

S6

X

S7S0

X

X

X

X

S2

X

X

S1

X

X

S4

X

X

S3

X

X

S5

H

G

F

E

D

C

B

A

S0

a = In1b = In2

0

1

Start

S1

S2

S3

S4

S5

S6

S7

t1 = |a|t2 = |b|

t5 = x – t3

x = max( t1 , t2 )y = min ( t1 , t2 )

t3 = x >> 3t4 = y >>1

t6 = t4 + t5

t7 = max ( t6 , x )

Done = 1Out = t7

Connection usage table

•Find compatible connections for merging into buses

Page 16: Modeling, Synthesis, Verification - CECS · 2009. 7. 8. · Modeling, Synthesis, Verification Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, Gunar Schirner ... RTL Model Model

7/8/2009 31Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Connection Merging for SRA

I

J

K

L

M

N

Compatibility graph for input buses Compatibility graph for output buses

Bus assignmentXN

XM

XL

XK

XXXXJ

XXXI

X

X

S6

X

S7S0

X

X

X

X

S2

X

X

S1

X

X

S4

X

X

S3

X

X

S5

H

G

F

E

D

C

B

A

•Combine connection not used at the same time

•Priority to same source, same destination

•Priority to maximum groups

7/8/2009 32Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Datapath with Shared Registers, FUs and Buses

•Minimal SRA architecture

•3 registers

•4 (2) functional units

• 4 buses

Page 17: Modeling, Synthesis, Verification - CECS · 2009. 7. 8. · Modeling, Synthesis, Verification Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, Gunar Schirner ... RTL Model Model

7/8/2009 33Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Register Merging into RFs

R1 R2

R3

S6 S7S0 S2S1 S4S3 S5

R3

R2

R1

Register assignment

Register access tableCompatibility graph

S0

a = In1b = In2

0

1

Start

S1

S2

S3

S4

S5

S6

S7

t1 = |a|t2 = |b|

t5 = x – t3

x = max( t1 , t2 )y = min ( t1 , t2 )

t3 = x >> 3t4 = y >>1

t6 = t4 + t5

t7 = max ( t6 , x )

Done = 1Out = t7

•Register merging: Port sharing

•Merge registers with non-overlapping access times

•No of ports is equal to simultaneous read/write accesses

7/8/2009 34Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Datapath with Shared RF

H

Out

In1 In2

R1

R2

>>3 >>1

Bus1

abs/max abs/min/+/-

Bus2

Bus3

Bus4

R3

•RF minimize connectivity cost by sharing ports

Page 18: Modeling, Synthesis, Verification - CECS · 2009. 7. 8. · Modeling, Synthesis, Verification Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, Gunar Schirner ... RTL Model Model

7/8/2009 35Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Hardware Synthesis

Design flow RTL architecture Input specification Specification profiling RTL synthesis

• Variable merging (Storage sharing)• Operation Merging (FU sharing)• Connection Merging (Bus sharing)

• Chaining and multi-cycling• Data and control pipelining• Scheduling• Component interfacing• Conclusions

7/8/2009 36Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Datapath with Chaining

In 2

S0

a = In1b = In2

0

1

Start = 1

S1

S2

S3

S4

S5

S6

t1 = |a|t2 = |b|

t5 = x – t3

x = max( t1 , t2 )t3 = max( t1 , t2 )>>3t4 = min ( t1 , t2 )>>1

t6 = t4 + t5

t7 = max ( t6 , x )

Done = 1Out = t7

•Chaining connects two or more FUs

•Allows execution of two or more operation in a single clock cycle

•Improves performance at no cost

Page 19: Modeling, Synthesis, Verification - CECS · 2009. 7. 8. · Modeling, Synthesis, Verification Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, Gunar Schirner ... RTL Model Model

7/8/2009 37Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

In 1

R1 R2 R3

Bus 1

abs/+/-

Bus 2

Bus 3

Bus 4

In 2

Out

abs/max min

>>3 >>1

In 2

S0

a = In1b = In2

0

1

Start

S1

S2

S3

S4

S5

S6

t1 = |a|t2 = |b|

t5 = x – t3t4 = [min ( t1, t2 ) >>1]

x = max( t1 , t2 )t3 = max( t1 , t2 )>>3[t4]= min ( t1 , t2 )>>1

t6 = t4 + t5

t7 = max ( t6 , x )

Done = 1Out = t7

Datapath with Chained and Multi-Cycled FUs

•Multi-cycling allows use of slower FUs

•Multi-cycling allows faster clock-cycle

7/8/2009 38Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Hardware Synthesis

Design flow RTL architecture Input specification Specification profiling RTL synthesis

• Variable merging (Storage sharing)• Operation Merging (FU sharing)• Connection Merging (Bus sharing)

Chaining and multi-cycling• Data and control pipelining• Scheduling• Component interfacing• Conclusions

Page 20: Modeling, Synthesis, Verification - CECS · 2009. 7. 8. · Modeling, Synthesis, Verification Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, Gunar Schirner ... RTL Model Model

7/8/2009 39Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Pipelining

• Functional Unit pipelining• Two or more operation executing at the same time

• Datapath pipelining • Two or more register transfers executing at the same time

• Control Pipelining• Two or more instructions generaqted at the same time

7/8/2009 40Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Functional Unit Pipelining (1)

In 2Done = 1Out = t7

t1 = |a|

x = max( t1 , t2 )t3 = max( t1 , t2 )>>3

t4 = min( t1 , t2 )>>1

t6 = t4 + t5

t7 = max ( t6 , x )

Start

a = In1b = In2

S0

0

1S1

S2

S4

S6

S7

S8

t5 = x – t3

S5

t2 = |b|

S3

•Operation delay cut in ”half”

•Shorter clock cycle

•Dependencies may delay some states

•Extra NO states reduce performance gain

Page 21: Modeling, Synthesis, Verification - CECS · 2009. 7. 8. · Modeling, Synthesis, Verification Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, Gunar Schirner ... RTL Model Model

7/8/2009 41Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Functional Unit Pipelining (2)

In 2Done = 1Out = t7

t1 = |a|

x = max( t1 , t2 )t3 = max( t1 , t2 )>>3

t4 = min( t1 , t2 )>>1

t6 = t4 + t5

t7 = max ( t6 , x )

Start

a = In1b = In2

S0

0

1S1

S2

S4

S6

S7

S8

t5 = x – t3

S5

t2 = |b|

S3

t7

max

NO

t7

t7

S8

t6

+

NO

max

t6

X

S7

t5

>>1

-

NO

+

t4

t5

S6

Write Out

t4Write R3

>>3

min

-

t3

X

S5

b

a

S0

t1

|a|

|b|

b

S2

|a|

a

S1

max

t2

t1

S3

t2

|b|

NO

t3

X

max

min

t2

t1

S4

Write R2

Write R1

Shifters

ALU stage 2

ALU stage 1

Read R3

Read R2

Read R1

Timing diagram with 4 additional NO states

7/8/2009 42Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Datapath Pipelining (1)

In1

R1 R2 R3

>>1

Bus1

Bus2

Bus3

Bus4

>>3

In2

Out

ALU

•Register-to-register delay cut in “equal” parts•Much shorter clock cycle•Dependencies may delay some states•Extra NO states reduce performance gain

Page 22: Modeling, Synthesis, Verification - CECS · 2009. 7. 8. · Modeling, Synthesis, Verification Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, Gunar Schirner ... RTL Model Model

7/8/2009 43Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Datapath pipelining (2)

In 2Done = 1Out = t7

t1 = |a|

x = max( t1 , t2 )t3 = max( t1 , t2 )>>3

t4 = min( t1 , t2 )>>1

t6 = t4 + t5

t7 = max ( t6 , x )

Start

a = In1b = In2

S0

0

1S1

S2

S4

S6

S7

S8

t5 = x – t3

S5

t2 = |b|

S3 Timing diagram with additional NO clock cycles

In1

R1 R2 R3

>>1

Bus1

Bus2

Bus3

Bus4

>>3

In2

Out

ALU

t6t5t3t2t2bALUIn(R)

t7

17

t7

t7

18

x

t6

x

15

max

16

t1

|b|

4

t2

5

t6

14

t4

t4

t5

12

+

13

-

10

t5

11

Write Out

t4Write R3

>>1

x

t3

x

9

b

a

1

|a|

b

3

a

a

2

max

t1

t2

t1

7

t1

t2

t1

6

t3

x

>>3

min

8

Write R2

Write R1

Shifters

ALUOut

ALUIn(L)

Read R3

Read R2

Read R1

Cycles

7/8/2009 44Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Datapath and Control Pipelining (1)

S1

1 0a>b

x = c + d

y = x - 1

S2

S3

ALU

Register

Selector

RF Mem

Bus2

Bus1

StatusSignals

Controlsignals

/

Register

DataInputs

Datapath

DataOutputs

CW

R

PC CMem

AG

SR

Offset

Controller

ControlInputs

ControlOutputs

Bus3

•Fetch delay cut into several parts•Shorter clock cycle•Conditionals may delay some states•Extra NO states reduce performance gain

Page 23: Modeling, Synthesis, Verification - CECS · 2009. 7. 8. · Modeling, Synthesis, Verification Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, Gunar Schirner ... RTL Model Model

7/8/2009 45Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Data and Control Pipelining (2)

S1

1 0a>b

x = c + d

y = x - 1

S2

S3

ALU

Register

Selector

RF Mem

Bus2

Bus1

StatusSignals

Controlsignals

/

Register

DataInputs

Datapath

DataOutputs

CW

R

PC CMem

AG

SR

Offset

Controller

ControlInputs

ControlOutputs

Bus3

14/17

NO

13

3

15

NO

14

4

20

x-1

19

9

y

20

10

19181716131211Write PC

a>bWrite SR

1

x

1

x

S3

18

8

10

0

NO

12

2

b

a

b

a

S1

11

1

c+d

NO

16

6

d

c

d

c

S2

15

5

x

NO

17

7

Write RF

Write ALUOut

Write ALUIn(R)

Write ALUIn(L)

Read RF(R)

Read RF(L)

Read CWR

Read PC

CycleOperation

Timing diagram with additional NO clock cycles

•3 NO cycles for the branch

•2 NO cycles for data dependence

7/8/2009 46Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Hardware Synthesis

Design flow RTL architecture Input specification Specification profiling RTL synthesis

• Variable merging (Storage sharing)• Operation Merging (FU sharing)• Connection Merging (Bus sharing)

Chaining and multi-cycling Data and control pipelining• Scheduling• Component interfacing• Conclusions

Page 24: Modeling, Synthesis, Verification - CECS · 2009. 7. 8. · Modeling, Synthesis, Verification Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, Gunar Schirner ... RTL Model Model

7/8/2009 47Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Scheduling

• Scheduling assigns clock cycles to register transfers

• Non-constrained scheduling• ASAP scheduling

• ALAP scheduling

• Constrained scheduling• Resource constrained (RC) scheduling

– Given resources , minimize metrics (time, power, …)

• Time constrained (TC) scheduling– Given time, minimize resources (FUs, storage, connections)

7/8/2009 48Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

C and CDFG for SRA Algorithm

C flowchart

0Start

t1=|a|t2=|b|

x=max(t1,t2)y=min(t1,t2)

t3=x>>3t4=y>>1t5=x-t3

t6=t4+t5t7=max(t6,x)

Done=1Out=t7

a=In1b=In2

1

0

1

Start

In1 In 2

a b

a b

min

|a| |b|

max

>>1 >>3

-

+

max

Out Done

1

CDFG

Page 25: Modeling, Synthesis, Verification - CECS · 2009. 7. 8. · Modeling, Synthesis, Verification Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, Gunar Schirner ... RTL Model Model

7/8/2009 49Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

RC Scheduling

ASAPschedule

Ready list with mobilities(ALAP – ASAP)

ALAPschedule

RC schedule(for single FUand 2 shifters)

7/8/2009 50Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Scheduling Algorithms

RC algorithm TC algorithm

Page 26: Modeling, Synthesis, Verification - CECS · 2009. 7. 8. · Modeling, Synthesis, Verification Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, Gunar Schirner ... RTL Model Model

7/8/2009 51Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

TC Scheduling

ASAP ALAP TC schedule

Out

min

|a| |b|

max

>>1 >>3

-

+

max

Out

min

|a| |b|

max

>>1

>>3

+

max

min

|a|

|b|

max

>>1

>>3

-

+

max

S5

S6

S7

S1

S3

S4

S8

S2

a b a b a b

Out

-

7/8/2009 52Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Initial probability distribution graph Graph after max, +, and – were scheduled

1.83

S5

S6

S7

S1

S2

S3

S4

|a|

|b|

max

min

-

+

max

>>

1

>>

3

1.0

.83

.83

1.0

1.0

.5

.83

.83

.33

AU units

Probability sum/state

Shift units

1.33

S5

S6

S7

S1

S2

S3

S4

|a| |b|

max

min

-

max

>>

1

>>3

1.0

1.33

.33

1.0

1.0

1.0

AU units

Probability sum/state

Shift units

+

Distribution Graphs for TC scheduling

Page 27: Modeling, Synthesis, Verification - CECS · 2009. 7. 8. · Modeling, Synthesis, Verification Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, Gunar Schirner ... RTL Model Model

7/8/2009 53Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

1.0

S5

S6

S7

S1

S2

S3

S4

|a| |b|

max

min

-

max

>>1

>>3

1.0

1.0

1.0

1.0

1.0

AU units

Probability sum/state

Shift units

+

1.0

1.0

1.0

1.0

S5

S6

S7

S1

S2

S3

S4

|a|

|b|

max

min

-

max

>>1

>>3

1.0

1.0

1.0

1.0

1.0

AU units

Probability sum/state

Shift units

+

1.0

1.0

1.0

Graph after max, +, -, min, >>3, and >>1were scheduled

Distribution graph for final schedule

Distribution Graphs for TC scheduling

7/8/2009 54Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Hardware Synthesis

Design flow RTL architecture Input specification Specification profiling RTL synthesis

• Variable merging (Storage sharing)• Operation Merging (FU sharing)• Connection Merging (Bus sharing)

Chaining and multi-cycling Data and control pipelining Scheduling• Component interfacing• Conclusions

Page 28: Modeling, Synthesis, Verification - CECS · 2009. 7. 8. · Modeling, Synthesis, Verification Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, Gunar Schirner ... RTL Model Model

7/8/2009 55Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Interface Synthesis

•Combine process and channel codes

•HW and protocol clock cycles may differ

•Insert a bus-interface component

•Communication in three parts:

•Freely schedulable code

•Scheduled with process code

•Schedule constrained code

•MAC driver from library for selected bus interface

•Bus interface

•Implemented by bus interface component from library

7/8/2009 56Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Bus Interface Controller (1)

ALU

Reg RF Mem

Bus 2

Bus 1

Statussignals

Controlsignals

/

CMem

AG

offset

Controller

Control

ack

Address

Write Queue

Read Queue

Datapath

Output logic

Input logic

Bus 3 Bus 4

Selector

Selector

DATA

ADDRESS

CONTROL

GRANT

REQUEST

Controlsignals

INC

ready OutCntrl OutAddr OutData InData

MAC driver

Page 29: Modeling, Synthesis, Verification - CECS · 2009. 7. 8. · Modeling, Synthesis, Verification Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, Gunar Schirner ... RTL Model Model

7/8/2009 57Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Bus Interface Controller (2)

S5

S4

S3

S1

S2

S6

OutAddr = BusAddr

OutData = BusData

OutCntrl = WRITE_WORDack = 1

ready = 1

ack = 0

ready = 0

ready = 1

ready = 0

ALU

Reg RF Mem

Bus 2

Bus 1

Statussignals

Controlsignals

/

CMem

AG

offset

Controller

Control

ack

Address

Write Queue

Read Queue

Datapath

Output logic

Input logic

Bus 3 Bus 4

Selector

Selector

DATA

ADDRESS

CONTROL

GRANT

REQUEST

Controlsignals

INC

ready OutCntrl OutAddr OutData InData

MAC driver

Bus protocol

7/8/2009 58Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Transducer/ Bridge

Processor1<clk1>

Processor2<clk2>

Controller1<clk1>

Controller2<clk2>

Transducer

Ready1Ack1

Ready2Ack2

Memory1 Memory2

PE1 PE2Bus 1 Bus 2

Interrupt2Interrupt1

Data1 Data2

Queue<clk3>

•Translates one protocol into another

•Controller1 receives data with protocol1 and writes into queue

•Controller2 reads from queue and sends data with protocol2

Page 30: Modeling, Synthesis, Verification - CECS · 2009. 7. 8. · Modeling, Synthesis, Verification Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, Gunar Schirner ... RTL Model Model

7/8/2009 59Embedded System Design © 2009: Gajski, Abdi, Gerstlauer, Schirner

Chapter 6: Hardware Synthesis

Conclusion

• Synthesis techniques• Variable Merging (Storage Sharing)

• Operation Merging (FU Sharing)

• Connection Merging (Bus Sharing)

• Architecture techniques• Chaining and Multi-Cycling

• Data and Control Pipelining

• Forwarding and Caching

• Scheduling• Metric constrained scheduling

• Interfacing• Part of HW component

• Bus interface unit

• If too complex, use partial order