ADSD Fall 2011, Lecture 05: Architecting Speed (3 Nov 2011)

Dr. Rehan Hafiz <[email protected]> Lecture # 05

Upload: rehan-hafiz

Post on 06-Apr-2018

Page 1: ADSD Fall2011 05 Architect Ing Speed 2011Nov03

8/3/2019 ADSD Fall2011 05 Architect Ing Speed 2011Nov03

http://slidepdf.com/reader/full/adsd-fall2011-05-architect-ing-speed-2011nov03 1/96

Course Website for ADSD Fall 2011

http://lms.nust.edu.pk/

Lectures: Tuesday @ 5:30-6:20 pm, Friday @ 6:30-7:20 pm 

Contact: By appointment/Email Office: VISpro Lab above SEECS Library  

Acknowledgement: Material from the following sources has been consulted/used in these slides:
1. [CIL] Advanced Digital Design with the Verilog HDL, M. D. Ciletti
2. [SHO] Digital Design of Signal Processing Systems, Dr. Shoab A. Khan
3. [STV] Advanced FPGA Design, Steve Kilts
4. Some slides from [ECEN 248, Dr. Shi]

Material from these slides may be used with the following citation:

Dr. Rehan Hafiz: Advanced Digital System Design 2010

Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

This Lecture

Understanding & Optimizing

Speed

Throughput 

Timings

Reading Assignment 

Chapter 1: Advanced FPGA Design, by Steve Kilts

Xilinx Application Note Uploaded on MOODLE + Practice in Xilinx ISE

Setup/Hold time violation

Speed 

Throughput 

Amount of data that is processed per clock cycle

Metric: bits/sec

Latency

Time between data input and processed data output 

Metric: No. of cycles or time

Timing

Logic delays between sequential elements

Metric : Clock period or Frequency.

High Throughput Design

 A high-throughput design

More concerned with the steady-state data rate

Less concerned about the time any specific piece of data requires to propagate through the design (latency)

Techniques

Pipelining

Throughput 

[Figure: top-level entity containing three D flip-flops on a common clk, separated by two blocks of combinational logic]

Throughput = (bits per output sample) / (time between consecutive output samples)

Bits per output sample: In this example, 8 bits per output sample

Time between consecutive output samples: the interval from output(n) to output(n+1). It can be measured in clock cycles and then translated to time

In this example, time between consecutive output samples = 1 clock cycle = 10 ns

Throughput = (8 bits per output sample) / (10 ns) = 0.8 bits / ns = 800 Mbits/s
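The two units used in this lecture, bits per clock cycle and bits per second, are related through the clock frequency. The following is a short restatement of the example above; the symbols $N$ (bits per output sample), $C$ (clock cycles between consecutive output samples) and $f_{clk}$ are introduced here for illustration and do not appear on the slide.

```latex
\begin{aligned}
\text{Throughput} &= \frac{N}{C \cdot T_{clk}} = \frac{N}{C}\, f_{clk} \\
                  &= \frac{8\ \text{bits}}{1 \cdot 10\ \text{ns}}
                   = 8\ \tfrac{\text{bits}}{\text{cycle}} \times 100\ \text{MHz}
                   = 800\ \text{Mbit/s}
\end{aligned}
```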

[Timing diagram: 100 MHz clk; the 8-bit input carries input(0), input(1), input(2); the 8-bit output carries (unknown), output(0), output(1); 1 cycle between output samples]

An Example...

Software Code

Digital Implementation

XPower = 1;
for (i = 0; i < 3; i++)
    XPower = X * XPower;

Throughput: 8/3 = 2.7 bits/cycle
Latency: 3 clock cycles
Timing: 1 multiplier delay

The same register and computational resources are reused

No new computation can begin until the previous computation has completed

[KIL]

Coding an iterative algorithm

<with dependency>

XPower = 1;

for (i=0;i < 3; i++)

XPower = X * XPower;

module power3(
    output reg [7:0] XPower,
    output           finished,
    input      [7:0] X,
    input            clk, start);

    reg [7:0] ncount;               // multiplications still to perform

    assign finished = (ncount == 0);

    always @(posedge clk)
        if (start) begin            // load the operand, schedule two multiplies
            XPower <= X;
            ncount <= 2;
        end
        else if (!finished) begin   // one multiply per clock cycle
            ncount <= ncount - 1;
            XPower <= XPower * X;
        end
endmodule
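To see the three-cycle behaviour of the iterative version, it can be driven from a small simulation-only testbench. This is a minimal sketch, not part of the lecture: the name tb_power3, the 10 ns clock and the stimulus value X = 3 are assumptions chosen for illustration, and it presumes the iterative power3 module above is the one compiled alongside it.

```verilog
// Hypothetical testbench; tb_power3 and its stimulus are illustrative only.
`timescale 1ns/1ps
module tb_power3;
    reg        clk   = 0;
    reg        start = 0;
    reg  [7:0] X     = 8'd3;          // assumed test operand
    wire [7:0] XPower;
    wire       finished;

    // Device under test: the iterative power3 module.
    power3 dut(.XPower(XPower), .finished(finished),
               .X(X), .clk(clk), .start(start));

    always #5 clk = ~clk;             // 10 ns clock period (assumed)

    initial begin
        @(negedge clk) start = 1;     // sampled at the next rising edge
        @(negedge clk) start = 0;
        wait (finished === 1'b1);     // asserted once ncount reaches 0
        $display("t=%0t: X=%0d, XPower=%0d", $time, X, XPower);
        $finish;
    end
endmodule
```

The result becomes valid on the third rising edge counted from the one that samples start, and a new operand cannot be accepted before then, which is where the 8/3 bits/cycle figure comes from.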

Loop Unrolling

XPower = 1;

for (i=0;i < 3; i++)

XPower = X * XPower;

Both the final calculation of X^3 (XPower3 resources) and the first calculation of the next value of X (XPower2 resources) occur simultaneously

[Figure: pipelined datapath; stage labels x[n], x[n-1], x[n-1]^2, x[n-2]^3]

Coding

module power3(
    output reg [7:0] XPower,
    input            clk,
    input      [7:0] X
    );

    reg [7:0] XPower1, XPower2;
    reg [7:0] X1, X2;

    always @(posedge clk) begin
        // Pipeline stage 1
        X1      <= X;
        XPower1 <= X;
        // Pipeline stage 2
        X2      <= X1;
        XPower2 <= XPower1 * X1;
        // Pipeline stage 3
        XPower  <= XPower2 * X2;
    end
endmodule

[Figure: pipeline registers X1, X2, XPower1, XPower2]

Pipelined (unrolled) implementation:
Throughput: 8/1 = 8 bits/cycle
Latency: 3 clock cycles
Timing: 1 multiplier delay

Iterative implementation:
Throughput: 8/3 = 2.7 bits/cycle
Latency: 3 clock cycles
Timing: 1 multiplier delay

In general, if an algorithm requiring n iterative loops is “unrolled,” the pipelined implementation will exhibit a throughput performance increase of a factor of n.

The penalty for unrolling an iterative loop is a proportional increase in area.
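Applied to the power-of-three example (n = 3), the rule can be checked against the figures quoted for the two implementations. The 100 MHz clock used for the bits-per-second values is an assumption carried over from the earlier throughput example, not a property of the power3 modules.

```latex
\begin{aligned}
\frac{\text{Throughput}_{\text{pipelined}}}{\text{Throughput}_{\text{iterative}}}
  &= \frac{8/1\ \text{bits/cycle}}{8/3\ \text{bits/cycle}} = 3 = n \\
\text{at } f_{clk} = 100\ \text{MHz:}\quad
  &\ 800\ \text{Mbit/s} \quad\text{versus}\quad \approx 267\ \text{Mbit/s}
\end{aligned}
```

The area cost is visible in the code: the iterative module reuses one multiplier, while the pipelined module instantiates two multipliers plus the extra X1, X2, XPower1 and XPower2 registers.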

Decreasing Latency

A low-latency design is one that passes the data from the input to the output as quickly as possible by minimizing the intermediate processing delays.

Techniques

Removal of pipelining, and logical shortcuts that may reduce the throughput or the maximum clock speed of a design

Parallelism

Latency

[Figure: top-level entity containing three D flip-flops on a common clk, separated by two blocks of combinational logic]

Latency is the time between input(n) and output(n)

i.e. time it takes from first input to first output, second input to second output, etc.

Also called input-to-output latency

Count the number of rising edges after input 

In this example there are 2 rising edges, so the latency is 2 cycles

Latency is measured in clock cycles (then translated to seconds)

In this example, say clock period is 10 ns, then latency is 20 ns

[Timing diagram: 100 MHz clk; the 8-bit input carries input(0), input(1), input(2); the 8-bit output carries (unknown), output(0), output(1)]

Removal of pipelining

Throughput: 8/1 = 8 bits/cycle
Latency: less than a cycle
Timing: 2 multiplier delays

Penalty

Penalty in timing

Previous implementations could theoretically run the system clock period close to the delay of a single multiplier

For the low-latency implementation, the clock period must be at least two multiplier delays

module power3(
    output [7:0] XPower,
    input  [7:0] X
    );

    reg [7:0] XPower1, XPower2;
    reg [7:0] X1, X2;

    assign XPower = XPower2 * X2;

    always @* begin
        X1      = X;
        XPower1 = X;
    end

    always @* begin
        X2      = X1;
        XPower2 = XPower1 * X1;
    end
endmodule
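The timing penalty can be written out with the path-delay terms used later in this lecture. This is a sketch under two assumptions that are not on the slide: routing delay is ignored, and the combinational power3 module sits between a launch register and a capture register in the enclosing design. The symbol $t_{MULT}$ is introduced here for the delay of one multiplier.

```latex
\begin{aligned}
T_{\text{pipelined}}   &> t_{CLK2Q} + t_{MULT} + t_{S} \\
T_{\text{low-latency}} &> t_{CLK2Q} + 2\,t_{MULT} + t_{S}
\end{aligned}
```

When the multiplier delay dominates the flip-flop terms, removing the pipeline registers roughly halves the achievable clock frequency in exchange for the lower latency.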

Understanding Timing

Timings

Combinational Logic & Routing

Flip-Flops

Setup time

Hold time

Propagation delay tCLK2Q

Timing: Flip Flops (Sequential Logic)

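[Figure: flip-flop timing diagram with waveforms clk, D and Q. Input D must remain stable during the interval from tS before to tH after the rising clock edge and can change freely outside it; Q changes tCLK2Q after the edge]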

Setup time tS: minimum time the input has to be stable before the rising edge of the clock

Hold time tH: minimum time the input has to be stable after the rising edge of the clock

Propagation delay tCLK2Q: time to propagate input to output after the rising edge of the clock

Timing: Path timing

[Figure: launch flip-flop, combinational logic and capture flip-flop on a common clk; the clock period T is divided into tCLK2Q, tLOGIC, tROUTING and tS, and the total is the path delay tpath]

To avoid a setup time violation: tCLK2Q + tLOGIC + tROUTING < (T - tS)

Rewriting the equation: tCLK2Q + tLOGIC + tROUTING + tS < T

A path is defined as running from the output of one flip-flop to the input of another flip-flop

Critical Path Delay

Path delay tpath = tCLK2Q + tLOGIC + tROUTING + tS

The largest of all the path delays in a circuit is called the critical path delay (tcritical_path)

The associated path is called the critical path

There can be millions of paths in a circuit; timing analysis CAD tools help to locate the critical path

Critical Path

[Figure: five flip-flops connected by four paths, PATH 1 to PATH 4; annotated delays include 1.1 ns, 0.5 ns and 0.8 ns, with tCLK2Q = 0.4 ns and tS = 0.2 ns]

Path delays: tpath1 = 2.2 ns, tpath2 = 1.1 ns, tpath3 = 3.0 ns, tpath4 = 1.4 ns

The critical path is path 3; the critical path delay is tcritical_path = tpath3 = 3.0 ns

Setup Time Violation (a.k.a Critical Path Violation)

[Figure: launch flip-flop (tCLK2Q = 0.4 ns) connected to capture flip-flop (tS = 0.2 ns) through wire1 (0.4 ns), combinational Gate A (2.0 ns), wire2 (0.2 ns), combinational Gate B (1.2 ns) and wire3 (0.8 ns); the clock period T spans all of these delays]

Critical path delay = t critical_path = 5.2 ns

The minimum period for this circuit to work is Tmin = 5.2 ns

Maximum clock frequency = 1/Tmin = 192 MHz

If the clock period is smaller than Tmin, you will get a timing violation and the circuit will not operate correctly!

This kind of timing violation is called a "setup time" violation (also known as critical path violation)
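The 5.2 ns figure is the sum of every delay annotated on the path between the two flip-flops. Written out term by term, using only the values given on this slide:

```latex
\begin{aligned}
t_{critical\_path} &= t_{CLK2Q} + t_{wire1} + t_{gateA} + t_{wire2} + t_{gateB} + t_{wire3} + t_{S} \\
                   &= 0.4 + 0.4 + 2.0 + 0.2 + 1.2 + 0.8 + 0.2 = 5.2\ \text{ns} \\
f_{max} &= \frac{1}{T_{min}} = \frac{1}{5.2\ \text{ns}} \approx 192\ \text{MHz}
\end{aligned}
```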

Review – From Last Lecture

Throughput

Amount of data that is processed per clock cycle, or the aggregate/average data processing rate

Ideally, the average data rate into your system should equal the average data rate out of your system; otherwise you will miss data!

Technique: Pipelining & Loop Unrolling

Streaming applications are more concerned with throughput

Metric: bits/sec

Latency

Time between data input and processed data output

Technique: Parallelising the system

Response time: important for time-critical signals, e.g. an interrupt-triggered operation processing an external signal of an avionics system

Metric: No. of cycles or time

Normally a compromise!

Timing

Logic delays between sequential elements

Metric: Clock period or frequency

[tCLK2Q + tLOGIC + tROUTING + tS] < T

Clock skew: the rising edge of the clock does not arrive at the clock inputs of all flip-flops at the same time

Clock Skew

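Often caused by wire routing delay

[Figure: two versions of a two-flip-flop circuit (in, out) in which a delay element on the clock line produces clk' from clk; the waveforms of clk and clk' are offset by tskew and labelled "Lag clock skew" and "Lead clock skew"]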

Positive slack

The data arrives at the capture flip-flop before the capture clock edge less the setup time

Negative slack

The data arrives after the capture clock edge less the setup time; negative slack is a timing problem
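Slack can be expressed with the same terms as the path-timing inequality: it is the required arrival time minus the actual arrival time. The sketch below reuses the delays of the setup-violation example (data arrival 5.0 ns after the launch edge, tS = 0.2 ns); the two clock periods, 6 ns and 5 ns, are hypothetical values chosen to show one case of each sign.

```latex
\begin{aligned}
\text{slack}_{\text{setup}} &= (T - t_{S}) - (t_{CLK2Q} + t_{LOGIC} + t_{ROUTING}) \\
T = 6\ \text{ns}: \quad \text{slack} &= (6.0 - 0.2) - 5.0 = +0.8\ \text{ns} \quad \text{(timing met)} \\
T = 5\ \text{ns}: \quad \text{slack} &= (5.0 - 0.2) - 5.0 = -0.2\ \text{ns} \quad \text{(setup violation)}
\end{aligned}
```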

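Lead clock skew is bad because it may cause setup time violations

[Figure: two-flip-flop path through combinational logic, shown twice. Without skew, both flip-flops are clocked by clk and the clock period T covers tCLK2Q, tLOGIC+tROUTE and tS. With skew, the capture flip-flop is clocked by clk', which leads clk by tskew, so the same delays must fit into T - tskew]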

WITHOUT SKEW: tCLK2Q + tLOGIC + tROUTE + tS < T to avoid a setup time violation

WITH SKEW: tCLK2Q + tLOGIC + tROUTE + tS < (T - tskew) to avoid a setup time violation

There is less time to perform logic than you would normally have

Solution: optimize, pipeline, or use a faster speed grade

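Lag clock skew is bad because it may cause hold time violations

[Figure: two-flip-flop path through combinational logic; the capture flip-flop is clocked by clk', which lags clk by tskew. The waveform compares tCLK2Q + tLOGIC+ROUTE after the clk edge with tskew + tH]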

t CLK2Q + t LOGIC + t ROUTE > (t skew + t H ) to avoid hold time violation

If this is violated, get data feedthrough (data gets fed into the next register one cycle too early)

There is no clock period (T) in the equation; changing clock period cannot help this problem!

Solution : Add dummy logic, e.g. Buffer !

For FPGAs hold time violation predict clock skew

Maximum Achievable Frequency

Maximum-frequency equation (ignoring clock-to-clock jitter):

tskew is the propagation delay of the clock between the launch flip-flop and the capture flip-flop

Its sign (negative or positive) depends on whether the skew is lead or lag
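A form of the maximum-frequency equation that is consistent with the "WITH SKEW" setup inequality given earlier in this lecture is sketched below. It is a reconstruction from that inequality, not necessarily the exact expression shown on the slide, and it assumes the same sign convention: $t_{skew}$ is counted as positive for lead skew, which shortens the time available for logic.

```latex
t_{CLK2Q} + t_{LOGIC} + t_{ROUTE} + t_{S} < T - t_{skew}
\quad\Longrightarrow\quad
F_{max} = \frac{1}{t_{CLK2Q} + t_{LOGIC} + t_{ROUTE} + t_{S} + t_{skew}}
```

With lag skew the capture clock arrives later, so under this convention $t_{skew}$ enters with a negative sign and the setup constraint is relaxed, at the cost of a tighter hold constraint.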

Reading Assignment 

Some Examples

Example 1: Analyzing Sequential Circuits

What is the minimum time between rising clock edges?

Tmin = TCLK-Q(FFA) + TLogic(G) + TRoute(G) + Ts(FFB)

[Figure: FFA (TClk-Q = 5 ns) drives combinational logic G (Tlogic+Route = 5 ns), which drives FFB (TClk-Q = 5 ns, Ts = 2 ns); signals X, D, Y, Z; common CLK]
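Substituting the values annotated in the figure into the expression for Tmin gives the answer directly. The logic and routing delay of G are given as one combined 5 ns figure, and the clock-to-Q delay of FFB does not lie on this path.

```latex
\begin{aligned}
T_{min} &= T_{CLK-Q}(\text{FFA}) + T_{Logic+Route}(G) + T_{s}(\text{FFB}) \\
        &= 5\ \text{ns} + 5\ \text{ns} + 2\ \text{ns} = 12\ \text{ns} \\
f_{max} &= \frac{1}{12\ \text{ns}} \approx 83.3\ \text{MHz}
\end{aligned}
```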

Example 2: Hold Time Violation

Will we get a hold time violation in this example?

Make sure Y remains stable for the hold time (Th) after the rising clock edge

Remember: the contamination delay ensures the signal doesn't change too early

TCLK2Q(FFA) + Tcd(G) >= Th

1 ns + 2 ns > 2 ns, so there is no hold time violation

[Figure: FFA (Tclk2Q = 1 ns) drives combinational logic G (Tcd = 2 ns), which drives FFB (Th = 2 ns); signals X, D, Y, Z; common CLK]

Example 3

What is the minimum clock period (Tmin) of this circuit?

What if FFB has a clock skew (lead) of 1 ns?

[Figure: FFA (TClk-Q = 5 ns) drives combinational logic H (Tlogic+Route = 5 ns), which drives FFB (TClk-Q = 4 ns, Ts = 2 ns); the output of FFB feeds back through combinational logic F (Tlogic+Route = 4 ns) into H; signals X, Y, Z; common CLK]

Solution

Path FFA to FFB: TClk-Q(FFA) + Tpd(H) + Ts(FFB) = 5 ns + 5 ns + 2 ns = 12 ns

Path FFB to FFB: TClk-Q(FFB) + Tpd(F) + Tpd(H) + Ts(FFB) = 4 ns + 4 ns + 5 ns + 2 ns = 15 ns

Solution (with a lead of 1 ns for FFB)

Path FFA to FFB: TClk-Q(FFA) + Tpd(H) + Ts(FFB) + Tskew = 5 ns + 5 ns + 2 ns + 1 ns = 13 ns

Path FFB to FFB: TClk-Q(FFB) + Tpd(F) + Tpd(H) + Ts(FFB) = 4 ns + 4 ns + 5 ns + 2 ns = 15 ns
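The minimum clock period is set by the slower of the two paths, so the final step of Example 3 can be summarised as follows. The frequency value is derived here and is not stated on the slides.

```latex
\begin{aligned}
\text{no skew:}\quad & T_{min} = \max(12\ \text{ns},\ 15\ \text{ns}) = 15\ \text{ns} \\
\text{1 ns lead on FFB:}\quad & T_{min} = \max(13\ \text{ns},\ 15\ \text{ns}) = 15\ \text{ns} \\
& f_{max} = \frac{1}{15\ \text{ns}} \approx 66.7\ \text{MHz}
\end{aligned}
```

The skew term appears only on the FFA-to-FFB path because the FFB-to-FFB path is launched and captured by the same clock edge at the same flip-flop. In this circuit the feedback path remains critical, so the 1 ns lead does not change Tmin.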

Example: Analyzing Sequential Circuits, Hold Time Violations

Path FFA to FFB: TClk2Q(FFA) + Tlogic+Route(H) > Th(FFB), i.e. 1 ns + 2 ns > 2 ns

Path FFB to FFB: TClk2Q(FFB) + TCD(F) + Tlogic+Route(H) > Th(FFB), i.e. 1 ns + 1 ns + 2 ns > 2 ns

[Figure: FFA (Tclk2Q = 1 ns) drives combinational logic H (Tlogic+Route = 2 ns), which drives FFB (Tclk2Q = 1 ns, Th = 2 ns); the output of FFB feeds back through combinational logic F (Tlogic+Route = 1 ns) into H; signals X, Y; common CLK]

All paths must satisfy the requirements

Optimizing Timing: A Few Simple Design Considerations

Consider an FIR Filter

The equation for the computation of an L-tap FIR filter is:

$y[n] = \sum_{k=0}^{L-1} h_k \, x[n-k]$

If L = 5:

y[0] = h0 x[0] + h1 x[-1] + h2 x[-2] + h3 x[-3] + h4 x[-4]
y[1] = h0 x[1] + h1 x[0] + h2 x[-1] + h3 x[-2] + h4 x[-3]
y[2] = h0 x[2] + h1 x[1] + h2 x[0] + h3 x[-1] + h4 x[-2]
y[3] = h0 x[3] + h1 x[2] + h2 x[1] + h3 x[0] + h4 x[-1]
y[4] = h0 x[4] + h1 x[3] + h2 x[2] + h3 x[1] + h4 x[0]
y[5] = h0 x[5] + h1 x[4] + h2 x[3] + h3 x[2] + h4 x[1]

Parallel FIR Implementation

Critical Path ??

module fir(
    output reg [7:0] Y,
    input      [7:0] A, B, C, X,
    input            clk,
    input            validsample);

    reg [7:0] X1, X2;               // delayed input samples

    always @(posedge clk)
        if (validsample) begin
            X1 <= X;
            X2 <= X1;
            Y  <= A*X + B*X1 + C*X2;    // multiply and add in one stage
        end
endmodule

Technique 1: Pipelining
<Reducing TLOGIC+PROPAGATION>

Code

module fir(                         // port list as in the unpipelined fir module
    output reg [7:0] Y,
    input      [7:0] A, B, C, X,
    input            clk,
    input            validsample);

    reg [7:0] X1, X2;
    reg [7:0] prod1, prod2, prod3;  // pipeline registers after the multipliers

    always @(posedge clk) begin
        if (validsample) begin
            X1    <= X;
            X2    <= X1;
            prod1 <= A * X;
            prod2 <= B * X1;
            prod3 <= C * X2;
        end
        Y <= prod1 + prod2 + prod3;
    end
endmodule
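The effect of the added register layer on the FIR filter can be estimated with the adder and multiplier delays $T_A$ and $T_M$ used for the direct-form filter at the end of this lecture. This is a rough sketch that assumes the three-input sum is built as a chain of two adders and that ignores routing and flip-flop overheads; a synthesis tool may structure the adders differently.

```latex
\begin{aligned}
\text{before pipelining:}\quad & t_{LOGIC} \approx T_M + 2\,T_A \\
\text{after pipelining:}\quad  & t_{LOGIC} \approx \max(T_M,\ 2\,T_A)
\end{aligned}
```

The price is one extra clock cycle of latency, since each output now passes through the prod1, prod2, prod3 registers before reaching Y.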

Technique 2: Increasing Parallelism
<Speeding up the logic process>

Optimize the critical path such that logic structures can be implemented in parallel

Example:

For the x-cube code, break the multipliers into independent operations and then recombine them.

Taking a square

8-bit binary multiplier

8 muxed shifts + 8 8-bit additions

8-bit Multiplication

                1 1 1 1 1 0 1 0
              x 1 1 1 1 1 0 1 0
              -----------------
                0 0 0 0 0 0 0 0
              1 1 1 1 1 0 1 0
            0 0 0 0 0 0 0 0
          1 1 1 1 1 0 1 0
        1 1 1 1 1 0 1 0
      1 1 1 1 1 0 1 0
    1 1 1 1 1 0 1 0
  1 1 1 1 1 0 1 0
-------------------------------
1 1 1 1 0 1 0 0 0 0 1 0 0 1 0 0

Optimizing Logic by Adding Parallelism

Assume we are squaring an 8-bit number that can be represented by nibbles A and B:

a3 a2 a1 a0 b3 b2 b1 b0

a3 a2 a1 a0 b3 b2 b1 b0

Partial products (each row is shifted one bit position to the left of the row above):

a3b0 a2b0 a1b0 a0b0 b3b0 b2b0 b1b0 b0b0
a3b1 a2b1 a1b1 a0b1 b3b1 b2b1 b1b1 b0b1
a3b2 a2b2 a1b2 a0b2 b3b2 b2b2 b1b2 b0b2
a3b3 a2b3 a1b3 a0b3 b3b3 b2b3 b1b3 b0b3
a0a3 a0a2 a0a1 a0a0 a0b3 a0b2 a0b1 a0b0
a1a3 a1a2 a1a1 a1a0 a1b3 a1b2 a1b1 a1b0
a2a3 a2a2 a2a1 a2a0 a2b3 a2b2 a2b1 a2b0
a3a3 a3a2 a3a1 a3a0 a3b3 a3b2 a3b1 a3b0

The partial products fall into three groups: B*B, 2*A*B and A*A

[Figure: the partial products of 1 1 1 1 1 0 1 0 x 1 1 1 1 1 0 1 0, grouped into B*B, 2*A*B and A*A, with A = 1 1 1 1 and B = 1 0 1 0]

B*B   = 0 1 1 0 0 1 0 0
2*A*B = 1 0 0 1 0 1 1 0 0   (weighted by 2^4)
A*A   = 1 1 1 0 0 0 0 1     (weighted by 2^8)

Sum   = 1 1 1 1 0 1 0 0 0 0 1 0 0 1 0 0
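The nibble decomposition can be sketched in Verilog as three independent 4x4 multiplications whose results are shifted and summed. This is a minimal combinational sketch, not the lecture's code: the module name square8_nibbles and the signal names AA, AB, BB and Sq are hypothetical.

```verilog
// Hypothetical module: squares an 8-bit number via its two nibbles.
module square8_nibbles(
    input  [7:0]  X,
    output [15:0] Sq);

    wire [3:0] A = X[7:4];          // upper nibble
    wire [3:0] B = X[3:0];          // lower nibble

    // Three independent 4x4 products; none depends on another,
    // so they can be evaluated in parallel.
    wire [7:0] AA = A * A;
    wire [7:0] AB = A * B;
    wire [7:0] BB = B * B;

    // X*X = A*A * 2^8  +  2*A*B * 2^4  +  B*B
    assign Sq = {AA, 8'b0}          // A*A shifted left by 8
              + {3'b0, AB, 5'b0}    // A*B shifted left by 5 (= 2*A*B << 4)
              + {8'b0, BB};         // B*B unshifted
endmodule
```

For the worked value X = 1111 1010 the three products are 225, 150 and 100, giving 57600 + 4800 + 100 = 62500, which matches the 16-bit sum above. The multipliers on the critical path are 4 bits wide instead of 8, at the cost of the final additions.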

Technique 3: Register Balancing
<Distribute long logic paths evenly across register layers>

Keep a balance in the critical path

Redistribute logic evenly between registers to minimize the worst-case delay between any two registers
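A three-input adder gives a compact illustration of register balancing. The two modules below are a hypothetical sketch (the names adder_unbalanced, adder_balanced, rA, rB, rC and rABSum are chosen here for illustration): both produce A + B + C two clock cycles after the inputs are applied, but they distribute the two additions differently across the register layers.

```verilog
// Unbalanced: no logic before the first register layer,
// two additions between the first layer and Sum.
module adder_unbalanced(
    output reg [7:0] Sum,
    input      [7:0] A, B, C,
    input            clk);

    reg [7:0] rA, rB, rC;

    always @(posedge clk) begin
        rA  <= A;
        rB  <= B;
        rC  <= C;
        Sum <= rA + rB + rC;        // two adder delays in one stage
    end
endmodule

// Balanced: one addition is moved in front of the first register layer,
// so each stage contains a single adder delay.
module adder_balanced(
    output reg [7:0] Sum,
    input      [7:0] A, B, C,
    input            clk);

    reg [7:0] rABSum, rC;

    always @(posedge clk) begin
        rABSum <= A + B;            // first addition, stage 1
        rC     <= C;
        Sum    <= rABSum + rC;      // second addition, stage 2
    end
endmodule
```

The latency is unchanged, the worst-case register-to-register delay drops from roughly two adder delays to one, and the balanced version needs one register fewer. This assumes the inputs A, B and C are themselves driven from registers, so that the new input-side adder does not simply lengthen an upstream path.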

Technique 4: Flatten Logic Structures
<Removing redundant logic>

Break up logic structures that are coded in a serial fashion

Avoid priority structures if they are not required

Control signals coming from an address decoder are used to write four 1-bit registers

module regwrite(
    output reg [3:0] rout,
    input            clk, in,
    input      [3:0] ctrl);

    // The if / else-if chain infers priority logic: ctrl[3] is only
    // examined when ctrl[0], ctrl[1] and ctrl[2] are all low.
    always @(posedge clk)
        if (ctrl[0])      rout[0] <= in;
        else if (ctrl[1]) rout[1] <= in;
        else if (ctrl[2]) rout[2] <= in;
        else if (ctrl[3]) rout[3] <= in;
endmodule

If the control lines are strobes from an address decoder in another module, each strobe is mutually exclusive to the others, as they all represent a unique address.

Is there any need for a priority structure?

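module regwrite(
    output reg [3:0] rout,
    input            clk, in,
    input      [3:0] ctrl);

    // Independent if statements: each register bit depends only on
    // its own strobe, so no priority logic is inferred.
    always @(posedge clk) begin
        if (ctrl[0]) rout[0] <= in;
        if (ctrl[1]) rout[1] <= in;
        if (ctrl[2]) rout[2] <= in;
        if (ctrl[3]) rout[3] <= in;
    end
endmodule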

Tip

Technique 5: Reordering Paths
<Shortening Critical Paths>

Mostly done by the synthesizer!

Reorder the paths in the dataflow to minimize the critical path

When to use: where multiple paths combine with the critical path

The combined path can be reordered such that the critical path is moved closer to the destination register

Technique 5: Reordering Paths
Events not mutually exclusive

module randomlogic(
    output reg [7:0] Out,
    input      [7:0] A, B, C,
    input            clk,
    input            Cond1, Cond2);

    always @(posedge clk)
        if (Cond1)
            Out <= A;
        else if (Cond2 && (C < 8))
            Out <= B;
        else
            Out <= C;
endmodule

module randomlogic(
    output reg [7:0] Out,
    input      [7:0] A, B, C,
    input            clk,
    input            Cond1, Cond2);

    wire CondB = (Cond2 & !Cond1);

    always @(posedge clk)
        if (CondB && (C < 8))
            Out <= B;
        else if (Cond1)
            Out <= A;
        else
            Out <= C;
endmodule

Summary: Architecting Speed

High Throughput: Pipelining

Low Latency: Parallelism, Pipeline Removal

Timing: Parallelism, Pipelining, Flattening Logic Structure, Register Balancing, Path Reordering

In your digital design, make your specification your goal and apply the techniques

Recap

A high-throughput architecture is one that maximizes the number of bits per second that can be processed by a design.

Unrolling an iterative loop increases throughput.

The penalty for unrolling an iterative loop is a proportional increase in area.

A low-latency architecture is one that minimizes the delay from the input of a module to the output.

Latency can be reduced by removing pipeline registers.

The penalty for removing pipeline registers is an increase in combinatorial delay between registers.

Timing refers to the clock speed of a design. A design meets timing when the maximum delay between any two sequential elements is smaller than the minimum clock period.

Adding register layers improves timing by dividing the critical path into two paths of smaller delay.

Separating a logic function into a number of smaller functions that can be evaluated in parallel reduces the path delay to the longest of the substructures.

By removing priority encodings where they are not needed, the logic structure is flattened, and the path delay is reduced.

Register balancing improves timing by moving combinatorial logic from the critical path to an adjacent path.

Timing can be improved by reordering paths that are combined with the critical path in such a way that some of the critical path logic is placed closer to the destination register.
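The timing statements in this recap can be checked with a small numerical sketch. The path delays and the clock period below are made-up illustrative values, not figures from the lecture, and `meets_timing` and `max_clock_frequency` are hypothetical helper names.

```python
def meets_timing(path_delays, clock_period):
    """True when the slowest register-to-register path fits in one clock period."""
    return max(path_delays) < clock_period


def max_clock_frequency(path_delays):
    """Upper bound on clock frequency set by the critical path (setup time and skew ignored)."""
    return 1.0 / max(path_delays)


# Assumed numbers (ns): one 12 ns combinational path between two registers.
unpipelined = [12.0]
# Same logic after adding one register layer that splits the path into 7 ns + 5 ns.
pipelined = [7.0, 5.0]
clock_period = 8.0

print(meets_timing(unpipelined, clock_period))  # False: 12 ns does not fit in 8 ns
print(meets_timing(pipelined, clock_period))    # True: the critical path is now 7 ns
print(len(pipelined) - len(unpipelined))        # 1 extra cycle of latency
```

The last line shows the cost side of the trade-off: the added register layer improves timing but adds one clock cycle of latency.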

Reading

Chapter 3 of Parhi, VLSI Digital Signal Processing Systems

Direct Form FIR Filters

[Figure: M-tap direct-form FIR filter. x(n) feeds a chain of z^-1 delays; the taps are multiplied by h0, h1, h2, ..., hM-1 and summed to give y(n).]

M-tap FIR filter in direct form

Critical path (TA = delay through an adder, TM = delay through a multiplier):

Critical path delay: TM + (M-1) TA

Area: M-1 registers, M multipliers, M-1 adders

Arithmetic complexity of an M-tap filter is modeled as:

M multiplications/sample + (M-1) additions/sample

$$y(n) = \sum_{i=0}^{M-1} h(i)\, x(n-i) = h(n) * x(n)$$
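As a behavioural sketch of the direct-form structure (a software model, not synthesizable HDL), the Python below evaluates the convolution sum sample by sample and computes the critical-path estimate TM + (M-1) TA. The function names and the test coefficients are illustrative assumptions; samples before n = 0 are assumed to be zero.

```python
def fir_direct(h, x):
    """Direct-form FIR: y(n) = sum over i of h[i] * x(n - i), with x(n) = 0 for n < 0."""
    M = len(h)
    taps = [0] * M                     # taps[i] holds x(n - i)
    y = []
    for sample in x:
        taps = [sample] + taps[:-1]    # shift the delay line by one sample
        y.append(sum(hi * xi for hi, xi in zip(h, taps)))
    return y


def direct_form_critical_path(M, t_mult, t_add):
    """One multiplier followed by a chain of M-1 adders."""
    return t_mult + (M - 1) * t_add


print(fir_direct([1, 2, 3], [1, 0, 0, 0]))   # [1, 2, 3, 0]: the impulse response is h
print(direct_form_critical_path(4, 10, 2))   # 16
```

The critical path grows linearly with the number of taps, which is what motivates the transposed and pipelined structures.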

Representations of DSP algorithms and architectures

Block Diagram

Block diagram of a 3-tap FIR filter

Signal Flow Graph – Representation ! 

Signal Flow Graph of a 3-tap FIR filter

A signal flow graph is a collection of nodes and directed edges.

A directed edge (j,k) denotes a branch originating at node j and terminating at node k.

Edge (j,k) denotes a linear transformation from the signal at node j to the signal at node k; a gain can be specified on the edge.

Nodes represent computations or tasks, e.g. addition.

Source node: no input edges. Sink node: no outgoing edges.
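One possible way to hold such a graph in software is a list of (j, k, label) edges. The sketch below encodes a 3-tap FIR filter as an SFG and finds its source and sink nodes from the definitions above; the node names and the edge-list encoding are assumptions made for illustration.

```python
# Each edge (j, k, label) runs from node j to node k; the label is the linear
# transformation on that edge (a gain such as "h0", or a unit delay "z^-1").
SFG_3TAP_FIR = [
    ("x", "x1", "z^-1"),   # x(n)   -> x(n-1)
    ("x1", "x2", "z^-1"),  # x(n-1) -> x(n-2)
    ("x", "s1", "h0"),     # h0 * x(n) into the first summing node
    ("x1", "s1", "h1"),    # h1 * x(n-1) into the first summing node
    ("s1", "y", "1"),      # partial sum passed on with unit gain
    ("x2", "y", "h2"),     # h2 * x(n-2) into the output summing node
]


def sources_and_sinks(edges):
    """Source nodes have no incoming edge; sink nodes have no outgoing edge."""
    nodes = {n for j, k, _ in edges for n in (j, k)}
    has_incoming = {k for _, k, _ in edges}
    has_outgoing = {j for j, _, _ in edges}
    return sorted(nodes - has_incoming), sorted(nodes - has_outgoing)


print(sources_and_sinks(SFG_3TAP_FIR))  # (['x'], ['y'])
```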

Signal Flow Graph

From Direct Form to Transpose Form

Reversing the direction of all edges in an SFG and interchanging the input and output ports preserves the functionality of the system.

The transposed form is also called the data-broadcast structure.

[Figure: transposed-form FIR filter. x(n) is broadcast to the multipliers hM-1, hM-2, hM-3, ..., h0; z^-1 registers sit between the adders that produce y(n).]

Critical path delay: TM + TA

Area: M-1 registers + M multipliers + M-1 adders

Disadvantages:

Larger register sizes, depending on the quantization scheme used, since the registers are now placed after the multipliers.

The fanout of x(n) can become prohibitive.
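The claim that transposition preserves functionality can be checked with a behavioural Python sketch (a software model, not HDL) that implements both structures and compares their outputs. The function names and test data are illustrative assumptions, and all registers are assumed to reset to zero.

```python
def fir_direct(h, x):
    """Direct form: delay line on the input, one long adder chain."""
    taps = [0] * len(h)                # taps[i] holds x(n - i)
    y = []
    for sample in x:
        taps = [sample] + taps[:-1]
        y.append(sum(hi * xi for hi, xi in zip(h, taps)))
    return y


def fir_transposed(h, x):
    """Transposed (data-broadcast) form: registers hold partial sums between adders."""
    M = len(h)
    regs = [0] * (M - 1)               # regs[i] feeds the adder at tap h[i]
    y = []
    for sample in x:
        products = [hi * sample for hi in h]   # x(n) is broadcast to all M multipliers
        sums = []
        for i in range(M):
            carried = regs[i] if i < M - 1 else 0
            sums.append(products[i] + carried)  # each adder sees one product and one register
        y.append(sums[0])
        regs = sums[1:]                # register i captures the sum formed at tap i+1
    return y


h = [1, 2, 3]
x = [1, 2, 3, 4, 5]
print(fir_direct(h, x))                          # [1, 4, 10, 16, 22]
print(fir_transposed(h, x) == fir_direct(h, x))  # True
```

In the transposed model every register-to-register path contains one multiplier and one adder, which matches the critical path of TM + TA independent of M.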

Data Flow Graph – Representation ! 

Data Flow Graph of a 3-tap FIR filter

• Nodes represent computations/tasks, e.g. addition, multiplication

• The computation time of a node can be specified with the node

• Each edge has a non-negative number of delays associated with it

• A node computes only once all of its input data is ready

• Non-recursive systems have no loops in their DFG
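These properties map directly onto a small data structure. The sketch below stores a DFG as node computation times plus (source, destination, delays) edges and computes the critical path as the longest chain of nodes connected by zero-delay edges. The 3-tap direct-form FIR graph and the values Tm = 10, Ta = 2 are an assumed example, and the input is modelled as a node `x` with zero computation time.

```python
from functools import lru_cache


def critical_path(node_time, edges):
    """Longest total computation time over paths that contain no delay element.

    node_time: dict mapping node name -> computation time
    edges: iterable of (src, dst, delays) with delays >= 0
    Assumes the zero-delay edges form no cycle.
    """
    zero_delay_successors = {n: [] for n in node_time}
    for src, dst, delays in edges:
        if delays == 0:
            zero_delay_successors[src].append(dst)

    @lru_cache(maxsize=None)
    def longest_from(node):
        tails = (longest_from(s) for s in zero_delay_successors[node])
        return node_time[node] + max(tails, default=0)

    return max(longest_from(n) for n in node_time)


node_time = {"x": 0, "m0": 10, "m1": 10, "m2": 10, "a1": 2, "a2": 2}
edges = [
    ("x", "m0", 0), ("x", "m1", 1), ("x", "m2", 2),   # input delay line
    ("m0", "a1", 0), ("m1", "a1", 0),                 # first adder
    ("a1", "a2", 0), ("m2", "a2", 0),                 # second adder
]
print(critical_path(node_time, edges))  # 14 = Tm + 2 * Ta
```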

Technique: DFG based Pipelining

Data Flow Graphs (DFGs)

Some Terms

DFG based Pipelining – Example (1/4) 

DFG based Pipelining – Example (2/4) 

DFG based Pipelining – Example (3/4) 

[Figure: direct-form FIR filter with taps h0, h1, h2, ..., hM-1, shown before and after extra z^-1 delays are inserted]

Put a delay on all cuts.

DFG based Pipelining – Example (4/4) 

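Let Tm = 10 units, Ta = 2 units, desired clock period = 6 units.

Let the initial design be:

[Figure: transposed-form FIR filter (x(n) broadcast to taps hM-1, hM-2, hM-3, ..., h0, with z^-1 registers between the adders producing y(n)), shown twice; the second copy is annotated "insert registers here" with an extra z^-1]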

Fine Grained Pipelining
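With Tm = 10 and Ta = 2, and assuming the initial design is the transposed-form filter, the critical path is Tm + Ta = 12 units, which exceeds the desired 6-unit clock period, so a register has to go inside the multiplier. A minimal sketch of that arithmetic follows. It assumes the multiplier can be cut at an arbitrary point (a real multiplier can only be cut at specific internal boundaries), and `fine_grain_split` is a hypothetical helper name.

```python
def fine_grain_split(t_mult, t_add, t_clk):
    """Split the multiplier into two pieces (m1, m2) around one pipeline register.

    Stage 1 is m1 alone; stage 2 is m2 followed by the adder.
    Returns None when a single register inside the multiplier is not enough.
    """
    m1 = min(t_mult, t_clk)   # fill the first stage up to the clock period
    m2 = t_mult - m1          # remainder of the multiplier goes to stage 2
    if m2 + t_add > t_clk:
        return None
    return m1, m2


print(fine_grain_split(10, 2, 6))  # (6, 4): stages of 6 and 4 + 2 = 6 units
print(fine_grain_split(10, 2, 5))  # None: stage 2 would need 5 + 2 = 7 units
```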

Pipelining using the Delay Transfer Theorem

Feedforward – only (Example-1)

A convenient way to implement pipelining is to add the desired number of registers to all input edges and then, by repeated application of the node transfer theorem, systematically move the registers to break the delay of the critical path.

Functionality is not changed if a register is transferred from all incoming edges of a node (e.g. FA0) to all of its outgoing edges, and vice versa.

Article: 7.2.7 [SHO]
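The register-transfer rule can be checked with a small cycle-by-cycle simulation. The sketch below models a single two-input adder node, once with a register on each incoming edge and once with a single register on its outgoing edge. The function names are hypothetical, and all registers are assumed to reset to 0 (for a general node the reset values would have to be chosen consistently).

```python
def regs_on_inputs(a_seq, b_seq):
    """Adder node with one register on each incoming edge."""
    reg_a = reg_b = 0
    out = []
    for a, b in zip(a_seq, b_seq):
        out.append(reg_a + reg_b)   # the node sees the registered inputs
        reg_a, reg_b = a, b         # registers capture new data at the clock edge
    return out


def reg_on_output(a_seq, b_seq):
    """Same adder node with the registers moved to its single outgoing edge."""
    reg_y = 0
    out = []
    for a, b in zip(a_seq, b_seq):
        out.append(reg_y)           # the output comes from the register
        reg_y = a + b               # the node now works on unregistered inputs
    return out


a = [1, 2, 3]
b = [10, 20, 30]
print(regs_on_inputs(a, b))                         # [0, 11, 22]
print(regs_on_inputs(a, b) == reg_on_output(a, b))  # True
```

The two versions produce the same output stream, but the second needs one register instead of two and places the adder before the register rather than after it, which is what changes the path delays on either side.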

Feedforward – only (Example-2) 

This scheme can also be applied for register balancing (as discussed earlier).

Technique : DFG based Parallel Processing 

Data Flow Graphs (DFGs)

What if we cannot optimize our system any further using pipelining?

Convert the SISO system to a MIMO system using parallel logic.

The effective sampling speed is increased by the level of parallelism, L.

Multiple outputs are computed in parallel in one clock period.

A parallel processing system is also called a block processing system, and the number of inputs processed in a clock cycle is referred to as the block size, L.
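As an illustration of block processing with L = 2, the behavioural sketch below (a software model, not HDL) computes two outputs of a 3-tap FIR filter per clock cycle. The function names are assumptions, the input length is assumed to be even, and samples before n = 0 are assumed to be zero.

```python
def fir_2parallel_3tap(h, x):
    """Process two input samples per clock cycle (block size L = 2)."""
    h0, h1, h2 = h
    prev_even = prev_odd = 0          # x(2k-2) and x(2k-1), held from the previous cycle
    y = []
    for k in range(0, len(x), 2):     # one loop iteration = one clock cycle
        x_even, x_odd = x[k], x[k + 1]
        y_even = h0 * x_even + h1 * prev_odd + h2 * prev_even   # y(2k)
        y_odd = h0 * x_odd + h1 * x_even + h2 * prev_odd        # y(2k+1)
        y += [y_even, y_odd]
        prev_even, prev_odd = x_even, x_odd
    return y


def effective_sample_period(t_clk, block_size):
    """L samples are consumed per clock, so the sample period is T_clk / L."""
    return t_clk / block_size


print(fir_2parallel_3tap([1, 2, 3], [1, 2, 3, 4]))  # [1, 4, 10, 16]
print(effective_sample_period(12, 2))               # 6.0
```

Both outputs are produced in the same cycle from separate multiply-add hardware, so the clock period is unchanged while the sample rate doubles; the cost is roughly twice the arithmetic area.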

SISO to MIMO Conversion !

2 Parallel 3-Tap Filter !

Combining Parallelism & Pipelining

By combining parallel processing (block size: L) and pipelining (pipeline stages: M), the sample period can be reduced to:
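Assuming the relation intended here is the standard one from Parhi's Chapter 3, the sample period becomes T_sample = T_clk / (L · M), where T_clk is the clock period of the original sequential, unpipelined design. A small numeric sketch under that assumption follows; the helper name and the numbers are illustrative.

```python
def combined_sample_period(t_clk, block_size, pipeline_stages):
    """Sample period after L-parallel processing combined with M-stage pipelining."""
    return t_clk / (block_size * pipeline_stages)


# Assumed example: original clock period of 12 units (Tm + Ta = 10 + 2),
# block size L = 2 and M = 2 pipeline stages.
print(combined_sample_period(12, 2, 2))  # 3.0
```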

Technique : Parallel Processing + Pipelining

Example : FIR Filtering !

Quiz ...

Time – 8 Minutes !

Q-1) What is the maximum sampling rate of this system without any optimization?

Q-2) Optimize this design such that the sampling rate of the optimized system is 1/T. (You must show the DFG for the optimized design.)

Assume that the computational time required for each node is T.

Also assume that all nodes are atomic.

Solution !

Q-1) What is the maximum sampling rate of this system without any optimization?

Sampling period = 4T, sampling rate = 1/(4T)