Page 1: VLSI Programming 2016: Lecture 1wsinmak/Education/2IMN35/2IMN35-2016-slides1.pdf · VLSI Programming 2016: Lecture 1 Course: ... • Parhi, Chapters 1, 2 • DSP Representation Methods

19/04/16 1

VLSI Programming 2016: Lecture 1

Course: 2IMN35

Teachers: Kees van Berkel, Rudolf Mak

Lab: Kees van Berkel, Rudolf Mak, Alok Lele

www: http://www.win.tue.nl/~wsinmak/Education/2IMN35/

Lecture 1: Introduction


Introduction to VLSI Programming: goals

•  to acquire insight into the description, design, and optimization of fine-grained parallel computations;

•  to acquire insight into the (future) capabilities of VLSI as an implementation medium for parallel computations;

•  to acquire skills in the design of parallel computations and in their implementation on FPGAs.


Contents

Massive parallelism is needed to exploit the huge and still increasing computational capabilities of Very Large Scale Integrated (VLSI) circuits:

•  we focus on fine-grained parallelism (not on networks of computers);

•  we assume that parallelism is by design (not by compilation);

•  we draw inspiration from consumer applications, such as digital TV, 3D TV, image processing, mobile phones, etc.;

•  we will use Field Programmable Gate Arrays (FPGAs) as a fine-grained abstraction of VLSI for practical implementation.


FPGA IC on a Xilinx XUP Board (Atlys)

Xilinx Spartan 6

FPGA


Atlys board, based on Xilinx Spartan 6

Xilinx Spartan 6

FPGA


Lab work prerequisites

•  Laptop, running Windows

•  Exceed (can be obtained through the TU/e software distribution)

•  Access to UNIX server Dept. W&I (can be obtained through BCF)

•  Lab work is by teams of two students, with at least 1 Windows laptop.

•  Have FPGA tools (SW) installed on your machine by Tuesday April 26

•  check website 2IMN35


VLSI Programming (2IMN35): time table 2016

Tue: h5-h8; MF.07 (columns: in, out)

Thu: h1-h4; Gemini-Z3A-08/10/13 (columns: in, out)

19-Apr: introduction, DSP graphs, bounds, …

21-Apr: pipelining, retiming, transposition, J-slow, unfolding; T1+T2

26-Apr: tools installed; Introductions to FPGA and Verilog; L1: audio filter simulation; L1 L2

28-Apr: T1+T2; unfolding, look-ahead, strength reduction; L1 cntd; T3+T4

3-May: folding; L2: audio filter on XUP board

5-May:

10-May: T3+T4; DSP processors; L2 cntd; L3

12-May: L3: sequential FIR + strength-reduced FIR

17-May: L3 cntd

19-May: L3 cntd; L4

24-May: systolic computation; T5

26-May: L4

31-May: T5; L4: audio sample rate convertor

2-Jun: L3; L4 cntd; L5

7-Jun: L5: 1024x audio sample rate convertor

9-Jun: L4; L5 cntd

14-Jun:

16-Jun: L5; deadline report L5


Course grading (provisional)

Your course grade is based on:

•  the quality of your programs/designs [30%];

•  your final report on the design and evaluation of these programs (guidelines will follow) [30%];

•  a concluding discussion with you on the programs, the report and the lecture notes [20%];

•  intermediate assignments [20%].

•  Credits: 5 points, based on 140 hours of work on your side


Note on course literature

Lectures VLSI programming are loosely based on:

•  Keshab K. Parhi. VLSI Digital Signal Processing Systems, Design and Implementation. Wiley Inter-Science 1999.

•  This book is recommended, but not mandatory

Accompanying slides can be found on:

•  http://www.ece.umn.edu/users/parhi/slides.html

•  http://www.win.tue.nl/~wsinmak/Education/2IMN35/

Mandatory reading:

•  Keshab K. Parhi. High-Level Algorithm and Architecture Transformations for DSP Synthesis. Journal of VLSI Signal Processing, 9, 121-143 (1995), Kluwer Academic Publishers.


Introduction

• Some inspiration from the technology side

  • VLSI

  • FPGAs

• Some inspiration from the application side

  • Machine Intelligence

  • Bee, SKA, SETI

  • Digital Signal Processing (Software Defined Radio)

• Parhi, Chapters 1, 2

  • DSP Representation Methods

  • Iteration bounds


Some inspiration

from the technology side


Vertical cut through VLSI circuit


Intel 4004 processor [1970]

•  1970

•  4-bit

•  2300 transistors


Apple A9 SoC (System on Chip)

• 2015

• Production: Samsung/TSMC

• 14/16 nm FinFET

• 96/104.5 mm²

• > 2B transistors

• Assuming 0.1 $/mm² production costs

• ⇒ 5 nano$ / transistor
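The cost estimate above can be reproduced with a few lines of arithmetic. This is a minimal sketch: the function name is illustrative, and the 100 mm² die area and the count of exactly 2 billion transistors are rounding assumptions (the slide gives 96/104.5 mm² and "> 2B").

```python
def cost_per_transistor(area_mm2, cost_per_mm2, transistors):
    """Die cost divided by transistor count, in dollars."""
    return area_mm2 * cost_per_mm2 / transistors

# Assumed round numbers: 100 mm^2 die, 0.1 $/mm^2, 2e9 transistors.
a9 = cost_per_transistor(area_mm2=100.0, cost_per_mm2=0.10, transistors=2e9)
print(f"{a9 * 1e9:.1f} nano$ per transistor")
```

With these inputs the die costs about $10, which works out to roughly 5 nano$ per transistor.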


Flash memory

• 32 GB = 256Gb

• ≈100G transistors => << 1 n$ per transistor


Xilinx Kintex7 FPGA

• 2G transistors

• 165 mm²

• 1920 DSP slices


Stratix 10 FPGA from Altera (Intel)


• > 10,000 FLOPs per clock cycle

• @ nearly 1 GHz


Exa-scale computing: $10^{18}$ FLOPs/sec

A scenario (year 2021):

• $10^{18}$ FLOPs/sec = $10^{9}$ arithmetic units running at $10^{9}$ Hz

• $10^{9}$ arithmetic units = $10^{4.5}$ nodes × $10^{4.5}$ arithmetic units

• 1 node = 32 TFLOPs/s “X” + 1 TB DRAM + “CPU” @ 10 MW

Today (2016: “petaflop” era):

• #1: Tianhe-2 (China): 34 × $10^{15}$ FLOPs/sec, $10^{4.5}$ nodes @ 24 MW

• GPU (Nvidia GM200): 6 TFLOPs/sec

• FPGA (Altera Stratix 10, GX2800): 9 TFLOPs/sec


A 2016 “node”


Source: Samsung


Source: NVidia


Moore’s Law: 50th anniversary in 2015!


Cost per Transistor over Time for Intel MPUs

[Figure: cost per transistor (US$) versus time; trend: ×0.5 per 2 years, with a question mark over its continuation]


Rule of two [Hu, 1993]

•  Every 2 generations of IC technology (6 years)

•  device feature size 0.5 x

•  chip size 2 x

•  clock frequency 2 x (no longer true)

•  number of i/o pins 2 x

•  DRAM capacity 16 x

•  logic-gate density 4 x


ITRS: INTERNATIONAL TECHNOLOGY ROADMAP FOR SEMICONDUCTORS

• The overall objective of the ITRS is to present industry-wide consensus on the “best current estimate” of the industry’s research and development needs out to a 15-year horizon.

• As such, it provides a guide to the efforts of companies, universities, governments, and other research providers/funders.

• The ITRS has improved the quality of R&D investment decisions made at all levels and has helped channel research efforts to areas that most need research breakthroughs.

• Involves over 1000 technical experts, world wide.

• a self-fulfilling prophecy? … or wishful thinking?


ITRS 2013


2013 ITRS: MPU/ASIC Half Pitch and Gate Length Trends


Virtex 4 FPGA: 4VSX55

FPGA = Field Programmable Gate Array

500 MHz clock

Flexible logic: 6,144 CLBs

Multi-port RAM: 320 × 18 kbit

512 programmable DSP slices

450MHz PowerPC™

1Gbps Differential I/O

0.6-11.1Gbps Serial Trx


Some inspiration

from the application side


All things grand and small [Moravec ‘98]


Chess Machine Performance [Moravec ‘98]


brain power equivalent per $1000 of computer

Evolution computer power/cost [Moravec ‘98]


The Square Kilometer Array (SKA)

... the ultimate exploration tool

... and the ultimate software defined radio


The Square Kilometer Array (SKA)

• antenna surface: 1 km² (sensitivity 50×)

• large physical extent (3000+ km)

• wide frequency range: 50 MHz – 30 GHz

• full design by 2016; phase 1: 2021; phase 2: 2026

• phase 1: 250 dishes (12m) in the central 5 km

• + dense and/or sparse aperture arrays

• connected to a massive data processor by an optical fibre network

• Software Defined Radio Astronomy

• computational load ≈ 1 exa FLOPs/sec ($10^{18}$ FLOPs/s)

• power budget = 20 MW (≈ 20 pJ/FLOP “all-in”)


References

•  Chip photos:

   •  http://www-vlsi.stanford.edu/group/chips.html

•  ITRS Roadmap

   •  http://www.itrs.net/Links/2005ITRS/ExecSum2005.pdf

•  When will computer hardware match the human brain?

   •  http://www.jetpress.org/volume1/moravec.htm

•  BEE & Square Kilometer Array

   •  http://bwrc.eecs.berkeley.edu/Research/BEE/

   •  http://seti.berkeley.edu/casper/papers/BEE2_ska2004_poster.pdf

   •  http://www.skatelescope.org/


VLSI Digital Signal Processing Systems

Parhi, Chapters 1&2


DSP applications classes

[Figure: DSP application classes plotted by sample rate [Hz] (log scale, 1 to 10G) against complexity, expressed as # operations/sample [log]. Classes shown: control, seismic modeling, speech, audio, modems, radio modems, video, HDTV, radar.]


Typical DSP algorithms

• speech (de-)coding

• speech recognition

• speech synthesis

• speaker identification

• Hi-fi audio en/decoding

• noise cancellation

• audio equalization

• ambient acoustic emulation.

• sound synthesis

• echo cancellation

• modem: (de-)modulation

• vision

• image (de-)compression

• image composition

• beam cancellation

• spectral estimation

• etc.


Typical DSP kernels: FIR Filters

• Filters reduce signal noise and enhance image or signal quality by removing unwanted frequencies.

• Finite Impulse Response (FIR) filters compute y(n):

$$y(i) = \sum_{k=0}^{N-1} h(k)\, x(i-k) = h(n) * x(n)$$

• where

  • x is the input sequence

  • y is the output sequence

  • h is the impulse response (filter coefficients)

  • N is the number of taps (coefficients) in the filter

• Output sequence depends only on input sequence and impulse response.
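The FIR sum can be written out directly as a double loop. This is a minimal sketch, not an optimized implementation; the function name `fir` and the convention that x(i) = 0 for i < 0 are assumptions made for illustration.

```python
def fir(h, x):
    """y(i) = sum_{k=0}^{N-1} h(k) * x(i-k), assuming x(i) = 0 for i < 0."""
    n_taps = len(h)
    y = []
    for i in range(len(x)):
        acc = 0.0
        for k in range(n_taps):
            if i - k >= 0:          # samples before the start count as zero
                acc += h[k] * x[i - k]
        y.append(acc)
    return y

h = [0.5, 0.25, 0.25]               # hypothetical 3-tap impulse response
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
print(fir(h, impulse))              # [0.5, 0.25, 0.25, 0.0, 0.0]
```

Feeding a unit impulse returns the coefficients themselves, which is why h is called the impulse response. Each output sample costs N multiplications and about N additions.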


Typical DSP kernels: IIR Filters

• Infinite Impulse Response (IIR) filters compute:

$$y(i) = \sum_{k=1}^{N-1} a(k)\, y(i-k) + \sum_{k=0}^{M-1} b(k)\, x(i-k)$$

• Output sequence depends on input sequence, impulse response, as well as previous outputs

• Adaptive filters (FIR and IIR) update their coefficients to minimize the distance between the filter output and the desired signal.
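The recursion in an IIR filter is easiest to see in code: each new output reuses earlier outputs. The sketch below assumes the form y(i) = Σ a(k) y(i-k) + Σ b(k) x(i-k) with the feedback sum starting at k = 1, zero initial conditions, and an illustrative function name `iir`.

```python
def iir(a, b, x):
    """Direct evaluation of the IIR difference equation.

    a[0] is unused because the feedback sum starts at k = 1.
    Samples and outputs before index 0 are assumed to be zero.
    """
    y = []
    for i in range(len(x)):
        acc = sum(b[k] * x[i - k] for k in range(len(b)) if i - k >= 0)
        acc += sum(a[k] * y[i - k] for k in range(1, len(a)) if i - k >= 0)
        y.append(acc)
    return y

# Hypothetical first-order example: y(i) = 0.5 * y(i-1) + x(i)
response = iir(a=[0.0, 0.5], b=[1.0], x=[1.0, 0.0, 0.0, 0.0])
print(response)                     # [1.0, 0.5, 0.25, 0.125]
```

The impulse response keeps decaying without ever reaching zero, which is the "infinite" in IIR. The dependence of y(i) on y(i-1) is also what creates loops in the data-flow graph.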


Typical DSP kernels: DFT and FFT

The Discrete Fourier Transform (DFT) supports frequency domain (“spectral”) analysis:

$$y(k) = \sum_{n=0}^{N-1} W_N^{nk}\, x(n), \qquad W_N = e^{-j 2\pi / N}, \quad j = \sqrt{-1}$$

for k = 0, 1, … , N-1, where

• x is the input sequence in the time domain (real or complex)

• y is an output sequence in the frequency domain (complex)

The Inverse Discrete Fourier Transform (IDFT) is computed as

$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} W_N^{-nk}\, y(k), \quad \text{for } n = 0, 1, \ldots, N-1$$

The Fast Fourier Transform (FFT) and its inverse (IFFT) provide an efficient method for computing the DFT and IDFT.
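A direct evaluation of the DFT and IDFT sums shows where the cost comes from. This is a naive sketch using only the standard library; the function names are illustrative, and the inverse includes the 1/N scale factor that is needed for the round trip to return the original samples.

```python
import cmath

def dft(x):
    """y(k) = sum_n W^(n*k) * x(n), with W = exp(-2j*pi/N)."""
    n_pts = len(x)
    w = cmath.exp(-2j * cmath.pi / n_pts)
    return [sum(w ** (n * k) * x[n] for n in range(n_pts))
            for k in range(n_pts)]

def idft(y):
    """x(n) = (1/N) * sum_k W^(-n*k) * y(k)."""
    n_pts = len(y)
    w = cmath.exp(-2j * cmath.pi / n_pts)
    return [sum(w ** (-n * k) * y[k] for k in range(n_pts)) / n_pts
            for n in range(n_pts)]

samples = [1.0, 2.0, 3.0, 4.0]
spectrum = dft(samples)
restored = idft(spectrum)
```

Both functions perform N complex multiply-accumulates for each of the N outputs, so the cost grows as N². The FFT computes the same result in on the order of N log N operations.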


Typical DSP kernels: DCT

The Discrete Cosine Transform (DCT) and its inverse IDCT are frequently used in video (de-)compression (e.g., MPEG-2):

$$y(k) = e(k) \sum_{n=0}^{N-1} \cos\!\left[\frac{(2n+1)k\pi}{2N}\right] x(n), \quad \text{for } k = 0, 1, \ldots, N-1$$

$$x(n) = \frac{2}{N} \sum_{k=0}^{N-1} e(k) \cos\!\left[\frac{(2n+1)k\pi}{2N}\right] y(k), \quad \text{for } n = 0, 1, \ldots, N-1$$

where e(k) = 1/sqrt(2) if k = 0; otherwise e(k) = 1.

An N-point 1D-DCT requires N² MAC operations.
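The DCT pair can be checked numerically with a direct implementation. The sketch below assumes the forward transform y(k) = e(k) Σ cos[(2n+1)kπ/2N] x(n) and the inverse with scale factor 2/N; the function names are illustrative.

```python
import math

def e(k):
    """Scale factor: 1/sqrt(2) for k = 0, otherwise 1."""
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

def dct(x):
    n_pts = len(x)
    return [e(k) * sum(math.cos((2 * n + 1) * k * math.pi / (2 * n_pts)) * x[n]
                       for n in range(n_pts))
            for k in range(n_pts)]

def idct(y):
    n_pts = len(y)
    return [(2.0 / n_pts) * sum(e(k) * math.cos((2 * n + 1) * k * math.pi / (2 * n_pts)) * y[k]
                                for k in range(n_pts))
            for n in range(n_pts)]

block = [8.0, 6.0, 4.0, 2.0]
coeffs = dct(block)
restored = idct(coeffs)
```

The nested loops make the N² multiply-accumulate count explicit: N outputs, each a sum over N inputs.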


Typical DSP kernels: distance calculation

• Distance calculations are typically used in pattern recognition, motion estimation, and coding.

• Problem: choose the vector $r_k$ whose distance (see below) from the input vector x is minimum.

Mean Absolute Difference (MAD):

$$d = \frac{1}{N} \sum_{i=0}^{N-1} \left| x(i) - r_k(i) \right|$$

Mean Square Error (MSE):

$$d = \frac{1}{N} \sum_{i=0}^{N-1} \left[ x(i) - r_k(i) \right]^2$$
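The selection problem above, picking the reference vector closest to the input, can be sketched as follows. The helper names `mad`, `mse` and `nearest` and the example vectors are illustrative assumptions.

```python
def mad(x, r):
    """Mean Absolute Difference between vectors x and r."""
    return sum(abs(xi - ri) for xi, ri in zip(x, r)) / len(x)

def mse(x, r):
    """Mean Square Error between vectors x and r."""
    return sum((xi - ri) ** 2 for xi, ri in zip(x, r)) / len(x)

def nearest(x, refs, distance):
    """Index k of the reference vector r_k with minimum distance to x."""
    return min(range(len(refs)), key=lambda k: distance(x, refs[k]))

x = [1.0, 2.0, 3.0]
refs = [[0.0, 0.0, 0.0], [1.0, 2.0, 4.0], [5.0, 5.0, 5.0]]
print(nearest(x, refs, mad), nearest(x, refs, mse))   # 1 1
```

MAD avoids multiplications, which makes it cheaper in hardware than MSE; this is one reason it is popular in motion estimation.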


Typical DSP kernels: matrix computations

Matrix computations are typically used to estimate parameters in DSP systems.

•  Matrix-vector multiplication

•  Matrix-matrix multiplication

•  Matrix inversion

•  Matrix triangularization

Matrices may be dense/sparse/band-structured/….


Computation Rates

• To estimate the hardware resources required, we can use the equation:

$$R_C = R_S \cdot N_S$$

• where

  • $R_C$ is the computation rate

  • $R_S$ is the sampling rate

  • $N_S$ is the (average) number of operations per sample

• For example, a 1-D FIR has $N_S = 2N$ and a 2-D FIR has $N_S = 2N^2$.


Computational Rates for FIR Filtering

Signal type    Frequency    # taps       Performance
Speech         8 kHz        N = 128      20 MOPs
Music          48 kHz       N = 256      240 MOPs
Video phone    6.75 MHz     N*N = 81     1,090 MOPs
TV             27 MHz       N*N = 81     4,370 MOPs
HDTV           144 MHz      N*N = 81     23,300 MOPs
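The rule R_C = R_S · N_S can be applied to each row of the table. The sketch below assumes N_S = 2N for the 1-D filters (speech, music) and N_S = 2N² for the 2-D filters (video rows, with N² = 81); the function name is illustrative.

```python
def computation_rate(sample_rate_hz, ops_per_sample):
    """R_C = R_S * N_S, in operations per second."""
    return sample_rate_hz * ops_per_sample

rows = [
    ("Speech",      8e3,    2 * 128),   # 1-D FIR: N_S = 2N
    ("Music",       48e3,   2 * 256),
    ("Video phone", 6.75e6, 2 * 81),    # 2-D FIR: N_S = 2N^2
    ("TV",          27e6,   2 * 81),
    ("HDTV",        144e6,  2 * 81),
]
for name, rate, ops in rows:
    print(f"{name:12s} {computation_rate(rate, ops) / 1e6:10.1f} MOPs")
```

Under these assumptions the three video rows reproduce the listed values (about 1,090, 4,370 and 23,300 MOPs). The speech and music rows come out near 2 and 25 MOPs, roughly a factor of ten below the listed 20 and 240 MOPs, so those two table entries presumably rest on an assumption that the slide does not state.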


DSP systems and programs

• infinite input stream (samples): x(0), x(1), x(2), …

• infinite output stream (samples): y(0), y(1), y(2), …

• (there may be multiple input and/or output streams)

• non-terminating program, e.g.:

    for n=1 to ∞
        y(n) = a*x(n) + b*x(n-1) + c*x(n-2)
    end

[Figure: x(n) → DSP System → y(n)]
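A non-terminating DSP program maps naturally onto a Python generator, which consumes one input sample and produces one output sample per iteration. This is a sketch under stated assumptions: the names `fir3` and `ramp` are illustrative, and the two delayed samples are assumed to start at zero.

```python
from itertools import islice

def fir3(samples, a, b, c):
    """Yield y(n) = a*x(n) + b*x(n-1) + c*x(n-2) for an endless input stream."""
    x1 = x2 = 0.0                   # x(n-1) and x(n-2): two delay registers
    for x0 in samples:
        yield a * x0 + b * x1 + c * x2
        x1, x2 = x0, x1             # shift the delay line by one sample

def ramp():
    """Hypothetical infinite input stream 0, 1, 2, ..."""
    n = 0
    while True:
        yield float(n)
        n += 1

first_five = list(islice(fir3(ramp(), 1.0, 2.0, 3.0), 5))
print(first_five)                   # [0.0, 1.0, 4.0, 10.0, 16.0]
```

The generator never terminates on its own; `islice` only takes the first five outputs for inspection. The two variables `x1` and `x2` play the role of the delay elements in the graphical representations.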


DSP SYSTEMS: GRAPHICAL REPRESENTATIONS


DSP systems: 3 graphical representations

• Block diagram:

  • general

  • loose semantics

• Data-flow graph:

  • used for signal processing

  • formal definition

  • powerful tools, lots of theory

• Signal-flow graph:

  • linear time-invariant (LTI) systems

  • formal definition, still more theory


DSP system: block diagram

•  Consider FIR: y(n) = a*x(n) + b*x(n-1) + c*x(n-2)

•  delay element = memory element = register

•  multiply with constant a

•  adder: output value = sum of input values

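[Figure: block diagram of the FIR filter. The input x(n) passes through two delay elements D, giving x(n-1) and x(n-2); the three signals are multiplied by a, b and c and summed by two adders to produce y(n). Legend symbols: D (delay), × a (multiplier), + (adder).]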


DSP system: data-flow graph (DFG)

•  Consider FIR: y(n) = a*x(n) + b*x(n-1) + c*x(n-2)

•  D is (non-negative) number of delays

•  multiplier: output value = (constant a) × input value

•  adder: output value = sum of input values

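[Figure: data-flow graph of the FIR filter with multiplier nodes a, b, c, input x(n), output y(n), and two edges carrying one delay D each.]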


Data-flow graph (DFG)

•  Consider FIR: y(n) = a*x(n) + b*x(n-1) + c*x(n-2)

Each edge describes a precedence constraint between two nodes:

•  D=0: Intra-iteration precedence constraint

•  D>0: Inter-iteration precedence constraint

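[Figure: the same data-flow graph, with multiplier nodes a, b, c, input x(n), output y(n), and two edges with delay D.]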


Data-flow graphs

Tokens can represent numbers, vectors (blocks), matrices …

Nodes may be complex (coarse-grained) functions, e.g.:

Single-rate data flow: Each node:

• consumes one token from each input edge;

• performs its function (in T time units);

• produces one token onto each output edge.


Data-flow graphs

Multi-rate data flow: Each node:

• consumes a fixed number of tokens from each input edge;

• performs its function (in T time units);

• produces a fixed number of tokens onto each output edge.


Signal-flow graph (representation method 3)

• A join-node denotes an adder

• Label a next to an edge denotes multiplication by constant a

• $z^{-k}$ denotes a delay of k units

• Signal-flow graphs are used to represent Linear Time-Invariant (LTI) systems.

• A signal-flow graph corresponds to a description in the Z-transform domain (the discrete-time counterpart of the Laplace transform), a powerful theory for LTI systems (outside the scope of 2IMN35).


Linear Systems

input x, output y:

discrete system:

•  x(n) results in y(n)

linear system:

•  x1(n) + x2(n) results in y1(n) + y2(n)

•  c1 x1(n) + c2 x2(n) results in c1 y1(n) + c2 y2(n), for arbitrary c1 and c2

Most of our examples will be linear systems


Linear Time-Invariant Systems

input x, output y:

•  x(n+k) = x(n) shifted by integer k sample periods

time-invariant system:

•  x’(n) = x(n+k) results in y’(n) = y(n+k)

Most of our examples will be linear time-invariant systems, or LTI systems


Commutativity of LTI systems

x(n) → LTI System A → f(n) → LTI System B → y(n)

is equivalent to

x(n) → LTI System B → g(n) → LTI System A → y(n)
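Commutativity can be checked numerically by modelling both LTI systems as FIR filters, so that each system is a convolution with its impulse response. This is an illustrative sketch: the impulse responses and the helper `convolve` are assumptions, not taken from the slides.

```python
def convolve(h, x):
    """Full discrete convolution of impulse response h with finite input x."""
    y = [0.0] * (len(h) + len(x) - 1)
    for i, hi in enumerate(h):
        for j, xj in enumerate(x):
            y[i + j] += hi * xj
    return y

system_a = [1.0, 0.5]               # hypothetical impulse response of system A
system_b = [1.0, -1.0, 0.25]        # hypothetical impulse response of system B
x = [1.0, 2.0, 3.0]

a_then_b = convolve(system_b, convolve(system_a, x))
b_then_a = convolve(system_a, convolve(system_b, x))
```

Both orders give the same output, although the intermediate signals f(n) and g(n) differ. This freedom to reorder is what several of the transformations in the following lectures rely on.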


LOOP BOUNDS AND ITERATION BOUNDS


Iteration of a Synchronous Flow Graph

• Each actor fires the minimum number of times to return the graph to a particular state

• Example of a multi-rate DFG:

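[Figure: chain → A → B → C → with token rates per firing: A consumes 1 and produces 2; B consumes 2 and produces 3; C consumes 2 and produces 1.]

# firings for 1 iteration:

A: 2    B: 2    C: 3

# tokens per edge for 1 iteration:

→ A: 2    A → B: 4    B → C: 6    C →: 3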
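The firing counts follow from the balance equations: on every edge, the tokens produced during one iteration must equal the tokens consumed. The sketch below solves them for a connected graph; the function name, the edge encoding, and the per-firing rates (A produces 2, B consumes 2 and produces 3, C consumes 2) are assumptions chosen to be consistent with the token counts listed above. It needs Python 3.9 or later for `math.lcm`.

```python
from fractions import Fraction
from math import lcm

def repetition_vector(actors, edges):
    """Smallest positive firing counts satisfying all balance equations.

    edges: (producer, tokens_produced, consumer, tokens_consumed).
    Assumes the graph is connected and the rates are consistent.
    """
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:                  # propagate relative rates along edges
        changed = False
        for src, produced, dst, consumed in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * produced / consumed
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * consumed / produced
                changed = True
    scale = lcm(*(r.denominator for r in rate.values()))
    return {actor: int(rate[actor] * scale) for actor in actors}

edges = [("A", 2, "B", 2), ("B", 3, "C", 2)]
firings = repetition_vector(["A", "B", "C"], edges)
print(firings)                      # {'A': 2, 'B': 2, 'C': 3}
```

Multiplying each firing count by the per-firing rate gives the tokens per edge: 2 × 2 = 4 on A → B and 2 × 3 = 6 on B → C.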


Iteration period

Iteration period = the time required for the execution of one iteration of the SFG

Example, let

• Tm = 10 = multiplication time

• Ta = 4 = addition time

Iteration period = Tm+Ta = 14 [e.g. nsec]

= minimum sample period Ts; that is: Ts ≥ Tm+Ta

Iteration rate = 1 / (iteration period) [e.g. GHz]

[Figure: feedback loop with one adder (+), one multiplier (× a) and one delay (D); signals x(n), y(n-1)]


Loop and loop bound

• A loop (cycle) in a DFG is a directed path that begins and ends at the same node.

• The loop bound of loop j is defined as Tj/Wj, where

  • Tj is the loop computation time (sum of all Ti of loop nodes i),

  • Wj is the number of delays (D-elements) in the loop.

• Example (IIR filter):

  • Tloop = Tm+Ta = 14 ns

  • Wloop = 2

  • Loop bound = Tloop/Wloop = 14/2 = 7 ns

[Figure: feedback loop with one adder (+), one multiplier (× a) and two delays (2D); signals x(n), y(n-2)]


Critical loop and Iteration bound

• The critical loop of a DFG is the loop with the maximum loop bound.

• The iteration bound T∞ of a DFG is the loop bound of the critical loop:

$$T_\infty = \max_{j \in L} \left\{ \frac{T_j}{W_j} \right\}$$

  • L is the set of loops of the DFG

  • Tj is the computation time of loop j

  • Wj is the weight of loop j, i.e. the number of delays D.


Iteration bound cntd

Example:

• TL1 = (10+2)/1 = 12

• TL2 = (2+3+5)/2 = 5

• TL3 = (10+2+3)/2 = 7.5

• Iteration bound = max (12, 5, 7.5) = 12

Notes:

• Delays are non-negative (negative delay would imply non-causality).

• If loop weight equals 0 (no delay elements in loop) then TL/0 = ∞ (deadlock).
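For small graphs the iteration bound can be computed by enumerating all loops and taking the maximum loop bound. The sketch below does this with a depth-first search; the graph encoding and function names are assumptions, and the example graph is the two-node feedback loop (adder 4 ns, multiplier 10 ns) used earlier, not the three-loop graph of this slide, whose figure is not reproduced here.

```python
def loops(edges):
    """Enumerate simple loops as (node list, total delays).

    edges: (source, destination, number_of_delays).
    Each loop is reported once, rooted at its smallest node name.
    """
    succ = {}
    for src, dst, delays in edges:
        succ.setdefault(src, []).append((dst, delays))
        succ.setdefault(dst, [])
    found = []

    def walk(start, node, path, delays):
        for nxt, d in succ[node]:
            if nxt == start:
                found.append((path, delays + d))
            elif nxt > start and nxt not in path:
                walk(start, nxt, path + [nxt], delays + d)

    for start in sorted(succ):
        walk(start, start, [start], 0)
    return found

def iteration_bound(node_time, edges):
    """Maximum over all loops of (loop computation time) / (loop delays)."""
    bounds = []
    for path, delays in loops(edges):
        t_loop = sum(node_time[n] for n in path)
        bounds.append(float("inf") if delays == 0 else t_loop / delays)
    return max(bounds, default=0.0)

node_time = {"add": 4, "mul": 10}   # Ta = 4, Tm = 10
two_delays = iteration_bound(node_time, [("add", "mul", 2), ("mul", "add", 0)])
one_delay = iteration_bound(node_time, [("add", "mul", 1), ("mul", "add", 0)])
print(two_delays, one_delay)        # 7.0 14.0
```

A loop without delays yields an infinite bound, matching the deadlock note above. Exhaustive loop enumeration is only practical for small graphs, since the number of loops can grow exponentially with graph size.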


4 types of delay paths; critical path

• Redraw block diagram by partitioning nodes in D-elements and combinational functions (“FSM view”):

• Paths do not contain delay-elements

• The critical path is the path with the longest computation time and is a lower bound for the clock period.

[Figure: combinational functions between inputs and outputs, with the delay elements (= state) in feedback; the four path types are numbered 1 to 4.]

path   from     to
1      inputs   state
2      state    outputs
3      inputs   outputs
4      state    state


Critical path cntd

Example (FIR filter):

• Tm = 10 ns

• Ta = 4 ns

• No loops!

1.  1 path from input to state: 0 ns

2.  4 paths from state to outputs: 26, 22, 18, 14 ns

3.  1 path from input to output: 26 ns

4.  3 paths from state to state: 0, 0, 0 ns

The critical path is 26 ns (it can be reduced by pipelining and parallel processing).
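The path delays in this example are consistent with a 5-tap direct-form FIR filter (four delay elements) whose products are summed by a linear chain of four adders. That structure is an assumption inferred from the numbers, since the figure is not reproduced here; the function name is illustrative.

```python
T_M, T_A = 10, 4                    # multiplier and adder delay in ns

def tap_to_output_delay(k, n_taps):
    """Delay from x(n-k) to y(n) for a linear adder chain.

    Taps 0 and 1 enter the first adder and pass through all n_taps - 1
    adders; tap k >= 1 passes through n_taps - k adders.
    """
    adders = n_taps - max(k, 1)
    return T_M + adders * T_A

delays = [tap_to_output_delay(k, 5) for k in range(5)]
critical_path = max(delays)
print(delays, critical_path)        # [26, 26, 22, 18, 14] 26
```

Tap 0 is the input-to-output path (26 ns) and taps 1 to 4 are the four state-to-output paths (26, 22, 18, 14 ns). Replacing the adder chain by a balanced tree, or inserting pipeline registers, shortens the longest of these paths.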


DSP references

•  Keshab K. Parhi. VLSI Digital Signal Processing Systems, Design and Implementation. Wiley Inter-Science 1999.

•  Richard G. Lyons. Understanding Digital Signal Processing (2nd edition). Prentice Hall 2004.

•  John G. Proakis and Dimitris K Manolakis. Digital Signal Processing (4th edition), Prentice Hall, 2006.

•  Simon Haykin. Neural Networks, a Comprehensive Foundation (2nd edition). Prentice Hall 1999.


Computer Architecture and DSP references

•  Hennessy and Patterson, Computer Architecture, a Quantitative Approach. 3rd edition. Morgan Kaufmann, 2002.

•  Phil Lapsley, Jeff Bier, Amit Sholam, Edward Lee. DSP Processor Fundamentals, Berkeley Design Technology, Inc, 1994-199

•  Jennifer Eyre, Jeff Bier, The Evolution of DSP Processors, IEEE Signal Processing Magazine, 2000.

•  Kees van Berkel et al. Vector Processing as an Enabler for Software-Defined Radio in Handheld Devices, EURASIP Journal on Applied Signal Processing 2005:16, 2613-2625.


VLSI Programming:

Preparations for Lab work, before Tuesday April 26:

• team up (2 students/team), and

• install FPGA tools.


VLSI Programming: Thursday April 21

Transformations:

•  Transposition

•  Pipelining

•  Retiming

•  K-slow transformation

•  Parallel processing

(Parhi, Chapters 2, 3)


THANK YOU