eng2410 the von neumann computer digital design...



School of Engineering 1

ENG2410 Digital Design

“Programmable Logic Technologies”

Fall 2017, S. Areibi

School of Engineering, University of Guelph

Week #12 Topics

• The Von Neumann Architecture
• What is Programmable Logic?
• Classification of Programmable Logic
• Field Programmable Gate Arrays
• FPGA CAD
• Applications
• Summary

Resources

• Chapter #10, Mano. Sections:
  • 10.3 Programmable Implementations Tech

The Von Neumann Computer

• Principle

In 1945, the mathematician Von Neumann (VN) demonstrated, in a study of computation, that a computer could have a simple structure capable of executing any kind of program, given a properly programmed control unit, without the need for hardware modification.

[Figure: ENIAC, the first general-purpose electronic computer (1946)]

The Von Neumann Computer

• Structure
  • An arithmetic and logic unit (ALU), also called the data path, for program execution
  • A control unit (control path) featuring a program counter for controlling program execution
  • A memory for storing program and data
  • The memory consists of words of the same length

[Figure: block diagram of the processor (central processing unit) and memory. Labels: Datapath, Registers, Control Unit, Instruction register, PC, Address register, Memory, Data and Instructions, Data, Address]

The Von Neumann Computer

• Coding
  A program is coded as a set of instructions to be executed sequentially.

• Program execution
  • Instruction Fetch (IF): the next instruction to be executed is fetched from the memory
  • Decode (D): the instruction is decoded to determine the operation
  • Read operand (R): operands are read from the memory
  • Execute (EX): the operation is executed on the ALU
  • Write result (W): results are written back to the memory
  • Each instruction is executed in a cycle (IF, D, R, EX, W)

What is the problem with this computing paradigm?
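The five-step cycle can be sketched as a toy interpreter. This is only an illustration under stated assumptions: the tuple instruction format, the `ADD`/`MUL`/`HALT` opcodes, and the memory layout are hypothetical and do not come from the slides. Program and data share one memory, and the sketch counts how often that memory is touched.

```python
import operator

# Hypothetical two-operation ALU (assumption, not from the slides).
ALU = {"ADD": operator.add, "MUL": operator.mul}

def run(memory):
    """Run the program stored at address 0; return (final memory, memory accesses)."""
    memory = list(memory)          # work on a copy
    pc = 0                         # program counter
    accesses = 0
    while True:
        op, src1, src2, dst = memory[pc]      # IF: fetch the instruction word
        accesses += 1
        pc += 1
        if op == "HALT":                      # D: decode the operation
            return memory, accesses
        a, b = memory[src1], memory[src2]     # R: read both operands from memory
        accesses += 2
        result = ALU[op](a, b)                # EX: execute on the ALU
        memory[dst] = result                  # W: write the result back
        accesses += 1

# One memory holds instructions (addresses 0-2) and data (addresses 4-7).
IMAGE = [
    ("MUL", 4, 5, 7),    # mem[7] = mem[4] * mem[5]
    ("ADD", 7, 6, 7),    # mem[7] = mem[7] + mem[6]
    ("HALT", 0, 0, 0),
    0,                   # unused
    3, 2, 1,             # data words at addresses 4, 5, 6
    0,                   # result at address 7
]

final_memory, memory_accesses = run(IMAGE)
print(final_memory[7], memory_accesses)
```

Every instruction here goes to the shared memory four times (one fetch, two reads, one write), one step after another, which hints at an answer to the question above.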


Bottlenecks in VN Architecture


The Von Neumann Computer

• Advantages:
  • Simplicity.
  • Flexibility: any well-coded program can be executed.
• Drawbacks:
  • Speed efficiency: not efficient, because programs execute sequentially (temporal resource sharing).
  • Resource efficiency: only part of the hardware is needed to execute any one instruction. The rest remains idle.
  • Memory access: memories are about 5 times slower than the processor.
• How to compensate for these deficiencies?

Improving Performance of VN (GPPs)

1. Technology Scaling
   • Improve performance (increase clock frequency!)
2. Improving Instruction Set of Processor
3. Application Specific Processors (DSP)
4. Use of Hierarchical Memory System
   • Cache can enhance speed
5. Multiplicity of Functional Units (H/W)
   • Adders/Multipliers/Dividers (CDC-6600)
6. Pipelining within CPU (H/W)
   • A four-stage pipeline (IF/ID/OF/EX)
7. Overlap CPU & I/O Operations (H/W)
   • DMA (Direct Memory Access) can be used to enhance performance
8. Time Sharing (S/W)
   • Multi-tasking assigns fixed or variable time slices to multiple programs
9. Parallelism & Multithreading (S/W) (H/W)
   • Compilers/Multi-core systems

ENGG3380, ENGG4540

Spatial vs. Temporal Computing

• Spatial (ASIC or FPGA): Ax² + Bx + C
• Temporal (Processor, Von Neumann Architecture): (Ax + B)x + C

ENGG3050

Temporal vs. Spatial Based Computing

• Temporal-based execution (software)
• Spatial-based execution (reconfigurable computing)

The ability to extract parallelism (or concurrency) from algorithm descriptions is the key to acceleration using reconfigurable computing.
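As a rough illustration of the two styles, the sketch below evaluates the same polynomial both ways. The step and level counts are a simplified model, and the grouping of operators into concurrent levels is an assumption made for this example, since the slide's figure is not available in the transcript.

```python
def temporal(a, b, c, x):
    """(Ax + B)x + C on one shared ALU: one operation per time step."""
    steps = 0
    t = a * x; steps += 1
    t = t + b; steps += 1
    t = t * x; steps += 1
    t = t + c; steps += 1
    return t, steps

def spatial(a, b, c, x):
    """Ax^2 + Bx + C with dedicated operators: operations in one level run concurrently."""
    # Level 1: two multipliers work side by side.
    xx, bx = x * x, b * x
    # Level 2: a multiplier and an adder work side by side.
    axx, bxc = a * xx, bx + c
    # Level 3: the final adder.
    levels = 3
    return axx + bxc, levels

print(temporal(2, 3, 4, 5))
print(spatial(2, 3, 4, 5))
```

The spatial version spends more operators (five instead of four) to finish in fewer levels, which is the trade the paragraph above describes: parallelism extracted from the algorithm is paid for in hardware area.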

Gerald Estrin Fix-Plus Machine

Substantial efforts on Reconfiguration

• Attempts to have a flexible hardware structure that can be dynamically modified at run-time to compute a desired function are almost as old as the development of other computing paradigms.
• In 1959, Gerald Estrin, at UCLA, introduced the concept of reconfigurable computing with the Fix-Plus Machine.

[Figure: Estrin at work.]

Programmable Logic I

[Figure labels: Programmable AND array, Programmable OR array]

• We learnt in the first part of this course that any combinational logic circuit can be implemented as a sum of minterms (a sum-of-products, SOP, form).
• If we can control the number of AND gates used, and also control the inputs to the OR gate, then we can design a programmable logic circuit.
• Remember when we used a decoder to implement any Boolean function? That was a form of programmable logic!

I. Programmable AND Array

o If we remove fuses Faf and Fbt, this disconnects the complementary version of input ‘a’ and the true version of input ‘b’.
o This leaves the device to perform its new function → y = a AND b’
o The process of removing fuses is typically referred to as programming the device (blowing or burning the device).
o Devices based on fusible-link technology are said to be One Time Programmable (OTP).
o Remember: FPGAs are not based on this type of technology.
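The fuse example can be modelled in a few lines. The sketch assumes two things the slide does not state: the remaining fuses are named `Fat` and `Fbf` by analogy with `Faf` and `Fbt`, and a disconnected AND input reads as logic 1.

```python
def and_array(a, b, fuses):
    """One programmable AND gate. fuses maps fuse name -> True (intact) or False (blown)."""
    lines = {
        "Fat": bool(a),       # true version of a (name is an assumption)
        "Faf": not a,         # complemented version of a
        "Fbt": bool(b),       # true version of b
        "Fbf": not b,         # complemented version of b (name is an assumption)
    }
    # Only lines whose fuse is intact reach the gate; blown fuses are assumed to read as 1.
    return int(all(value for name, value in lines.items() if fuses[name]))

UNPROGRAMMED = {"Fat": True, "Faf": True, "Fbt": True, "Fbf": True}
PROGRAMMED = {"Fat": True, "Faf": False, "Fbt": False, "Fbf": True}   # Faf and Fbt removed

for a in (0, 1):
    for b in (0, 1):
        print(a, b, and_array(a, b, UNPROGRAMMED), and_array(a, b, PROGRAMMED))
```

With every fuse intact the gate sees both a and its complement, so its output is always 0; removing Faf and Fbt leaves y = a AND b’.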

Decoders: Implementing Logic

• Example: implement the following Boolean function
  S(A2, A1, A0) = Σm(1, 2, 4, 7)

1. Since there are three inputs, we need a 3-to-8 line decoder.
2. The decoder generates the eight minterms for inputs A0, A1, A2.
3. An OR gate forms the logical sum of the required minterms.
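The three steps above can be sketched directly; the function names are chosen here for illustration only.

```python
def decoder_3to8(a2, a1, a0):
    """3-to-8 line decoder: exactly one output (the selected minterm) is 1."""
    index = (a2 << 2) | (a1 << 1) | a0
    return [1 if i == index else 0 for i in range(8)]

def s(a2, a1, a0):
    """S = sum of minterms 1, 2, 4, 7: OR the corresponding decoder outputs."""
    m = decoder_3to8(a2, a1, a0)
    return m[1] | m[2] | m[4] | m[7]

for value in range(8):
    a2, a1, a0 = (value >> 2) & 1, (value >> 1) & 1, value & 1
    print(a2, a1, a0, s(a2, a1, a0))
```

Minterms 1, 2, 4 and 7 are exactly the input combinations with an odd number of 1s, so S is the three-input XOR (the sum bit of a full adder). Choosing a different set of decoder outputs for the OR gate "programs" a different function.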


II. Programmable OR Array

Programmable Boolean Functions

Multiplexers can also be used to realize Boolean functions since they consist of an array of AND gates followed by an OR gate.
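A minimal sketch of this idea follows: the multiplexer is written as an AND stage followed by an OR, and the function is chosen by tying the data inputs to constants. The example function (minterms 1, 2, 4, 7) is reused from the decoder slide as an assumption; the slide itself names no function.

```python
def mux(data, select):
    """2^n-to-1 multiplexer built as an array of AND gates followed by an OR gate.

    data   -- list of 2^n data-input bits
    select -- tuple of n select bits, most significant first
    """
    n = len(select)
    out = 0
    for i, d in enumerate(data):
        # Minterm i of the select lines: 1 only when they spell i in binary.
        minterm = all(((i >> (n - 1 - k)) & 1) == bit for k, bit in enumerate(select))
        out |= d & int(minterm)        # AND stage feeding the OR stage
    return out

# Data inputs hold the truth-table column for minterms 1, 2, 4, 7.
DATA = [0, 1, 1, 0, 1, 0, 0, 1]

for value in range(8):
    sel = ((value >> 2) & 1, (value >> 1) & 1, value & 1)
    print(sel, mux(DATA, sel))
```

Changing the constants in `DATA` changes the realized function without touching the gates, which is the sense in which the multiplexer is programmable.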

Classification of PLDs

[Figure: two devices, each drawn as a programmable AND array and a programmable OR array]

Classification

The first programmable ICs were generically referred to as Programmable Logic Devices (PLDs).

• Programmable Logic Devices
  • Simple PLDs
  • Complex PLDs

Programmable Logic Array (PLA)

[Figure annotations: the output stage acts like a programmable inverter. Tied to 0: F1 is not inverted. Tied to 1: F1 is inverted. Callouts 1 to 4 mark parts of the figure.]

Complex PLDs (CPLDs)

The integration of several Simple PLD blocks with a programmable interconnect on a single chip → CPLD

[Figure: PLD blocks and I/O blocks connected through an interconnection matrix]

III. SRAM FPGAs:

Memory units can be used to implement a Boolean function by storing the output column of the truth table in the memory and using the variables of the truth table as address lines.

A B C D | Z
0 0 0 0 | 0
0 0 0 1 | 1
0 0 1 0 | 1
0 0 1 1 | 1
0 1 0 0 | 0
0 1 0 1 | 1
0 1 1 0 | 1
0 1 1 1 | 1
1 0 0 0 | 0
1 0 0 1 | 1
1 0 1 0 | 1
1 0 1 1 | 1
1 1 0 0 | 0
1 1 0 1 | 0
1 1 1 0 | 0
1 1 1 1 | …

[Figure: LUT implementation (inputs A, B, C, D; output Z) beside the equivalent gate implementation (inputs A, B and C, D; output Z)]

Generic FPGA architecture:

Configurable Logic Block (CLB) → LUT + FF

[Figure labels: Connection Block, Switch Block, Routing Channels, I/O pad, Wire segments]

SRAM based Programmable Cell

o There are two main versions of semiconductor RAM devices:
  o Dynamic RAM (DRAM) and
  o Static RAM (SRAM).
o SRAM cells can be used to switch NMOS transistors on or off.
o This is very useful for controlling multiplexers, routing, etc.

Pass Transistor

o An SRAM cell can drive the gate (G) terminal of an NMOS transistor.
o If SRAM (M) = 1, then the signal passes from S → D.
o An SRAM cell can be attached to the select line of a MUX to control it.
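A behavioural sketch of these two uses of a configuration bit is given below. It is a logic-level model only: modelling the undriven drain as `None` is an assumption of this example, and electrical effects are ignored.

```python
from typing import Optional

def pass_transistor(m: int, s: Optional[int]) -> Optional[int]:
    """NMOS pass transistor whose gate is driven by SRAM bit m.

    Returns the value seen at the drain: the source value when m = 1,
    or None (not driven) when m = 0.
    """
    return s if m == 1 else None

def mux2(m: int, in0: int, in1: int) -> int:
    """2-to-1 MUX whose select line is tied to SRAM bit m."""
    return in1 if m == 1 else in0

print(pass_transistor(1, 1), pass_transistor(0, 1))
print(mux2(0, 0, 1), mux2(1, 0, 1))
```

In both cases the stored bit stays fixed after configuration, so it selects a path rather than carrying data.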

Look Up Table (LUT)

o A LUT can realize any Boolean function of its inputs.
o Assume the function to be realized is y = (a&b) | !c
o This is achieved by loading the LUT with the appropriate output values.
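A small sketch of loading and reading a 3-input LUT for this function follows. The helper names and the choice of `a` as the most significant address bit are assumptions made for illustration.

```python
def program_lut(func, n_inputs):
    """Compute the 2**n output bits to load into the LUT (one bit per input combination)."""
    contents = []
    for address in range(2 ** n_inputs):
        # Unpack the address into input bits, most significant first.
        bits = [(address >> (n_inputs - 1 - k)) & 1 for k in range(n_inputs)]
        contents.append(func(*bits) & 1)
    return contents

def lut_read(contents, *inputs):
    """Use the inputs as address lines and return the stored bit."""
    address = 0
    for bit in inputs:
        address = (address << 1) | bit
    return contents[address]

# y = (a & b) | !c, with !c written as 1 - c for 0/1 integers.
lut = program_lut(lambda a, b, c: (a & b) | (1 - c), 3)
print(lut)                    # stored bits for abc = 000 ... 111
print(lut_read(lut, 1, 1, 1))
```

After loading, evaluating the function is a single memory read; realizing a different 3-input function only requires loading eight different bits.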

Configurable Logic Block (CLB)

A configurable logic block consists of a lookup table (LUT), a register that can act as a flip-flop or a latch, and a multiplexer, along with a few other elements.
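The following is a simplified model of such a block, assuming one LUT, one flip-flop, and an output multiplexer that picks the registered or the combinational value. Real CLBs contain more than this; the class and its configuration bit `use_register` are hypothetical.

```python
class CLB:
    """Toy configurable logic block: LUT -> flip-flop -> output MUX."""

    def __init__(self, lut_contents, use_register):
        self.lut = list(lut_contents)     # configuration: truth-table bits
        self.use_register = use_register  # configuration: output MUX select
        self.q = 0                        # flip-flop state

    def _lut_out(self, *inputs):
        address = 0
        for bit in inputs:                # inputs act as address lines
            address = (address << 1) | bit
        return self.lut[address]

    def clock(self, *inputs):
        """Clock edge: the flip-flop captures the current LUT output."""
        self.q = self._lut_out(*inputs)

    def output(self, *inputs):
        """Output MUX: registered value or combinational LUT value."""
        return self.q if self.use_register else self._lut_out(*inputs)

registered_and = CLB([0, 0, 0, 1], use_register=True)   # 2-input AND, registered
print(registered_and.output(1, 1))   # flip-flop has not captured anything yet
registered_and.clock(1, 1)
print(registered_and.output(0, 0))   # holds the captured value
```

The same block with `use_register=False` behaves as plain combinational logic, which is how one CLB can serve either role depending on its configuration.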

Xilinx CLB

Switch Matrix

o Connections between CLBs and IOBs are made using wiring segments in both horizontal and vertical channels lying between the various blocks.
o Where four segments meet, there are six pass transistors.
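One way to read the number six: with four segments meeting, there are six distinct pairs of segments, so one pass transistor per pair lets any segment be joined to any other. The sketch below works under that assumption; the side names N, E, S, W are illustrative and not from the slide.

```python
from itertools import combinations

SIDES = ("N", "E", "S", "W")
PAIRS = list(combinations(SIDES, 2))     # one pass transistor per pair of segments

def connected_sides(enabled, start):
    """Return every side electrically joined to `start` through enabled pass transistors."""
    reached = {start}
    changed = True
    while changed:
        changed = False
        for a, b in enabled:
            if (a in reached) != (b in reached):   # transistor bridges into a new side
                reached |= {a, b}
                changed = True
    return reached

print(len(PAIRS))
print(sorted(connected_sides({("N", "S")}, "N")))              # straight through
print(sorted(connected_sides({("N", "E"), ("E", "W")}, "N")))  # fan-out to two sides
```

Each enabled pair corresponds to one SRAM bit set to 1 at the gate of its pass transistor.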


Xilinx IOB


CAD for FPGAs

• Design Entry
• Logic Optimization
• Synthesis
• Mapping to k-LUT
• Packing LUTs to CLBs
• Placement
• Routing
• Configure an FPGA
• Simulation

CAD for FPGAs: Place & Route

[Figure: the same flow diagram]

Programming an FPGA?

[Figure: a circuit with inputs A, B, C, D, E, F and outputs f1, f2, f3 taken through three steps: Technology Mapping, Placement, Routing]

FPGA Placement Problem

• Input – A technology-mapped netlist of Configurable Logic Blocks (CLBs) realizing a given circuit.
• Output – The CLB netlist placed in a two-dimensional array of slots such that the total wirelength is minimized.

[Figure: a CLB netlist with inputs i1 to i4, blocks 1 to 10 and outputs f1, f2, shown before and after placement on the FPGA]
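The placement objective can be illustrated on a toy instance. The sketch assumes half-perimeter wirelength (HPWL) as the wirelength estimate and uses exhaustive search, neither of which is stated on the slide; the four-block netlist is hypothetical, and practical placers use heuristics because exhaustive search does not scale.

```python
from itertools import permutations

def hpwl(nets, placement):
    """Total half-perimeter wirelength: bounding-box width + height, summed over nets."""
    total = 0
    for net in nets:
        xs = [placement[block][0] for block in net]
        ys = [placement[block][1] for block in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def best_placement(blocks, nets, slots):
    """Try every assignment of blocks to slots (toy sizes only); return (cost, placement)."""
    best = None
    for perm in permutations(slots, len(blocks)):
        placement = dict(zip(blocks, perm))
        cost = hpwl(nets, placement)
        if best is None or cost < best[0]:
            best = (cost, placement)
    return best

# Hypothetical netlist: four CLBs connected in a chain, placed on a 2 x 2 array of slots.
BLOCKS = ["c1", "c2", "c3", "c4"]
NETS = [("c1", "c2"), ("c2", "c3"), ("c3", "c4")]
SLOTS = [(x, y) for x in range(2) for y in range(2)]

cost, placement = best_placement(BLOCKS, NETS, SLOTS)
print(cost, placement)
```

A poor placement of the same netlist, such as putting c1 and c2 on opposite corners, gives a larger total, which is what the placer is trying to avoid before routing begins.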

Global vs. Detailed Routing

• Global routing
• Detailed routing

[Figure: two grids of logic blocks (LB) and switch blocks (SB), one illustrating global routing and one detailed routing]

Remember!

• Programmable Lookup Tables (LUTs)
• Programmable routing structure

The main bottleneck in state-of-the-art fine-grain FPGAs is the routing, which is enabled by pass transistors!

[Figure: a LUT with inputs x, y, z and output f, with its contents stored in SRAM]

Look-up tables are flexible, but they require a lot of configuration data and suffer from power dissipation!

Fine Grain FPGAs: Spartan2

o 4 Kbit RAM blocks
o Large amount of logic
o Program stored in SRAM

Medium Grain: Xilinx Virtex

• Virtex-II FPGA introduced, followed by Virtex-II Pro in 2003
  – 444 18x18 multipliers & 18 Kbit block RAMs introduced
  – Gbit serial I/O communications & PowerPC processors introduced
  – Complex floating-point algorithm implementation now possible
• Virtex-II / Pro
  – 44,000 logic slices
  – 444 18 Kbit BRAMs
  – 444 18x18 multipliers
  – 2 PowerPC processors
  – 20 Gbit I/O
  – 1164 max user I/O

Zynq - Extensible Processing Platform

Dynamic Partial Reconfiguration

• Partial reconfiguration is the ability to dynamically modify blocks of logic while the remaining logic continues to operate without interruption.
• Computation sequences are not known at compile time. The system decides on, or reacts dynamically to, application-driven reconfiguration requests.

[Figure: a full bit file and partial bit files (Functions A1, A2, A3, B1, B2, C1, C2) loaded through the configuration port or ICAP]

Methods for executing algorithms

Hardware (Application Specific Integrated Circuits)
Advantages:
• very high performance and efficiency
Disadvantages:
• not flexible (can’t be altered after fabrication)
• expensive

Software-programmed processors
Advantages:
• software is very flexible to change
Disadvantages:
• performance can suffer if the clock is not fast
• instruction set is fixed by the hardware

Reconfigurable computing
Advantages:
• fills the gap between hardware and software
• much higher performance than software
• higher level of flexibility than hardware

Reconfigurable Devices

Reconfigurable Devices (RD) are used in several different ways:

1. Rapid Prototyping
2. Non-frequent reconfigurable systems
3. Frequently reconfigurable systems
4. High Performance Computing (Acceleration of Complex Algorithms)

1. Rapid prototyping

• Testing hardware in real conditions before fabrication
• Software simulation
  • Relatively inexpensive
  • Slow
  • Accuracy?
• Hardware emulation
  • Hardware testing under real operating conditions
  • Fast
  • Accurate
  • Allows several iterations

[Figures: APTIX System Explorer; ITALTEL FLEXBENCH]

2. Non-Frequent Reconfiguration

3. Frequently Reconfigured

Computing systems that are able to adapt their behaviour and structure to changing operating and environmental conditions, time-varying optimization objectives, and physical constraints such as changing protocols, new standards, or dynamically changing operating conditions of technical systems.

4. Algorithm Acceleration

Real Time Video Processing
- Single-precision floating-point calculations
- 36 GFlops + 40 GOPs sustained performance on a single PCI card
- >200 times power reduction over Xeon

Gravity Simulation
- N-body computation
- Single-precision floating point
- 20 GFlops sustained performance
- 100 times faster than a 2.4 GHz Pentium 4 CPU

fMRI and Real-time Human Body Imaging

• Technique for determining which parts of the brain are activated by different types of physical sensation or activity – “brain mapping”
• High- and low-resolution scans compared using numerous FFTs
  – Typically post-processed
  – Much error correction needed due to subject movement
  – 3D data representation requires a good deal of conventional processing
• Studying how RC devices can achieve real-time processing

Figures c/o University of Oxford, UK

Image Registration

• In computer vision, sets of data acquired by sampling the same scene or object at different times, or from different perspectives, will be in different coordinate systems.
• Image registration is the process of transforming the different sets of data into one coordinate system.
• Registration is necessary in order to be able to compare or integrate the data obtained from different measurements.

Biomechanical Kinematics

• Knee-joint simulation *
  – Build a generic model to predict human movement (jumping, walking, etc.)
  – Used to study joint replacement stresses without risking patient injury
  – Biomechanical simulations frequently use costly optimization methods
  – Studying how RC-based parallel processing can increase performance

Figures c/o UF Computational Biomechanics Lab

Satellite Imaging

• Satellite imaging is used for mapping, environmental studies and defense applications
• The high data-rate and low-power demands of space require cutting-edge technology such as RC to provide the required processing capabilities
• Including RC devices in the processing chain will eventually enhance performance

[Figure: GMTI processing chain. Images c/o LANL and c/o US Air Force]

Adaptive Integrated Driver Vehicle Interface

…Towards a safe use of on board Support Systems and Services: The AIDE Integrated Project

AIDE Integrated Project’s OEMs: Volvo, CRF, PSA, Renault, DaimlerChrysler, Ford, BMW, SEAT, OPEL

CRF Demonstrator Vehicle

• Reconfigurable LCD Display
• Microphone
• Radar sensor for Frontal Collision Warning
• Compact PC for Curve Warning
• Data processing unit for Frontal Collision Warning
• GPS antenna
• Vehicle server PC for DVE
• Image Processing Unit for Lane Departure Warning
• Industrial PC for ICA, HMI, Speech I/O, Navigation, etc.
• Car Radio / CD
• USB MP3 player
• Navigation System
• BT link to Nomadic Devices
• Real time controller for Gateway
• CMOS Camera for Lane Departure Warning
• Sensor box for Curve Warning, Navigation, DVE
• Haptic barrel key

ITS

Driving Assistance - Information Support

www.seeingmachines.com


Summary

o Programmable logic comes in different flavors such as PLDs, CPLDs and FPGAs.
o Field Programmable Gate Arrays (FPGAs) are a technology introduced in the 1980s that allows engineers to implement their designs without fabricating a chip, as is done for Application Specific Integrated Circuits (ASICs).
o The main components of an FPGA are the CLBs, IOBs and programmable interconnect (Fine Grain FPGAs).
o Newer FPGAs also include block memory, processors and multipliers (we start to call these Coarse Grain FPGAs).
o FPGAs are applied in HPC, embedded systems, cars, appliances, … (endless …)

Programming

• Programming the PLA can be specified in tabular form
• The table has 3 sections:
  1. product terms,
  2. connections between inputs and AND gates,
  3. outputs
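A programming table of this shape can be written down and evaluated directly. The functions F1 = AB' + AC and F2 = AC + BC, and the encoding of table entries (1, 0, None), are hypothetical choices for this sketch and are not taken from the slides.

```python
# One row per product term:
#   AND-array section: per input, 1 = true literal, 0 = complemented literal, None = not connected
#   OR-array section:  per output, 1 = this product term feeds the output
PLA_TABLE = [
    # (A,    B,    C)      (F1, F2)
    ((1,    0,    None), (1,  0)),   # product term 1: AB'
    ((1,    None, 1),    (1,  1)),   # product term 2: AC
    ((None, 1,    1),    (0,  1)),   # product term 3: BC
]

def pla(table, inputs):
    """Evaluate the PLA described by `table` for one input combination."""
    outputs = [0] * len(table[0][1])
    for literals, connections in table:
        # AND array: every connected literal must match its input.
        term = all(lit is None or lit == x for lit, x in zip(literals, inputs))
        if term:
            # OR array: an active product term drives each output it is connected to.
            for k, connected in enumerate(connections):
                outputs[k] |= connected
    return tuple(outputs)

for value in range(8):
    a, b, c = (value >> 2) & 1, (value >> 1) & 1, value & 1
    print(a, b, c, pla(PLA_TABLE, (a, b, c)))
```

The product term AC appears once in the table but feeds both outputs, which is the sharing that a programmable OR array makes possible.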