
Enabling Technologies for Reconfigurable Computing

Part 2: Stream-based Computing for RC

Wednesday, November 21, 10.30 – 12.00 hrs.

Reiner Hartenstein

University of Kaiserslautern

November 21, 2001, Tampere, Finland

© 2001, reiner@hartenstein.de http://www.fpl.uni-kl.de

University of Kaiserslautern

Xputer Lab

Schedule

time slot      topic
08.30 – 10.00  Reconfigurable Computing (RC)
10.00 – 10.30  coffee break
10.30 – 12.00  Stream-based Computing for RC
12.00 – 14.00  lunch break
14.00 – 15.30  Resources for RC
15.30 – 16.00  coffee break
16.00 – 17.30  FPGAs: recent developments


>> EDA revolution

• EDA revolution

• Dead Supercomputer

• Stream-based Computing

• Stream-based Memory Architecture

• Design Space Explorers

• KressArray Xplorer

• Machine paradigms

• Co-Compilation

http://www.uni-kl.de


EDA: where Electronics begins [Richard Newton]

• Dataquest Initiative

• New book

[Chart: NASDAQ index vs. EDA index]


[Richard Newton]


The End is near

[Chart: transistors/chip (log scale, 10^0 to 10^15) vs. year to market (1960 to 2040); growth annotated as x1.6/year and x100/decade]

The end of Hypergrowth?
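The chart carries two growth annotations. As a quick consistency check (a sketch added here, not part of the original slide), compounding x1.6 per year over ten years lands close to x100 per decade:

```python
def growth_per_decade(rate_per_year: float) -> float:
    """Compound a yearly growth factor over ten years."""
    return rate_per_year ** 10


factor = growth_per_decade(1.6)
print(f"x1.6/year compounds to about x{factor:.0f} per decade")
# prints: x1.6/year compounds to about x110 per decade
```

So the two annotations describe the same trend, rounded.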


Paradigm Shift

[Figure: Development of Hypergrowth Markets: paradigm shift, tornado, mainstream (Harper Business 1995)]


Makimoto’s 3rd wave

The next EDA Industry Revolution: EDA industry paradigm switching every 7 years

• 1978: Transistor entry: Applicon, Calma, CV ...

• 1985: Schematics entry: Daisy, Mentor, Valid ...

• 1992: Synthesis: Cadence, Synopsys ...

[Keutzer / Newton]

• 1999: (Co-) Compilation

• 2006: Stream-based DPU arrays

[Hartenstein]


Biggest Mistake in History


Innovation Stalled? [Richard Newton]

What is next after VHDL?


What is next after VHDL?

Motivations:

• HDL-savvy designers needed

• New Business Model

• Co-Design never ending

• HDLs?

• Extended HDLs: how far?

• Automatic Partitioning


>> Dead Supercomputer


Dead Supercomputer Society

• 37 university and corporate R&D projects: 2 or 3 successes…

• All the rest failed to work or to be successful (Research 1985-1995)


Dead Supercomputer Society

• ACRI • Alliant • American Supercomputer • Ametek • Applied Dynamics • Astronautics • BBN • CDC • Convex • Cray Computer • Cray Research • Culler-Harris • Culler Scientific • Cydrome • Dana/Ardent/Stellar/Stardent • DAP (ICL) • Denelcor • Elexsi • ETA Systems • Evans and Sutherland Computer • Floating Point Systems • Galaxy YH-1 • Goodyear Aerospace MPP • Gould NPL • Guiltech • Intel Scientific Computers • International Parallel Machines • Kendall Square Research • Key Computer Laboratories • MasPar • Meiko • Multiflow • Myrias • Numerix • Prisma • Tera • Thinking Machines • Saxpy • Scientific Computer Systems (SCS) • Soviet Supercomputers • Supertek • Supercomputer Systems • Suprenum • Vitesse Electronics

[Gordon Bell, keynote at ISCA 2000]



>> Stream-based Computing


Coarse Grain Reconfigurable Arrays vs. Parallel Processes

[Figure, left: Parallelism at Process Level: many I-Seq ALU units, each an ALU with its own instruction sequencer. Right: Parallelism at Datapath Level: a Data Sequencer driving an array of rALUs, reconfigurable or hardwired; no instruction sequencing!]


Concurrent Computing

[Figure: several CPUs, each a DPU with its own instruction sequencer, connected by bus(es) or a switch box: extremely inefficient]


Stream-based Computing

[Figure: a pipe of four DPUs]

• driven by data streams from / to memory, or from / to a peripheral interface

• transport-triggered execution: no instruction sequencer inside!
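The Python sketch below models this idea under simplifying assumptions; all names are hypothetical. Each DPU is a fixed operation "configured" once before streaming, and data items flow from a source through the chain of DPUs to a sink. No DPU fetches or decodes instructions: the arrival of a data item is what triggers the operation.

```python
from typing import Callable, Iterable, Iterator, List

# A DPU is modelled as a fixed function, configured once before streaming starts.
DPU = Callable[[int], int]


def dpu_stage(op: DPU, stream: Iterable[int]) -> Iterator[int]:
    """One DPU: fires whenever a data item arrives (transport-triggered)."""
    for item in stream:
        yield op(item)


def run_pipeline(dpus: List[DPU], source: Iterable[int]) -> List[int]:
    """Chain DPUs into a linear pipe; data streams from source to sink."""
    stream: Iterable[int] = source
    for op in dpus:
        stream = dpu_stage(op, stream)
    return list(stream)  # sink: collect the result stream ("write back to memory")


# four DPUs, as in the figure
pipe = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3, lambda v: v * v]
print(run_pipeline(pipe, [0, 1, 2, 3]))  # [1, 1, 9, 25]
```

Chained generators move each item through all stages one at a time, which mimics the data flow of a pipe network, though not its cycle-level timing.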


Stream-based Computing: (r)DPU array

[Figure: a 3 x 3 array of DPUs driven by data streams]

for both reconfigurable and hardwired arrays


>>> extremely high efficiency

• avoiding address computation overhead

• avoiding instruction fetch and interpretation overhead

• high parallelism: massively multiple deep pipelines

• much less configuration memory

• no routing areas to configure functions from CLBs


Systolic Stream-based Computing System

Systolic Array [H. T. Kung, 1980]: an array of DPUs (Data Path Units)

[Figure: matrix-vector multiplication on a systolic array. Data streams: x1, x2, x3; y10, y20, y30 entering and y1, y2, y3 leaving; the matrix elements a11 ... a33 fed in skewed order. DPU architecture: y + a * x.]

The Mathematician’s Synthesis Method: from the equations, placement by linear projection or algebraic mapping

• linear pipelines and uniform arrays only

• no routing!
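A cycle-level Python sketch of such an array, computing y_i = y_i0 + Σ_j a_ij · x_j, may help. It assumes one particular systolic schedule, which is a choice made here and not necessarily the one in the figure: x_j stays resident in cell j, each partial sum y_i enters at the left with its initial value y_i0 and moves one cell to the right per cycle, and the skewed matrix stream delivers a_ij to cell j in exactly the cycle when y_i passes. Every cell performs the same multiply-accumulate, and data only moves between neighbours (no routing).

```python
from typing import List, Optional, Tuple


def systolic_matvec(a: List[List[int]], x: List[int], y0: List[int]) -> List[int]:
    """Cycle-by-cycle model of a linear systolic array computing y = y0 + A x.

    Cell j keeps x[j] resident (assumes at least one cell). Partial sum y[i]
    enters cell 0 at cycle i and moves one cell right per cycle, so it is in
    cell j at cycle i + j; the skewed matrix stream delivers a[i][j] to cell j
    at exactly that cycle.
    """
    n, m = len(x), len(y0)
    cells: List[Optional[Tuple[int, int]]] = [None] * n  # (row index, partial sum)
    results: List[int] = [0] * m
    for cycle in range(m + n):
        # the partial sum leaving the last cell is a finished result
        if cells[-1] is not None:
            i, value = cells[-1]
            results[i] = value
        # all partial sums advance one cell (nearest-neighbour transport only)
        cells = [None] + cells[:-1]
        # a new initial value y0[cycle] enters at the left end
        if cycle < m:
            cells[0] = (cycle, y0[cycle])
        # every occupied cell performs one multiply-accumulate: y := y + a * x
        for j, item in enumerate(cells):
            if item is not None:
                i, value = item
                cells[j] = (i, value + a[i][j] * x[j])
    return results


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(systolic_matvec(A, [1, 0, 2], [0, 0, 0]))  # [7, 16, 25]
```

The schedule (which value is where in which cycle) is fixed entirely by the equations, which is why linear projection or algebraic mapping suffices for such uniform arrays.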


Computing in space and time

[Figure: the systolic array example again. The placement of the DPUs is computing in space; the data streams (x1 ... x3, y10 ... y30, y1 ... y3, skewed a11 ... a33) are computing in time.]

• systolic arrays etc. and other transformations: migration by re-timing

• this dichotomy is completely ignored by our CS curricula


General Stream-based Computing System: heterogeneous array of DPUs (data path units)

[Figure, four numbered steps: an expression tree (y, +, *, x, a) and the DPU architectures are input to the Mapper, which performs simultaneous placement & routing by simulated annealing; the result is a free-form pipe network of DPUs (+, *, sh, xf operators); the Scheduler then generates the data streams.]

The same mapper for both: reconfigurable or hardwired

Kress DPSS [1995]
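Simulated annealing placement can be sketched generically in Python. This is a textbook formulation, not the Kress DPSS algorithm: it only places operators on a rows x cols grid and uses total Manhattan wirelength as a stand-in for routing cost, whereas the DPSS places and routes simultaneously. The function names, the cooling parameters and the example expression are assumptions for illustration.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

Pos = Tuple[int, int]


def wirelength(placement: Dict[str, Pos], edges: List[Tuple[str, str]]) -> int:
    """Sum of Manhattan distances over all edges: a common placement cost."""
    return sum(abs(placement[u][0] - placement[v][0]) +
               abs(placement[u][1] - placement[v][1]) for u, v in edges)


def anneal_placement(nodes: List[str], edges: List[Tuple[str, str]], rows: int, cols: int,
                     steps: int = 5000, t0: float = 5.0, alpha: float = 0.999,
                     seed: int = 0) -> Tuple[Dict[str, Pos], int]:
    """Place expression-tree operators on a rows x cols DPU grid by simulated annealing."""
    rng = random.Random(seed)
    n_sites = rows * cols
    if len(nodes) > n_sites:
        raise ValueError("array has fewer DPU sites than operators")
    grid: List[Optional[str]] = list(nodes) + [None] * (n_sites - len(nodes))
    rng.shuffle(grid)  # random initial placement

    def placement_of(g: List[Optional[str]]) -> Dict[str, Pos]:
        return {node: divmod(k, cols) for k, node in enumerate(g) if node is not None}

    cost = wirelength(placement_of(grid), edges)
    best_grid, best_cost = list(grid), cost
    temp = t0
    for _ in range(steps):
        i, j = rng.randrange(n_sites), rng.randrange(n_sites)
        grid[i], grid[j] = grid[j], grid[i]  # propose: swap two sites (or move to an empty one)
        new_cost = wirelength(placement_of(grid), edges)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost  # accept; uphill moves are accepted with falling probability
            if cost < best_cost:
                best_grid, best_cost = list(grid), cost
        else:
            grid[i], grid[j] = grid[j], grid[i]  # reject: undo the swap
        temp = max(temp * alpha, 1e-3)  # geometric cooling
    return placement_of(best_grid), best_cost


# operators of y = (p*q + r*s) * (u - v); operand streams come from outside the array
nodes = ["mul1", "mul2", "add1", "sub1", "mul3"]
edges = [("mul1", "add1"), ("mul2", "add1"), ("add1", "mul3"), ("sub1", "mul3")]
placement, cost = anneal_placement(nodes, edges, rows=3, cols=3)
print(cost, placement)
```

Accepting occasional cost increases early on lets the search escape poor local arrangements; as the temperature drops, it settles into greedy improvement.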


Converging Design Flows

• this synthesis method is a generalization of systolic array synthesis: super-systolic synthesis

• the same synthesis method may be used for mapping an algorithm onto both: rDPA [Kress, 1995] and DPA [Brodersen, 2000]

terms: DPU: data path unit; DPA: data path array; rDPU: reconfigurable DPU; rDPA: reconfigurable DPA


Super Pipe Networks

systolic array:
• applications: regular data dependencies only
• pipeline shape: linear only
• pipeline resources: uniform only
• mapping and scheduling (data stream formation): linear projection or algebraic synthesis

super-systolic rDPA **):
• applications, pipeline shape, resources: no restrictions
• mapping: simulated annealing or P&R algorithm
• scheduling (data stream formation): scheduling algorithm (e.g. force-directed)

The key is mapping, rather than architecture

**) KressArray [1995]


>> Stream-based Memory Architecture


Hot Research Topic: Memory Architectures

• High Performance Embedded Memory Architectures

• High Performance Memory Communication Architectures [Herz]

• Custom Memory Management Methodology [Catthoor]

• Data Reuse Transformations [Kougia et al.]

• Data Reuse Exploration [Soudris, Wuytack]
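To illustrate what a data reuse transformation buys, here is a small Python sketch on an FIR filter, chosen here as a generic example rather than taken from the cited work. The direct version re-reads every sample once per filter tap; the transformed version keeps the last len(h) samples in a shift register (a `deque`), so each sample is read from memory only once. The read counters model memory traffic under that assumption.

```python
from collections import deque
from typing import List, Tuple


def fir_naive(x: List[int], h: List[int]) -> Tuple[List[int], int]:
    """Direct FIR filter: every tap re-reads its sample from memory."""
    y, reads = [], 0
    for n in range(len(x) - len(h) + 1):
        acc = 0
        for k, coeff in enumerate(h):
            acc += coeff * x[n + k]
            reads += 1  # one memory read per tap
        y.append(acc)
    return y, reads


def fir_reuse(x: List[int], h: List[int]) -> Tuple[List[int], int]:
    """Same filter after a data reuse transformation: a shift register
    keeps the last len(h) samples, so each sample is read only once."""
    window: deque = deque(maxlen=len(h))
    y, reads = [], 0
    for sample in x:
        window.append(sample)  # one memory read per sample
        reads += 1
        if len(window) == len(h):
            y.append(sum(c * s for c, s in zip(h, window)))
    return y, reads


x = [1, 2, 3, 4, 5, 6, 7, 8]
h = [1, 2, 3]
print(fir_naive(x, h))  # ([14, 20, 26, 32, 38, 44], 18)
print(fir_reuse(x, h))  # ([14, 20, 26, 32, 38, 44], 8)
```

Same results, but the number of memory reads drops from (N - K + 1) · K to N for N samples and K taps.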


Processor-Memory Performance Gap

[Chart: performance (1 to 1000, log scale) over 1980 to 2000. CPU (µProc): +60%/yr; DRAM: +7%/yr; the processor-memory performance gap grows 50% / year.]
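The three numbers in the chart are consistent with each other. The yearly growth of the gap is the ratio of the two yearly growth factors, as this short Python check shows:

```python
cpu_growth = 1.60   # µProc performance: +60% per year
dram_growth = 1.07  # DRAM performance: +7% per year

# the gap is the ratio of the two performance curves, so it grows by their ratio
gap_growth = cpu_growth / dram_growth
print(f"gap grows by about {100 * (gap_growth - 1):.0f}% per year")
# prints: gap grows by about 50% per year
```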


RAs: Cache does not help

• the memory bandwidth problem is often more dramatic than for microprocessors

• interleaving is not practicable, since it is based on sequential instruction streams

• classical caches do not help, since instruction sequencing is not used

• the problem: throughput of parallel data streams, not instruction streams

• super pipe networks, not parallel computers!

• stream-based arrays are a memory bandwidth problem


Synthesizable Memory Communication

Efficient memory communication should be directly supported by the mapper tools.

[Figure: an example by Nageldinger’s KressArray Xplorer with an optimized parallel memory controller. Legend: sequencers, memory ports, application, not used.]

http://kressarray.de


The Disk Farm? or a System On a Card?

• The 500 GB disc card: LOTS of bandwidth

• A few disks replaced by >10s of Gbytes of RAM and a processor

• MicroDrive: 1.7” x 1.4” x 0.2”
1999: 340 MB, 5400 RPM, 5 MB/s, 15 ms seek
2006: 9 GB, 50 MB/s? (1.6X/yr capacity, 1.4X/yr BW)

• Integrated IRAM processor, 2x height, connected via crossbar switch, growing like Moore’s law

• 16 Mbytes; 1.6 Gflops; 6.4 Gops

• 10,000+ nodes in one rack! 100/board = 1 TB; 0.16 Tflops

[Gordon Bell, Jim Gray, ISCA 2000]
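The 2006 MicroDrive figures follow from compounding the 1999 values at the stated yearly rates over seven years. A quick Python check, assuming decimal units (1 GB = 1000 MB):

```python
def extrapolate(value_1999: float, growth_per_year: float, years: int) -> float:
    """Compound a 1999 figure forward by a constant yearly growth factor."""
    return value_1999 * growth_per_year ** years


years = 2006 - 1999
capacity_mb = extrapolate(340, 1.6, years)   # 1.6X/yr capacity
bandwidth_mb_s = extrapolate(5, 1.4, years)  # 1.4X/yr bandwidth
print(f"2006 capacity ~ {capacity_mb / 1000:.1f} GB")  # 2006 capacity ~ 9.1 GB
print(f"2006 bandwidth ~ {bandwidth_mb_s:.0f} MB/s")   # 2006 bandwidth ~ 53 MB/s
```

Both results match the slide's rounded 9 GB and 50 MB/s.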


Memory Communication Architecture

• hot research topic in embedded systems

• storage context transformations [Herz, others]

• for low power

• for high performance

• startups provide memory IP or generators


Stream-based Soft Machine

[Figure: Compiler, Scheduler and “instructions”; data memory organized as multiple memory banks; Sequencers (data stream generators) streaming data between the memory banks and the rDPA.]
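A data sequencer can be thought of as a set of nested data counters that generate an address stream. The Python sketch below is a hypothetical, simplified model of that idea, not the Xputer sequencer's actual design: each loop level has a count and a stride, and the resulting addresses are produced without any instruction fetch.

```python
from itertools import product
from typing import Iterator, Sequence


def data_sequencer(base: int, counts: Sequence[int], strides: Sequence[int]) -> Iterator[int]:
    """Generate a memory address stream from nested data counters.

    counts[k] and strides[k] describe loop level k (outermost first);
    no instructions are fetched, the counters alone define the stream.
    """
    for indices in product(*(range(c) for c in counts)):
        yield base + sum(i * s for i, s in zip(indices, strides))


# scan a 2 x 3 window of a row-major image with 8 columns, stored at address 100
addresses = list(data_sequencer(base=100, counts=[2, 3], strides=[8, 1]))
print(addresses)  # [100, 101, 102, 108, 109, 110]
```

In a stream-based soft machine, several such sequencers, one per memory bank, would feed the rDPA in parallel.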


>> Design Space Explorers


• domain-specific Reconfigurable Platforms will be suitable to cope with the 2nd Design Crisis

• general purpose is unrealistic: a fully general purpose reconfigurable platform sometimes is .... an Illusion ..., just as the general purpose massively parallel computer system

KressArray Explorer ...


Universal RAs: is it feasible?

Motivation

The General Purpose (coarse grain) Reconfigurable Array appears to be an Illusion ...

... just as, obviously, the Universal Massively Parallel Computer Architecture ... counter-example: Application Domain of Image Processing


-> Design Space Exploration

• Design Space Exploration
– Design Space Explorers (DSEs)
– Platform Space Explorers (PSEs)
– Compiler / PSE symbiosis
– Parallel computing vs. reconfigurable


Design Space Exploration Systems

Explorer system      year  source     interactive  evaluation               generation
DPE                  1991  [66]       no           abstract models          rule-based
Clio                 1992  [67]       yes          prediction models        device generator
DIA                  1998  [68]       yes          prediction from library  rule-based
DSE for RAW          1998  [49]       no           analytical models        analytical
ICOS                 1998  [76]       no           fuzzy logic              greedy search
DSE for Multimedia   1999  [77]       no           simulation               branch and bound
Xplorer              1999  [11] [50]  yes          fuzzy rule-based         simulated annealing


DSEs: an overview

• for VLSI design in general

• for parallel computer systems

• Xplorer: the only one for reconfigurable platforms (also MATRIX?)


>> KressArray Xplorer


KressArray DPSS

published at ASP-DAC 1995

[Figure: KressArray Xplorer (Platform Design Space Explorer) built around the KressArray DPSS. Blocks: Application Set; User with Selection User Interface, Architecture Editor and Mapping Editor; ALE-X Compiler (ALE-X code, expression tree, intermediate form 2); Architecture Estimator; Mapper with Design Rules (intermediate form 3); Scheduler (data stream schedule); Analyzer (statistical data); Delay Estimator; Power Estimator (power data); HDL Generator and Simulator (VHDL, Verilog); Datapath Generator Generator (Kress rDPU layout); Improvement Proposal Generator and Inference Engine (FOX) producing suggestions; KressArray family parameters.]


[Figure: KressArray Xplorer, the (Design Space) Platform Space Explorer: the User provides the source input (Application Set) to the KressArray DPSS, surrounded by Architecture & Mapping Editor, Statistics, Datastream Generator, HDL Generator / Simulator, Datapath Generator Generator, Delay & Power Estimator and Improvement Proposal Generator.]

http://kressarray.de


Design Flow of Domain-specific Architecture Optimization

[Figure: Application Set, Application Selection, Application Compilation, Initial Architecture Estimation (or benchmark), Application Mapping, Mapping Analysis, Modification Suggestion, Architecture Modification, Architecture Verification, Optimized Architecture]

Nageldinger’s KressArray Design Space Xplorer:

• including a Fuzzy Logic Improvement Proposal Generator

• accessible by internet: http://kressarray.de

• runs best with Netscape 4.6.1
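The optimization loop (map the application set, analyze the mapping, suggest a modification, modify the architecture, repeat) can be sketched in Python. Everything in this sketch is hypothetical: the cost model (operations of each type spread over that type's DPUs), the "add one DPU to the bottleneck" suggestion rule and the DPU budget as stopping criterion are stand-ins. The Xplorer itself uses a fuzzy-logic improvement proposal generator, which is not reproduced here.

```python
import math
from typing import Dict, List, Tuple

App = Dict[str, int]  # operator type -> number of operations (hypothetical model)


def analyze(app_set: List[App], dpus: Dict[str, int]) -> Dict[str, int]:
    """Crude mapping analysis: cycles each operator type needs over the
    application set if its operations are spread over its own DPUs."""
    load: Dict[str, int] = {}
    for app in app_set:
        for op, count in app.items():
            load[op] = load.get(op, 0) + math.ceil(count / dpus[op])
    return load


def explore(app_set: List[App], dpus: Dict[str, int], dpu_budget: int,
            max_iters: int = 50) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Iterate analyze -> suggest -> modify until the DPU budget is used up."""
    dpus = dict(dpus)
    for _ in range(max_iters):
        if sum(dpus.values()) >= dpu_budget:  # stop: budget reached
            break
        load = analyze(app_set, dpus)          # mapping analysis
        bottleneck = max(load, key=load.get)   # modification suggestion
        dpus[bottleneck] += 1                  # architecture modification
    return dpus, analyze(app_set, dpus)


apps = [{"add": 8, "mul": 4}, {"add": 2, "mul": 6}]
print(explore(apps, {"add": 1, "mul": 1}, dpu_budget=6))
# ({'add': 3, 'mul': 3}, {'add': 4, 'mul': 4})
```

The point is the domain-specific angle: the architecture is tuned against a whole application set, not a single application.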


KressArray Design Space Xplorer

DPSS-N: Data Path Synthesis System

[Figure: tool flow with file types. User and Editor / User Interface; ALE-X code (.alex) and ALE-X Compiler; intermediate format (.map); Architecture Estimation; Mapper (technology mapping, placement & routing), producing an intermediate format (.map) including configware code; Scheduler, producing data and sequencing code (.seq); Analyser with statistical data (.stat); HDL Generator producing an HDL description (.v) for the synthesis environment; Module Generator with Kress rDPU layout (.krs), Kress IP library and other IP.]


>> Machine paradigms


[Figure, left: Computer (“von Neumann”): Compiler, Memory, Sequencer with program counter (state register), hardwired Datapath; instructions; tightly coupled by compact instruction code. Right: Xputer: Compiler, Memory, Scheduler, multiple sequencers with data counters, reconfigurable Datapath Array; “instructions”; loosely coupled by decision data bits only.]

“von Neumann” does not support soft data paths

Xputer: The Soft Machine Paradigm: reconfigurable, also for hardwired

Computer: the wrong Machine Paradigm (“von Neumann”)


Soft Machine Paradigm

[Figure, left: Xputer: Compiler, Memory, Scheduler, Sequencer with data counter, reconfigurable Datapath; “instructions”. Right: Parallel Xputer: Compiler, Scheduler, multiple Sequencers with data counters, each with its own memory and reconfigurable Datapath; “instructions”.]

Decision data only, i.e. loose coupling


Computer: the wrong Machine Paradigm

[Figure: “von Neumann” computer: Compiler, Memory, Sequencer with program counter, Decoder, hardwired Datapath; instructions; tightly coupled by a compact instruction code.]

“von Neumann” does not support soft data paths: a reconfigurable Datapath needs no Instruction Sequencer; at run time: no instruction fetch


Machine Paradigms

machine category            Computer (“v. Neumann”)   Xputer (no transputer!)
driven by:                  control flow              data streams (no “dataflow”)
engine principles           instruction sequencing    data sequencing
state register              program counter           (multiple) data counter(s)
communication path set-up   at run time               at load time
resource                    single ALU                array of ALUs & other rDPUs
datapath operation          sequential                parallel pipe network
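The "engine principles" row can be made concrete with a toy Python model summing n memory words. The instruction set, the register names and the fetch counter are all hypothetical. The first function is program-counter driven: every step fetches and decodes an instruction, and addresses are computed at run time. The second is data-counter driven: the datapath (an accumulator) is set up once, and a data counter alone steps through memory. Its zero fetch count holds by construction of the model, since the Python interpreter of course still executes instructions underneath.

```python
from typing import Dict, List, Tuple


def von_neumann_sum(mem: List[int], base: int, n: int) -> Tuple[int, int]:
    """Program-counter driven: every step fetches and decodes an instruction."""
    program = [
        ("set", "i", 0),
        ("set", "acc", 0),
        ("bge", "i", n, 7),   # leave the loop when i >= n
        ("load", "t", "i"),   # t = mem[base + i]: address computed at run time
        ("add", "acc", "t"),  # acc = acc + t
        ("inc", "i"),         # i = i + 1
        ("jmp", 2),
        ("halt",),
    ]
    regs: Dict[str, int] = {}
    pc = fetches = 0
    while True:
        instr = program[pc]  # instruction fetch
        fetches += 1
        op = instr[0]        # decode
        if op == "halt":
            return regs["acc"], fetches
        if op == "jmp":
            pc = instr[1]
            continue
        if op == "bge":
            pc = instr[3] if regs[instr[1]] >= instr[2] else pc + 1
            continue
        if op == "set":
            regs[instr[1]] = instr[2]
        elif op == "load":
            regs[instr[1]] = mem[base + regs[instr[2]]]
        elif op == "add":
            regs[instr[1]] += regs[instr[2]]
        elif op == "inc":
            regs[instr[1]] += 1
        pc += 1


def data_sequenced_sum(mem: List[int], base: int, n: int) -> Tuple[int, int]:
    """Data-counter driven: the datapath is configured once (load time);
    a data counter steps through memory, and nothing is fetched at run time."""
    acc = 0
    for data_counter in range(base, base + n):
        acc += mem[data_counter]
    return acc, 0  # zero instruction fetches in this model


mem = list(range(10))
print(von_neumann_sum(mem, 2, 4))     # (14, 24)
print(data_sequenced_sum(mem, 2, 4))  # (14, 0)
```

In this model the von Neumann version needs 5n + 4 instruction fetches for n data words; the data-sequenced version needs none.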


Machine Paradigms

machine category            Computer (“v. Neumann”)   Xputer [8] (no transputer!)
machine paradigm            procedural sequencing: deterministic
driven by:                  control flow              data stream(s) (no dataflow [13])
RA support                  no                        yes
engine principles           instruction sequencing    data sequencing
state register              program counter           (multiple) data counter(s)
communication path set-up   at run time               at load time
resource                    single ALU                array of ALUs
datapath operation          sequential                parallel


Fundamental Ideas available

• Data Sequencer Methodology

• Data-procedural Languages (duality with von Neumann)

• ... supporting memory bandwidth optimization

• Soft Data Path Synthesis Algorithms

• Parallelizing Loop Transformation Methods

• Compilers supporting Soft Machines

• SW / CW Partitioning Co-Compilers


>> Co-Compilation


FPGA-Style Mapping for coarse grain reconfigurable arrays

mapping in Kress DPSS, CHESS, RaPiD, Colt:
• placement: simulated annealing, genetic algorithm
• routing: simulated annealing, Pathfinder, greedy algorithm

KressArray DPSS (Datapath Synthesis System): Compiler, Mapper, Scheduler; the Scheduler specifies and assembles the data streams from / to the array


Changing Models of Computing

[Figure: three models. “von Neumann”: procedural Software downloaded into RAM, run by a data path with instruction sequencer and I / O. Contemporary: host with hardwired accelerator(s) (ASICs) designed with CAD; hardware designer needed, done at vendor site. Reconfigurable computing: host with reconfigurable accelerator(s); Software and Configware both downloaded into RAM, both done at customer site.]


Changing Models of Computation

[Figure: contemporary: host (Compiler) with hardwired accelerator(s) (ASICs, CAD), done at vendor site. Reconfigurable computing: host with reconfigurable accelerator(s); Software and Configware in RAM, generated by a Co-Compiler, both done at customer site; no hardware experts needed. Labels: machine paradigm, EDA tools needed*.]

*) even 80% of hardware people hate their tools


Co-Compilation

[Figure: a high level programming language source (X-C) enters the Analyzer / Profiler and the Partitioner (partitioning compiler), which uses resource parameters and supports different platforms. Software part: GNU C compiler, Processor (Computer machine paradigm: software running on it). Configware part: X-C compiler, KressArray DPSS, Reconfigurable Accelerators (KressArray) (Xputer “Soft” machine paradigm: configware running on it). Processor and accelerators are connected via an interface.]

Hardware / Software Co-Design turns to Configware / Software Co-Design

Jürgen Becker’s Co-DE-X Co-Compiler [ASP-DAC’95]


Co-Compilation

[Figure: high level programming language source, partitioning compiler; Processor (Computer machine paradigm: software running on it) and Reconfigurable Accelerators (Xputer “Soft” machine paradigm: configware running on it), connected via an interface.]

• Reconfigurable Architecture (RA) instead of hardwired: no CAD! Compilation instead!

• Hardware / Software Co-Design turns to Configware / Software Co-Design

We introduce: Co-Compilation


Jürgen Becker’s Co-DE-X Co-Compiler

[Figure: Analyzer / Profiler and Partitioner, driven by resource parameters. Host side: GNU C compiler (Computer machine paradigm). KressArray side: loop transformations, X-C compiler and DPSS (Xputer machine paradigm).]

• X-C is the C language extended by MoPL

• supporting different platforms

• supporting platform-based design
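A resource-parameter-driven partitioning step can be sketched in Python. The sketch below is a generic greedy heuristic chosen for illustration; it is not the Co-DE-X partitioning algorithm. It assumes that a profiler has produced a host run time per loop, that an estimator has produced the run time and rDPU area on the array, and that all selected loops must fit onto the array at once (no run-time reconfiguration). The `Loop` fields and example numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Loop:
    name: str
    host_time: float   # profiled time on the host (hypothetical units)
    array_time: float  # estimated time on the reconfigurable array
    area: int          # rDPUs needed when mapped (assumed > 0)


def partition(loops: List[Loop], array_area: int) -> Tuple[List[str], List[str]]:
    """Greedy, resource-parameter-driven partitioning: move the loops with the
    largest time saving per rDPU to the array while they still fit."""
    candidates = sorted(
        (lp for lp in loops if lp.array_time < lp.host_time),
        key=lambda lp: (lp.host_time - lp.array_time) / lp.area,
        reverse=True,
    )
    on_array: List[str] = []
    used = 0
    for lp in candidates:
        if used + lp.area <= array_area:
            on_array.append(lp.name)
            used += lp.area
    on_host = [lp.name for lp in loops if lp.name not in on_array]
    return on_host, on_array


loops = [Loop("fir", 100, 10, 6), Loop("fft", 80, 20, 8),
         Loop("ctrl", 5, 6, 2), Loop("scan", 30, 5, 4)]
print(partition(loops, array_area=10))  # (['fft', 'ctrl'], ['fir', 'scan'])
```

Changing `array_area` (the resource parameter) changes the partition without touching the source program, which is the idea behind supporting different platforms.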


Loop Transformation Examples

[Figure, starting from: loop 1-16 body endloop
• loop unrolling: loop 1-8 body body endloop
• strip mining: fork; loop 1-8 body endloop and loop 9-16 body endloop; join
• sequential processes, resource parameter driven Co-Compilation: host: loop 1-8 trigger endloop, loop 1-4 trigger endloop, loop 1-2 trigger endloop; reconf. array: loop 1-16 body endloop]
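The first two transformations can be checked in a small Python sketch. `body` is a hypothetical stand-in for the loop body. Unrolling by two halves the iteration count. Strip mining splits the 16 iterations into two strips that are forked onto a two-thread pool and joined, and all variants must produce the same results.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def body(i: int) -> int:
    return i * i  # stand-in for the loop body


def original() -> List[int]:
    return [body(i) for i in range(1, 17)]  # loop 1-16 body endloop


def unrolled() -> List[int]:
    out: List[int] = []
    for i in range(1, 17, 2):  # loop 1-8 body body endloop
        out.append(body(i))
        out.append(body(i + 1))
    return out


def strip_mined() -> List[int]:
    strips = [range(1, 9), range(9, 17)]  # loop 1-8 and loop 9-16
    with ThreadPoolExecutor(max_workers=2) as pool:  # fork
        parts = list(pool.map(lambda r: [body(i) for i in r], strips))
    return parts[0] + parts[1]  # join


assert original() == unrolled() == strip_mined()
```

Which form pays off depends on the available resources, which is why the slide calls the choice resource parameter driven.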


History of Loop Transformations

• 1970s - 1980s: at Process Level: Sequential to Parallel Processes, incl. Vectorization [David Loveman, 1977; Allen and Kennedy, et al.]: Loop Unrolling, Loop Fusion, Strip Mining ....

• 1995/97 [Karin Schmidt / Jürgen Becker]: (Parameter-driven) Time to Time/Space Partitioning, down to Datapath Level: e.g. Transformation from Sequential Process to Super-systolic

• 2000 [Michael Herz]: optimized RA to Memory Communication Bandwidth: Multi-dimensional Loop Unrolling / Storage Scheme Optimization supporting burst-mode & parallel Memory Banks


History of Loop Transformations

• For Sequential Programs on Parallel Computers: David Loveman, 1977, Allen and Kennedy, etc.: Loop Unrolling, Loop Fusion, Strip Mining ....

• For memory communication: Michael Herz (2000): Multi-Level Loop Unrolling to reduce Memory Cycles needed to create RA Data Streams

• For parallel Datapaths: Jürgen Becker (1997):
– Sequential to Super-Systolic Transformation
– Optimize Throughput of Reconfigurable Arrays (RAs)

Instruction Code vs. Reconfiguration Code


Future Coarse Grain RA Development

• It is indispensable to operate within the Convergence Area of Compilers, Co-Compilers, Architecture and full-custom-style VLSI Design (array cells).

• It is a must that Products come with a Development Platform which encourages users, especially those with a limited Hardware Background.


END
