
Reiner Hartenstein (invited paper, invited book chapter): The von Neumann Syndrome; Stamatis Vassiliadis Memorial Symposium, Sep 28, 2007, Delft, Netherlands 1

Reiner Hartenstein, TU Kaiserslautern, Germany http://hartenstein.de

Reconfigurable Computing and the von Neumann Syndrome

Reiner Hartenstein

TU Delft, Sept 28, 2007

© 2007, [email protected] http://hartenstein.de

TU Kaiserslautern

2

(Preface:) it’s old stuff !

• Most of the enabling technologies of Reconfigurable Computing were published in the 1970s and 1980s. They are also the keys to coping with the von Neumann syndrome*

• The CS community mostly ignores this, owing to the tunnel vision of a reductionist mindset.

• We need to think outside the box: R&D and education need a twin paradigm approach

*) this term was coined by C. V. Ramamoorthy


3

Collection of material

• vN critics: Backus, Arvind
• Dataflow critics: Dan Gajski
• Microprogramming: Manchester
• HDL (Computer Mag)
• 2 brain hemispheres
• Overhead-based vN
• Manycore programming crisis


7


non-von-Neumann accelerators


8

• Reconfigurable Computing (RC) is found everywhere. See this page, one click per application area: http://xputers.informatik.uni-kl.de/RCeducation07/pervasiveness.html

• Many papers report speed-ups of up to 4 orders of magnitude from software-to-configware migrations (onto FPGAs)

• The configware industry is the growing counterpart to the software industry. Microsoft is working hard on a configware operating system, probably as part of later releases of Windows.

• The basic machine paradigm under configware is not instruction-stream-driven. For this reason it is going to turn the entire traditional CS mindset upside down

• Together with the many-core crisis, the disruptive RC mindset is heavily shaking the foundations of Computer Science. See, for instance, (HPCwire:) Confronting Parallelism: The View from Berkeley: http://www.hpcwire.com/hpc/1288079.html

• The Landscape of Parallel Computing Research: A View From Berkeley: http://view.eecs.berkeley.edu/wiki/Main_Page

• FPGAs have been mainstream in embedded systems for years. At 7 billion US-$, they are the fastest growing segment of the microchip market.

• There are many books on using FPGAs: http://www.fpl.uni-kl.de/FPGAbooks/

• A trailblazing book will appear toward the end of 2007 with Springer Verlag: Christophe Bobda: "Introduction to Reconfigurable Computing Systems - Architectures, Algorithms and Applications". For Reconfigurable Computing it should play a role similar to the one played for VLSI design by the groundbreaking historical book (1979, 1980): "Introduction to VLSI Systems" by Carver Mead and Lynn Conway


9

• More than 170 international conference series cover Reconfigurable Computing: http://hartenstein.de/NewJournal.pdf


10

Moore disaster

• Moore said nothing about improved gigaFLOPS per $, per watt, or per square inch

• Increasing passive power has stalled the entire industry


11

Carpal tunnel syndrome

• The vN syndrome is also a tunnel syndrome: a tunnel vision syndrome, leading into a tunnel of horror


12

Feedback loop

• Wiener 1961
• Implemented in the ancient world
• Re-discovered for steam engines
• Digitized and made programmable: Zuse / von Neumann

[Figure: the “CPU” as a feedback loop. Labels: instruction stream; sequencer (controller); DPU; evoke; decision data]


13

Feedback loop

[Figure: an “rDPA” built from rDPUs. Labels: rDPU; reconfiguration code]


14

• We are embarking on a new computing age: the age of massive parallelism.

• Will everyone have multiple parallel computers at their disposal every day?

• Smith: Yes. Even mobile devices will exploit multicore processors, not only for better performance but also to extend battery life by replacing the relatively power-hungry serial processors used today.

• Are there prospects for global address space (GAS) languages?

• An increase in the population of HPC-competent people, according to Smith. He anticipates that the mainstream will adopt desktops as their own "personal supercomputers," while smart phones will be used as PDAs, MP3 players, and so on. The reinvention of the computing profession is a job not just for universities, but for companies such as Microsoft, which must make the developer community familiar with the new computing philosophy, Smith contends.


15

Arthur Schopenhauer

• Arthur Schopenhauer: "Approximately every 30 years, we declare the scientific, literary and artistic spirit of the age bankrupt. In time, the accumulation of errors collapses under the absurdity of its own weight."

• Reiner H.: "Mesmerized by the Gordon Moore Curve, we in computer science slowed down our own learning curve. Finally, after 60 years, we are witnessing the collapse of the spirit from the Mainframe Age, triggered by the runaway breakthrough of Reconfigurable Computing."


16

Outline

• The Pervasiveness of FPGAs • The Reconfigurable Computing Paradox • The Gordon Moore gap • The von Neumann syndrome • The Anti Machine • We need a twin paradigm approach • Conclusions


18

Tools etc.

• Check the wiki


19

Configware: more compute power than from Software

• 80% of all (micro)processors are embedded
• 25% of embedded µProcs are accelerated by FPGA(s) (a very cautious estimate)
• -> every 5th µProc is accelerated by FPGA(s)
• average acceleration factor: >5 (a very pessimistic estimate)

Conclusion: most compute power comes from Configware
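The estimate on this slide can be checked with a few lines of arithmetic. The sketch below is only an illustration under stated assumptions that are not on the slide: every processor without an accelerator contributes one unit of compute, every FPGA-accelerated processor delivers `speedup` units that are all credited to configware, and the function name `configware_share` is hypothetical.

```python
# Back-of-the-envelope check of the slide's estimate.
# Assumption (not from the slide): an unaccelerated processor = 1 unit of compute;
# an accelerated processor delivers `speedup` units, all credited to configware.

def configware_share(embedded_fraction: float,
                     accelerated_fraction_of_embedded: float,
                     speedup: float) -> float:
    """Return the fraction of total compute power delivered via configware."""
    accelerated = embedded_fraction * accelerated_fraction_of_embedded
    software_only = 1.0 - accelerated
    configware_units = accelerated * speedup
    return configware_units / (configware_units + software_only)


# Share of all processors that have an FPGA accelerator: 80% x 25%.
accelerated = 0.80 * 0.25
share = configware_share(0.80, 0.25, 5.0)

print(f"accelerated processors: {accelerated:.0%}")
print(f"configware share of compute: {share:.1%}")
```

Under these assumptions one processor in five is accelerated, and with a factor of 5 configware already supplies a little more than half of the total compute power. Any factor above 5 only increases that share.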


20

FPGAs as accelerators found everywhere


21

Pervasiveness of RC

mirror: http://www.fpl.uni-kl.de/RCeducation08/pervasiveness.html

http://hartenstein.de/pervasiveness.html

One click per keyword on this list shows the number of Google hits


22

Outline

• The Pervasiveness of FPGAs

• The Reconfigurable Computing Paradox

• The Gordon Moore gap

• The von Neumann syndrome

• The Anti Machine

• We need a twin paradigm approach

• Conclusions

[Callouts: simple FPGAs; coarse-grained arrays; saving energy]


23

Software-to-Configware (FPGA) Migration: some published speed-up factors [2003 – 2005]

[Chart: speed-up factor on a logarithmic axis from 10^0 to 10^6]

• DSP and wireless: Reed-Solomon Decoding 2400; MAC 1000; Viterbi Decoding 400; FFT 100
• Image processing, pattern matching, multimedia: real-time face detection 6000; video-rate stereo vision 900; pattern recognition 730; SPIHT wavelet-based image compression 457
• Bioinformatics: Smith-Waterman pattern matching 288; BLAST 52; protein identification 40
• Astrophysics: GRAPE 20
• crypto 1000
• molecular dynamics simulation 88


24

Software-to-Configware (FPGA) Migration: some published speed-up factors [2003 – 2005]

[Chart: the same speed-up factors as on the previous slide]

The RC paradox

• deficiency factor: >10,000
• speed-up factor: 6,000
• total discrepancy: >60,000,000
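The total discrepancy is the product of the two factors named on the slide. A minimal sketch of that arithmetic follows. The variable names are illustrative, and the inputs are the slide's lower bound (>10,000) and the largest published speed-up on the chart (6,000).

```python
import math

deficiency_factor = 10_000   # lower bound given on the slide (">10,000")
speedup_factor = 6_000       # largest published speed-up on the chart

# The two factors multiply, because the speed-up is achieved despite the deficiency.
total_discrepancy = deficiency_factor * speedup_factor
orders_of_magnitude = math.log10(total_discrepancy)

print(f"total discrepancy: >{total_discrepancy:,}")
print(f"about {orders_of_magnitude:.1f} orders of magnitude")
```

Because the deficiency factor is a lower bound, the product is a lower bound as well, which is why the slide writes ">60,000,000".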

3000


25

“simple” FPGAs are only the beginning

• Less discrepancy for platform FPGAs and coarse-grained reconfigurable arrays


26

Hollerith • Prototyped 1884 by Hollerith


27

The first Reconfigurable Computer

•Prototyped 1884 by Herman Hollerith

•A century before FPGA introduction


30

• Find pictures: chickens + 2 oxen


31

Executive Summary doesn’t help

2 strong Reconfigurable Computing oxen vs. 1024 von Neumann chickens?

Manycore critics decades ago?


32

Outline

• The Pervasiveness of FPGAs

• The Reconfigurable Computing Paradox

• The Gordon Moore gap

• The von Neumann syndrome

• The Anti Machine

• We need a twin paradigm approach

• Conclusions

[Callout: & the multicore crisis]


33

Moore’s law not applicable to all aspects of VLSI

What is the reason for the paradox?

The Gordon Moore curve does not indicate performance

The peak clock frequency does not indicate performance

the law of Gates

Astronomic code size causes massive overhead, due to the von Neumann syndrome


34

Rapid Decline of Computational Density

[BWRC, UC Berkeley, 2004]

[Chart: SPECfp2000/MHz/Billion Transistors (axis 0 to 200) versus year (1990 to 2005). Annotations: HP; alpha: down by 100 in 6 yrs; IBM: down by 20 in 6 yrs]

stolen from Bob Colwell

memory wall, caches, ...

primary design goal: avoiding a paradigm shift

dramatic demo of the von Neumann Syndrome


35

Outline

• The Pervasiveness of FPGAs • The Reconfigurable Computing Paradox • The Gordon Moore gap • The von Neumann syndrome • The Anti Machine • We need a twin paradigm approach • Conclusions

the overhead-prone paradigm

refusing the paradigm shift


36

Avoiding the paradigm shift?

“It is feared that domain scientists will have to learn how to design hardware. Can we avoid the need for hardware design skills and understanding?” Tarek El-Ghazawi, panelist at SuperComputing 2006

“A leap too far for the existing HPC community” panelist Allan J. Cantle

SuperComputing, Nov 11-17, 2006, Tampa, Florida: over 7000 registered attendees and 274 exhibitors

We need a bridge strategy: developing advanced tools for training the software community to think in fine-grained parallelism and pipelining techniques.


37

The von Neumann Syndrome

The instruction-stream-based von Neumann approach: has several von Neumann overhead phenomena per CPU!

The data-stream-based anti machine approach: has no von Neumann bottlenecks

the watering pot model [Hartenstein]


38

The Law of more

• 1000 processors running in parallel means 1000 instruction streams with all their overhead phenomena


39

Have to re-think basic assumptions

Instead of physical limits, fundamental misconceptions of algorithmic complexity theory limit the progress and will necessitate new breakthroughs.

Processing is not what is costly; moving data and messages is

We have to rethink the basic assumptions behind computing


40

Refusing the paradigm shift leads to …

• an array of overhead phenomena
• waste of researcher capacity on “speculative” methods (the newest: “transactional memory”)
• multithreading* is not the silver bullet
• highly disappointing computational density
• the multicore programming crisis
• massive programmer productivity decline
• massive software engineering problems

*) is nondeterministic


41

blind in one eye …

• Most “computer scientists” have largely ignored the RC breakthrough

• Curriculum recommendations fail to address most of the IT job market

• instruction-stream-based only: blind in the other eye ….

• … reductionist tunnel vision …


42

Outline

• The Pervasiveness of FPGAs • The Reconfigurable Computing Paradox • The Gordon Moore gap • The von Neumann syndrome • The Anti Machine • We need a twin paradigm approach • Conclusions

instruction-stream vs. data stream

history of systolic arrays

bridging the chasm: an old hat


43

Von Neumann CPU

[Figure: CPU = DPU + program counter, attached to RAM memory]

term: CPU
program counter: yes
execution triggered by: instruction fetch
paradigm: instruction-stream-based

World of Software Engineering

Program Source: Software


44

Data-stream-based

• in contrast to von Neumann, which is instruction-stream-based, the anti machine is data-stream-based (no instruction fetch at run time)

• Sequencing by one or multiple data counters (each located with an ASM*)

• The history of data streams …….

*) ASM = auto-sequencing memory block
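To make the idea of a data counter concrete, here is a minimal Python sketch. It is not the Xputer design: the class `AutoSequencingMemory`, its `start`, `stride` and `count` parameters, and the function `run_dpu` are hypothetical names chosen for illustration. Each memory block steps through its own addresses, so the processing unit receives its operands as streams and no instruction is fetched per operation.

```python
from typing import Callable, Iterator, List, Sequence


class AutoSequencingMemory:
    """Hypothetical ASM: a memory block that carries its own data counter."""

    def __init__(self, cells: Sequence[int], start: int, stride: int, count: int):
        self.cells = cells
        self.start = start      # first address the data counter points to
        self.stride = stride    # address increment per step
        self.count = count      # number of items to stream

    def stream(self) -> Iterator[int]:
        address = self.start    # this variable plays the role of the data counter
        for _ in range(self.count):
            yield self.cells[address]
            address += self.stride


def run_dpu(operation: Callable[[int, int], int],
            left: AutoSequencingMemory,
            right: AutoSequencingMemory) -> List[int]:
    """Apply one fixed, pre-configured operation to two incoming data streams."""
    return [operation(a, b) for a, b in zip(left.stream(), right.stream())]


memory = [1, 2, 3, 4, 5, 6, 7, 8]
even_addresses = AutoSequencingMemory(memory, start=0, stride=2, count=4)
odd_addresses = AutoSequencingMemory(memory, start=1, stride=2, count=4)

# The operation is fixed before the run; only data moves at run time.
products = run_dpu(lambda a, b: a * b, even_addresses, odd_addresses)
print(products)
```

The two data counters run side by side, which mirrors the "one or multiple data counters" bullet. In real hardware they would advance concurrently rather than being interleaved by `zip`.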


45

Here is the common model

[Figure: the common model. Labels: von Neumann instruction-stream-based machine (CPU = DPU + program counter, RAM memory, von Neumann bottleneck); co-processors; CPU (software, instruction-stream-based); accelerators (data-stream-based), either hardwired (hardware) or reconfigurable (configware); software/configware co-compiler. Timeline: mainframe age; microprocessor age; configware age]


46

Overhead avoided by anti machine

# | feature | von Neumann machine | anti machine (hardwired / reconfigurable)
11 | state address computation overhead at run time | instruction stream | none
12 | data address computation overhead at run time | instruction stream | none
13 | inter-PU communication overhead at run time | instruction stream | none
14 | instruction fetch at run time | instruction stream | none
15 | data meet PU at run time | instruction stream | none
16 | synchronization overhead | instruction stream | none


47

Data meeting the Processing Unit (PU)

We have 2 choices:

• by Software: routing the data by memory-cycle-hungry instruction streams through shared memory
• by Configware: placement of the execution locality ... (pipe network generated by configware compilation)

... partly explaining the RC paradox
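The difference between the two choices can be illustrated by counting memory accesses in a toy model. This is a sketch under assumptions that are not on the slide: a three-stage computation, a hypothetical `CountingMemory` class, and a software variant that writes every intermediate result back to shared memory. Instruction fetches are not counted, so the software side is, if anything, understated.

```python
from typing import Callable, List, Tuple


class CountingMemory:
    """Toy shared memory that counts every read and write (hypothetical helper)."""

    def __init__(self, values: List[int]):
        self.cells = list(values)   # preloading the input is not counted
        self.accesses = 0

    def read(self, address: int) -> int:
        self.accesses += 1
        return self.cells[address]

    def write(self, address: int, value: int) -> None:
        self.accesses += 1
        self.cells[address] = value


STAGES: List[Callable[[int], int]] = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]


def by_software(values: List[int]) -> Tuple[List[int], int]:
    """Each stage fetches its operand from memory and stores its result back."""
    memory = CountingMemory(values)
    for stage in STAGES:
        for address in range(len(values)):
            memory.write(address, stage(memory.read(address)))
    return memory.cells, memory.accesses


def by_configware(values: List[int]) -> Tuple[List[int], int]:
    """Stages are chained into a pipe: data enters once and leaves once."""
    memory = CountingMemory(values)
    for address in range(len(values)):
        value = memory.read(address)
        for stage in STAGES:        # value passes from stage to stage, bypassing memory
            value = stage(value)
        memory.write(address, value)
    return memory.cells, memory.accesses


software_result, software_accesses = by_software([1, 2, 3, 4])
configware_result, configware_accesses = by_configware([1, 2, 3, 4])
print(software_result == configware_result, software_accesses, configware_accesses)
```

Both variants compute the same values. In this model the software variant needs two memory accesses per item and stage, while the pipe needs two per item regardless of the number of stages.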


48

Outline

• The Pervasiveness of FPGAs • The Reconfigurable Computing Paradox • The Gordon Moore gap • The von Neumann syndrome • The Anti Machine • We need a twin paradigm approach • Conclusions


49

Dual paradigm: an old hat

Software mind set: instruction-stream-based: flow chart -> control instructions (FSM: state transition)

Mapped into a hardware mind set: action box = flip-flop, decision box = (de)multiplexer

-> Register Transfer Modules (DEC: mid 1970s); similar concept: Case Western Reserve Univ.

[Figure: flip-flop (FF) holding a token bit; evoke]


50

Dual paradigm: an old hat (2)

“procedure call” or function call

call Module-name (parameters); Software: time domain

Hardware Description Languages;

Hardware description: space domain

An old hat: we just need to accept it


51

We need a twin paradigm approach

• We need a duality of 2 cultures
• a kind of transdisciplinary approach
• 1) the instruction-stream-based mind set = computing in time (procedural semantics)
• and 2) the data-stream-based mind set = computing in space (structural semantics)

We do not need a paradigm shift

We must adopt the second paradigm


52

Why the two paradigms are twins

• Both paradigms have the same syntax rules
• Their sequencers use the same circuitry
• Their semantics is only slightly different
• But there is an external asymmetry:
• The location of the counter (with the CPU or with memory)
• The number of counters: single (program counter), multiple (data counters)


53

Similarity of Programming Language Paradigms

language category | instruction stream languages | data stream languages
both deterministic | procedural sequencing: traceable, checkpointable | procedural sequencing: traceable, checkpointable
operation sequence driven by | read next instruction, goto (instr. addr.), jump (to instr. addr.), instr. loop, loop nesting; no parallel loops; escapes, instruction stream branching | read next data item, goto (data addr.), jump (to data addr.), data loop, loop nesting, parallel loops, escapes, data stream branching
state register | program counter | data counter(s)
address computation | massive memory cycle overhead | overhead avoided
instruction fetch | memory cycle overhead | overhead avoided
parallel memory bank access | interleaving only | no restrictions


54

Outline

• The Pervasiveness of FPGAs • The Reconfigurable Computing Paradox • The Gordon Moore gap • The von Neumann syndrome • The Anti Machine • We need a twin paradigm approach • Conclusions


55

Have to re-think basic assumptions

Instead of physical limits, fundamental misconceptions of algorithmic complexity theory limit the progress and will necessitate new breakthroughs.

Processing is not what is costly; moving data and messages is

We have to rethink the basic assumptions behind computing


56

Conclusions

• De facto performance of von Neumann computing systems falls dramatically short of the expectations raised by the Gordon Moore curve
• Massive von Neumann parallelism causes a progressive decline of programmer productivity
• Trouble stems from a refused paradigm shift
• Reconfigurable Computing provides improvement by orders of magnitude
• We need a twin paradigm education
• Upgrading CS curriculum recommendations is overdue


57

RCeducation 2008

http://www.fpl.uni-kl.de/RCeducation08/

The 3rd International Workshop on Reconfigurable Computing Education

April 10, 2008, Montpellier, France


58

The Configware Age

• Mainframe age and microprocessor(-only) age are history

• We are living in the configware age right now!

• Attempts to avoid the paradigm shift will again create a disaster


59

FPGA experts needed

• Copy a job ad: FPGA expert sought

• Acutely scarce


60

thank you for your patience


61

Impact of Makimoto’s wave

[Figure: Makimoto’s wave swinging between “standard” and “custom”, with decade marks 1957, 1967, 1977, 1987, 1997, 2007. Phases: TTL; LSI, MSI; µproc., memory; ASICs, accel’s]

• Procedural personalization via RAM-based Machine Paradigm: Software Industry’s Secret of Success
• Personalization (CAD) before fabrication
• Structural personalization: RAM-based before run time: Repeat Success Story by new Machine Paradigm! (Configware Industry)


62

[Figure, © 2001, Xputer Lab, University of Kaiserslautern: two machine paradigms compared]

Computer (“von Neumann”): the wrong Machine Paradigm
• Labels: Compiler; RAM; instructions; Sequencer with program counter (state register); Datapath (hardwired)
• tightly coupled by compact instruction code
• “von Neumann” does not support soft data paths

Xputer (anti machine): The Soft Machine Paradigm
• Labels: Compiler; Scheduler; RAM; (multiple) sequencer with data counters; Datapath Array (reconfigurable); “instructions”
• loosely coupled by decision data bits only
• also for hardwired


63

Reconfigurable semiconductor market

Top 4 PLD Manufacturers 2000 (total: $3.7 billion)

• Xilinx 42%
• Altera 37%
• Lattice 15%
• Actel 6%

Configware Market

• [Dataquest] > $7 billion by 2003.
• PLD vendors and their alliances provide libraries of “soft IPs”
• fastest growing semiconductor market segment

coarse-grained: rDPUs: configurable functional blocks

fine-grained: cLBs, rLBs: configurable logic blocks

PACT AG, Munich, Germany http://pactcorp.com

Quicksilver, San Jose http://quicksilver-tech.com


64

Semiconductor Revolutions

“Mainstream Silicon Application is switching every 10 Years”

[Figure: wave swinging between “standard” and “custom”, with decade marks 1957, 1967, 1977, 1987, 1997, 2007. Phases: TTL; LSI, MSI; µproc., memory; ASICs, accel’s. Annotations: 1st design crisis; 2nd design crisis; hardware people: new breed (M&C); software people: new breed needed]


65

Semiconductor Revolutions

“Mainstream Silicon Application is switching every 10 Years”

[Figure: wave swinging between “standard” and “custom”, with decade marks 1957, 1967, 1977, 1987, 1997, 2007. Phases: TTL; LSI, MSI; µproc., memory; ASICs, accel’s]

“The Programmable System-on-a-Chip is the next wave”

Tredennick’s Paradigm Shifts

• hardwired: algorithm fixed, resources fixed
• procedural programming (vN machine paradigm): algorithm variable, resources fixed
• structural programming (anti machine paradigm): algorithm variable, resources variable


66

Impact of Makimoto’s wave

[Figure: Makimoto’s wave swinging between “standard” and “custom”, with decade marks 1957, 1967, 1977, 1987, 1997, 2007. Phases: TTL; LSI, MSI; µproc., memory; ASICs, accel’s]

Procedural personalization via RAM-based Machine Paradigm

Software Industry’s Secret of Success


67

Impact of Makimoto’s wave

[Figure: Makimoto’s wave swinging between “standard” and “custom”, with decade marks 1957, 1967, 1977, 1987, 1997, 2007. Phases: TTL; LSI, MSI; µproc., memory; ASICs, accel’s]

Structural personalization: RAM-based before run time

Repeat Success Story by new Machine Paradigm!

Configware Industry


68

Impact of Data-stream-based ...

[Figure: Makimoto’s wave swinging between “standard” and “custom”, with decade marks 1957, 1967, 1977, 1987, 1997, 2007. Phases: TTL; LSI, MSI; µproc., memory; ASICs, accel’s]

Structural personalization: hardwired before fabrication

Repeat Success Story by new Machine Paradigm!

Embedded Hardware / Configware Industry


69

The History of Paradigm Shifts

“Mainstream Silicon Application is switching every 10 Years”

[Figure: wave swinging between “standard” and “custom”, with decade marks 1957, 1967, 1977, 1987, 1997, 2007. Phases: TTL; LSI, MSI; µproc., memory; ASICs, accel’s. Annotations: 1st Design Crisis; 2nd Design Crisis]

“The Programmable System-on-a-Chip is the next wave”

© 2007, [email protected] http://hartenstein.de

TU Kaiserslautern

70

The Impact of Makimoto’s Paradigm Shifts

[Figure: Makimoto’s wave, 1957 to 2007 in ten-year steps, swinging between “standard” and “custom”. Labels: TTL; LSI, MSI; µproc., memory; ASICs, accel’s. Source: Dr. Makimoto, FPL 2000 keynote]

Personalization (CAD) before fabrication

Procedural personalization via RAM-based Machine Paradigm

Software Industry’s Secret of Success

structural personalization: RAM-based before run time

Repeat Success Story by new Machine Paradigm !

71

Makimoto’s 3rd wave

The next EDA Industry Revolution

EDA industry paradigm switching every 7 years (McKinsey Curves) [Keutzer / Newton]:

1978 Transistor entry: Applicon, Calma, CV ...
1985 Schematics entry: Daisy, Mentor, Valid ...
1992 Synthesis (HDLs): Cadence, Synopsys ...
1999 (Co-) Compilation: data-stream-based DPAs [Hartenstein]

Von Neumann does not support Morphware:

“The Programmable System-on-a-Chip is the next wave”

72

The anti universe

• Paul Dirac predicted a complete anti universe consisting of antimatter
• “There are regions in the universe which consist of antimatter ... But there are asymmetries”
• We are not aware that there is a new area in computing sciences which consists of the antimatter of computing
• Reconfigurable Computing is made from this antimatter: data-stream-based computing
• When a particle hits its antiparticle, both are converted into energy: annihilation


73

anti particles

• 1928: Paul Dirac: “there should be an anti electron having positive charge” (Nobel prize 1933)
• 1932: Carl David Anderson detected this “positron” in cosmic radiation (Nobel prize 1936)
• 1954: new accelerators: synchrotrons, like Berkeley’s Bevatron
• 1955: Owen Chamberlain et al. create anti proton on Bevatron
• 1956: anti neutron created on Bevatron
• 1965: creation of a deuterium anti nucleus at CERN
• 1995: hydrogen anti atom (anti hydrogen) created at CERN by forcing positron and anti proton to merge at very low energy

74

Matter & Antimatter: Atom and Anti Atom

The World of Matter: machine paradigm: the Atom (+ + -)

Anti Matter: machine paradigm: Anti Atom (- - +)

75

Matter & Antimatter of Informatics: Machine and Anti Machine

Machine paradigm “von Neumann” (CPU):
1936 1st electronic computer (Konrad Zuse)
1946 v. N. machine paradigm
1971 1st microprocessor (Ted Hoff)

Anti Machine paradigm (DPU):
1979 “data streams” (systolic array: Kung / Leiserson)
1990 anti machine paradigm published
1995 rDPA / DPSS (supersystolic: Rainer Kress)

novel compilation techniques

76

Matter vs. antimatter: CPU vs. DPU

[Figure: the CPU (+) is a data path plus an instruction sequencer, driven by an instruction stream; the DPU (-), a Data Path Unit, has no instruction sequencer and is driven by data streams]

77

heavy anti atoms: DPA = DPU array

[Figure: a DPA drawn as a heavy anti atom made of nine DPUs, with coherent data streams spinning around]

78

Parallelism by Concurrency

[Figure: several CPU “atoms” (+ and - symbols) side by side]

independent instruction streams

difficult ...


79

>> Anti Machine and its Resources

• Microelectronics History

• fine grain and coarse grain Morphware

• Anti Matter of Computing

• Anti Machine and its Resources

• Problems to be solved

http://www.uni-kl.de

80

Dichotomy of machine paradigms

[Figure, machine side: a CPU (DPU plus instruction sequencer) fed by an instruction stream from memory M. Anti machine side: an (r)DPU or an (r)DPA ((r)DPU Array) fed by data streams from asM blocks, each a memory M with an address generator]

81

Terminology: DPU versus CPU ...

• DPU: data path unit
• DPA: DPU array
• GA: gate array
• rDPU: reconfigurable DPU
• rDPA: reconfigurable DPA
• rGA: reconfigurable GA

• DPU is no CPU: there is nothing central - like in a DPA

[Figure: a CPU is a DPU plus an instruction sequencer; a DPA is an array of DPUs; the prefix r marks the reconfigurable variants]

82

What is the trend ?

• vN is needed for embedded systems, OS, compilers, Sauerkraut software, non-performance-critical applications, others ….

• vN is obsolete for massive parallelism, except some special application areas

• Anti machine is the way to go for massive parallelism, also data-intensive applications

• Morphware is the way for high performance with short product life cycles, unstable standards

•Data-stream-based Computing is heading for mainstream

–1979 „data streams“ (Kung / Leiserson)

–1997 SCCC (LANL) Streams-C Configurable Computing

–SCORE (UCB) Stream Computations Organized for Reconfigurable Execution

–ASPRC (UCB) Adapting Software Pipelining for Reconfigurable Computing

–2000 Bee (UCB), ...

–Most stream-based multimedia systems, etc.

–Many other areas ....


83

Conclusion: all knowledge needed is available

• machine paradigm
• anti architectural resources
• sequencing methodology: hw & sw
• parallel memory IP core and module generator vendors
• anything else needed
• compilation techniques
• hw / sw partitioning methodology
• languages

courses / embedded tutorials:
• DATE, Munich, 2001
• ASP-DAC, Yokohama, 2001
• SBCCI, Brasilia, 2001

full day courses:
• Univ. Montpellier, 1998
• Nokia / Univ. Tampere, Finland, 2002
• CNRS Paris, France, 2002
• UnB, Brasilia, 2002

• 10 keynotes 2001 / 2002
• 5 invited talks 2001 / 2002

84

Main problems to be solved

computing in time vs. computing in space: migration by re-timing and other transformations (systolic arrays etc.)

this dichotomy is completely ignored by our CS curricula

• Lack of qualified users and implementers
• Each programmer should have qualified awareness of the dichotomy and of morphware
• Curricular innovations are urgently needed


85

CS education .....

software person: procedural

hardware person: structural

Hardware / Software Co-Design? Configware / Software Co-Design?

86

Annihilation?

[Figure: particles (+) and antiparticles (-) meeting]

avoidable by careful methodology

87

However, current CS Education …

… is based on the Submarine Model

[Figure: the Submarine Model. Layers: Algorithm; procedural high level Programming Language; Assembly Language; Hardware. Hardware invisible: under the surface. Brain usage: procedural-only]

This model disables ...

88

Hardware and Software as Alternatives

[Figure: partitioning an Algorithm between Software (procedural) and Hardware, Configware (structural). Options: Software only; Software & Hardw/Configw; Hardw/Configw only]

Brain Usage: both Hemispheres

89

The Dominance of the Submarine Model ...

... indicates that our CS education system produces zillions of mentally disabled Persons

[Figure labels: Hardware; (procedural); structurally disabled]

… completely disabled to cope with solutions other than software only

It’s time to attack the software faculty dictatorship. Get involved!

90

Antimatter Search ?

in EE & CS we do not need to search


91

Digital System Platforms clearly distinguished (2)

platform | program source running on it | machine paradigm
hardware (not programmable) | none |
morphware, fine grain: rGA (FPGA) | configware |
morphware, coarse grain: rDPU, rDPA (reconfigurable data stream processor) | flowware & configware | anti machine
data stream processor (hardwired) | flowware | anti machine
instruction stream processor | software | von Neumann machine

92

Matter & Antimatter

The World of Matter: machine paradigm: the Atom (+ + -)

The World of Anti Matter: machine paradigm: Anti Atom (- - +)

93

Matter & Antimatter of Informatics:

[Figure: the CPU as atom (+, -) and the DPU as anti atom (-, +)]

Anti Machine paradigm: nothing central !

94

heavy anti atoms: DPA = DPU array

[Figure: a DPA drawn as a heavy anti atom made of nine DPUs. Flowware: data streams spinning around]

95

What are the Challenges ? (1) [ST microelectronics, MorphICs, Dataquest, eASIC]

[Figure: growth factor (1 to 2) over time (0, 10, 12, 18 months). Curve label: 4y]

96

What are the Challenges ? (3) [ST microelectronics, MorphICs, Dataquest, eASIC]

[Figure: growth factor (1 to 2) over time (0, 10, 12, 18 months). Curve labels: 30y, 10y, 4y, 3y]

*) Department of Trade and Industry, London

avoid application-specific silicon !


97

What are the Challenges ? (4) [ST microelectronics, MorphICs, Dataquest, eASIC]

[Figure: growth factor (1 to 2) over time (0, 10, 12, 18 months). Curve labels: 30y, 10y, 4y, 3y; Battery capacity (1.03/year)]

*) Department of Trade and Industry, London
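The only growth rate stated numerically on this chart is the battery capacity figure of 1.03 per year. As a rough worked example, the sketch below converts between an annual growth factor and a doubling time. It assumes that "1.03/year" means compound growth of 3% per year, and it uses the 18-month tick of the time axis as a comparison point; the function names are hypothetical, and which curve carries which label on the original chart is not recoverable from the text.

```python
import math

def doubling_time_years(annual_factor):
    """Years needed to double at a given compound growth factor per year."""
    return math.log(2) / math.log(annual_factor)

def annual_factor(doubling_time_months):
    """Compound growth factor per year for a given doubling time in months."""
    return 2 ** (12 / doubling_time_months)

# Assumption: battery capacity grows by a factor of 1.03 every year.
battery_doubling = doubling_time_years(1.03)

# Comparison point: anything that doubles every 18 months.
fast_growth = annual_factor(18)

print(f"battery capacity doubles in about {battery_doubling:.1f} years")
print(f"18-month doubling equals a factor of about {fast_growth:.2f} per year")
```

Under these assumptions the battery curve needs more than two decades to reach the factor 2 that the fastest curve reaches within the chart's time axis, which is the gap the slide points at.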

98

What are the Challenges ? (5) [ST microelectronics, MorphICs, Dataquest, eASIC]

[Figure: growth factor (1 to 2) over time (0, 10, 12, 18 months). Curve labels: 30y, 10y, 5y, 4y, 3y, 2y; Battery capacity (1.03/year)]

*) Department of Trade and Industry, London

new compilation techniques needed ! supported by a new machine paradigm

99

Machine Paradigms

machine category | Computer (the Machine: “v. Neumann”) | The Anti Machine
driven by | instruction streams | data streams (no “dataflow”)
engine principles | instruction sequencing | sequencing data streams
state register | single program counter | (multiple) data counter(s)
communication path set-up (“instruction fetch”) | at run time | at load time
data path: resource | DPU (e.g. single ALU) | DPU or DPA (DPU array) etc.
data path: operation | sequential | parallel pipe network etc.

also hardwired implementations*

*) e.g. Bee project, Prof. Brodersen
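The two columns of the machine paradigm comparison can be made concrete with a toy model. The sketch below is an illustration only, not taken from the talk: the tiny instruction set, the memory layout and all function names are assumptions. It computes the same vector sum twice, once driven by a program counter with an instruction fetch per step, and once with a data path that is set up before run time and then only fed by data counters.

```python
import operator

def run_instruction_stream(program, memory):
    """von Neumann style: a single program counter steps through the
    instructions; every step needs an instruction fetch at run time."""
    pc = 0          # state register: single program counter
    acc = 0
    fetches = 0
    while pc < len(program):
        op, addr = program[pc]      # instruction fetch
        fetches += 1
        if op == "load":
            acc = memory[addr]
        elif op == "add":
            acc += memory[addr]
        elif op == "store":
            memory[addr] = acc
        pc += 1
    return fetches

def run_data_stream(a, b):
    """Anti machine style: the data path is configured once (load time);
    at run time only data counters step through the operand streams."""
    datapath = operator.add         # set up before run time
    out = []
    for i in range(len(a)):         # data counter, no instruction fetch
        out.append(datapath(a[i], b[i]))
    return out

a, b = [1, 2, 3, 4], [10, 20, 30, 40]
n = len(a)

# Instruction-stream version: a, b and the result share one memory.
memory = a + b + [0] * n
program = []
for i in range(n):
    program += [("load", i), ("add", n + i), ("store", 2 * n + i)]
fetches = run_instruction_stream(program, memory)
vn_result = memory[2 * n:]

# Data-stream version: same result, no instruction stream at run time.
stream_result = run_data_stream(a, b)
```

In this toy model the instruction-stream version spends three instruction fetches per result, while the data-stream version spends none at run time. Real machines differ in many details, so the numbers only illustrate the principle.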

100

Throughput vs. Efficiency

[Figure: MOPS / mW (log scale, 0.001 to 1000) over feature size (2, 1, 0.5, 0.25, 0.13, 0.1, 0.07 µ). T. Claasen et al.: ISSCC 1999]

[Figure: FPGA layout with cells labelled S and L: resources needed for reconfigurability versus area used by application; ~1 Bit CLB]

Wiring by abutment: 32 Bit example

*) R. Hartenstein: ISIS 1997

101

Throughput vs. Flexibility

[Figure: MOPS / mW (log scale, 0.001 to 1000) over feature size (2, 1, 0.5, 0.25, 0.13, 0.1, 0.07 µ), with throughput and flexibility arrows and the regions hardwired, coarse grain, FPGAs, von Neumann. T. Claasen et al.: ISSCC 1999]

coarse grain goes far beyond bridging the gap

Wiring by abutment: 32 Bit example

*) R. Hartenstein: ISIS 1997

102

PACT XPP: Reference Module: XPU128 Co-Processor

[Figure: XPP128 ALU-Array built from ALU-PAEs; labels: PAE core, ALU, Ctrl, CFG]

XPP128 ALU-Array

• 2 X PACs (Cluster)
• 128 X ALU-PAEs
• 32 X 1Kbyte RAM-PAEs
• 8 X I/O Elements
• Full 32 or 24 Bit Design
• 2 Configuration Hierarchies
• Evaluation Board (2001)
• XDS Development Tool with Simulator
• PAE Core is 32- or 24-Bit ALU with DSP-Instruction Set and Controller
• Connections: Inputs + Outputs (Channels) + Events

[Jürgen Becker, Univ. Karlsruhe]



107

scalability

The Scalability Problem

The Routing congestion Problem grows with the size of the FPGA


108

Structured Configware Design

(Mead & Conway Revival)

SNN filter KressArray Mapping Example

array size: 10 x 16 = 160 rDPUs

Legend: rDPU not used; used for routing only (rout thru only); operator and routing; port location marker; backbus connect


109

Nasty Matter

[Figure: the CPU (+): data path plus instruction sequencer, attached to RAM]

• Instruction Fetch Overhead
• Address Computation Overhead
• central von Neumann bottleneck
• extremely power hungry and area inefficient
• reconfigurable? the wrong machine paradigm


111

RAM-based CPU:

[Figure: the CPU (+): data path plus instruction sequencer, attached to RAM]

+ simple machine paradigm
+ scalability
+ relocatability
+ compatibility
= secret of success of software industry

112

Parallelism by Concurrency

[Figure: several CPUs (each a data path plus instruction sequencer) running independent instruction streams, connected by bus(es) or switch box]

• difficult coordination
• massive run time overhead

113

Semiconductor Revolutions

“Mainstream Silicon Application is switching every 10 Years”

[Figure: Makimoto’s wave, 1957 to 2007 in ten-year steps, swinging between “standard” and “custom”. Labels: TTL; LSI, MSI; µproc., memory; ASICs, accel’s]

“The Programmable System-on-a-Chip is the next wave”

Tredennick’s Paradigm Shifts

hardwired: algorithm fixed, resources fixed

procedural programming (vN machine paradigm): algorithm variable, resources fixed

structural programming (anti machine paradigm): algorithm variable, resources variable


117

Rapidly growing CS education gap

• Our computing curricula are obsolete
• introduction is strictly “procedural-only”
• vN-only use of terms like “computer organisation”, “computer structures”, “computer architecture”
• graduates are not prepared for the real world
– most applications are for embedded systems (>90% by 2010)
• our graduates are unable to compete with EE graduates
• only a few % of the curricula need to be changed
• my mission: getting you involved

118

Synthesizable Memory Communication

Efficient Memory Communication should be directly supported by the Mapper Tools

Optimized Parallel Memory Controller: an example by Nageldinger’s KressArray Xplorer

Legend: sequencers; memory ports; application; not used

http://kressarray.de

119

Data-Stream-based Soft Machine

[Figure labels: Compiler; Scheduler; “instructions”; rDPA; Sequencers (data stream generator); Memory (data memory) made of several memory banks]
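The sequencers named on this slide generate the data streams by stepping through memory addresses. A minimal sketch of that idea follows. It is an assumption for illustration, not the KressArray implementation: the function names, the row-major bank layout and the block-scan pattern are all hypothetical.

```python
def block_scan_addresses(base, row_stride, rows, cols):
    """Yield the addresses of a rows x cols block, row by row.
    The generator's loop state plays the role of the data counter."""
    for r in range(rows):
        for c in range(cols):
            yield base + r * row_stride + c

def data_stream(memory_bank, addresses):
    """Turn an address sequence into a data stream read from one bank."""
    for addr in addresses:
        yield memory_bank[addr]

# A 4 x 4 array stored row by row in one memory bank (values 100..115).
bank = list(range(100, 116))

# Scan the 2 x 2 block whose top-left element sits at address 5.
addrs = list(block_scan_addresses(base=5, row_stride=4, rows=2, cols=2))
stream = list(data_stream(bank, addrs))
```

With several memory banks, one such address generator per bank would run in parallel, so that the array receives several data streams at the same time. No instruction is fetched while the stream runs; only the scan parameters are set beforehand.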

120

Terminology has been highly confusing

[Figure: growth factor (1 to 2) over time (0, 10, 12, 18, 24, 36, 48 months). Curve labels: 30y, 10y, 4y; Battery capacity (1.03/year)]

*) Department of Trade and Industry, London


121

The RC Rush is already running (RC = Reconfigurable Computing)

“Mainstream Silicon Application is switching every 10 Years”

[Figure: Makimoto’s Wave, 1957 to 2007 in ten-year steps, swinging between “standard” and “custom”, with the 1st design crisis and the 2nd design crisis marked. Labels: TTL; LSI, MSI; µproc., memory; ASICs, accel’s; reconfigurable platforms]

the RC rush:
• rapidly growing no. of courses
• http://FPL.org: 216 submissions (DAC’02 had less than 500)

roots of the M&C rush: professors took courses


123

No vN bottleneck

The anti machine has no von Neumann bottleneck.

124

3 different mind sets

[Figure: timeline 1957 to 2007. Labels: TTL; LSI, MSI; µproc., memory; ASICs, accel’s; FPGAs; coarse grain; soft CPUs]

hardware people; CS people; new breed needed

Common terminology needed

125

What’s the problem ?

Traditional CS: programming is (control-)procedural, instruction-stream-based (sources: software)

The typical programmer has problems understanding function evaluation without machine mechanisms .... by signals rippling through a network of transistors.

It’s the gap between the procedural and the structural mind set (µprocessor vs. accelerators)

Crossing the Hardware / Software Chasm [Mike Butts]

126

What‘s the problem ?

accelerators µprocessor

The brain hurts on paradigm shift ?

no, it can‘t ...

Brain usage: procedural-only

structural hemisphere missing

Crossing the Hardware / Software Chasm [Mike Butts]


127

Reconfigurable Computing: a second programming domain

Migration of programming to the structural domain

The structural domain has become RAM-based

The opportunity to introduce the structural domain to programmers ... to bridge the gap by clever abstraction mechanisms using a simple new machine paradigm

128

control-procedural vs. data-procedural

The structural domain is primarily data-stream-based: Flowware*

..... mostly not yet modelled that way: most flowware is hidden by its indirect instruction-stream-based implementation

Flowware provides a (data-)procedural abstraction from the (data-stream-based) structural domain

Flowware converts “procedural vs. structural” into “control-procedural vs. data-procedural” ...

... a Trojan horse to introduce the structural domain to the procedural mind set of programmers

*) explained later

129

How to achieve acceptance

No hardware description languages

Courses tailored for students not being hardware-savvy

Tools usable by users not being hardware designers

EDA tools based on term rewriting [Arvind] [Mauricio Ayala]

[Courtesy Richard Newton]

Your name here: your proposals

how to hide the ugliness from the user [Herman Schmit]


130

>> Why it’s time for a New CS

http://www.uni-kl.de

• Preface

• Terminology clean-up

• Why it’s time for a New CS

• Draft of a Roadmap


131

McKinsey Curve: dynamics of R&D disciplines

[Figure: maturity of a discipline over the years: fundamental issues, then consolidation, then saturation (limitations met); a new discipline on top of it ... by innovation]

132

History of Computing

[Figure: maturity over time, 1957 to 2007. Curve “classical CS”: mainframes, PC, ? Curve “new CS”: morphware, data streams ... Marker: here?]

it’s already existing ... but awareness still missing ... still ignored by most CS curricula


133

EDA Industry Revolutions

EDA industry paradigm switching every 7 years (courtesy [Keutzer / Newton]):

1978 Transistor entry: Applicon, Calma, CV ...
1985 Schematics entry: Daisy, Mentor, Valid ...
1992 Synthesis: Cadence, Synopsys ...
1999 HLLs, (Co-) Compilation: Data-Stream-based DPU arrays
2006 coming closer to programmers’ mind set

134

it’s time for a new CS ...

configware trend flowware trend

embedded systems: hw/cw/sw co-design

CS crisis: qualification

problems urging us

next EDA wave: high level languages

opportunities


135

all ingredients available

algorithmic cleverness: new directions for „algorithms and data structures“ specialists

morphware scalability / configware relocatability: achievable by EDA support

all ingredients available: published the past 30 years


136

On-chip memory

algorithmic cleverness: new directions for „algorithms and data structures“ specialists

RC: on chip distributed memory architecture

vN: code size of astronomic dimensions -> off-chip memory


137

Synthesizable Distributed Memory

Efficient Memory Communication should be directly supported by the Mapper Tools

Optimized Parallel Memory Controller: an example by Nageldinger’s KressArray Xplorer

Legend: sequencers; memory ports; application; not used

http://kressarray.de

138

Software Industry

[Figure: timeline 1957 to 2007: µprocessor and memory ICs go mainstream]

Procedural personalization via RAM-based Machine Paradigm

Software Industry’s Secret of Success


139

Configware Industry ?

[Figure: timeline 1957 to 2007: after µprocessor and memory ICs, morphware goes mainstream]

structural personalization: RAM-based before run time

Repeat Success Story by new Machine Paradigm !

Configware Industry

140

KressArray Family generic Fabrics: a few examples

• Select Function Repertory
• Select Nearest Neighbour (NN) Interconnect: an example
• Select mode, number, width of NNports
• more NNports: rich Rout Resources (rout-through only; rout-through and function)
• Examples of 2nd Level Interconnect: laid out over the rDPU cell, no separate routing areas !

[Figure labels: rDPU; 2, 4, 8, 16, 24, 32]

http://kressarray.de

141

stolen from Bob Colwell

processor/memory communication bottleneck

vN bottleneck

vN: unbalanced

142

>>> we need ... <<<<<

We need a Mead-&-Conway-like text book

We need undergraduate lab courses on HW / CW / SW partitioning

We need new courses with extended scope on parallelism and algorithmic cleverness for HW / CW / SW migration / partitioning

What else do we need ? Your proposals ?

143

>>> we need support <<<<<

We need the support of the open-minded

members of the classical CS community

Let us assemble a list with e-mail addresses


144

>>> thank you <<<<<

thank you for your patience


145

>>> book <<<<<


146

>>> END <<<

END


147

Having introduced Data streams

[Figure: array fed by an input data stream and producing output data streams; axes: time and port #]

systolic array research throughout the 1980s: mathematicians' hobby

The road map to HPC: ignored for decades

~1980

DPA (pipe network)

execution: transport-triggered

no memory wall

H. T. Kung

148

Who generates the Data Streams?

Mathematicians: it's not our job (it's not algebraic)

[Figure: array with its input and output data streams]

149

Without a sequencer …

… it's not a machine

reductionist approach: (it's not our job)

150

Synthesis Method?

of course algebraic (linear projection)

only for applications with regular data dependencies

Mathematicians caught by their own paradigm trap

1995: Rainer Kress discarded their algebraic synthesis methods and replaced them by simulated annealing: rDPA
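To make the simulated-annealing idea concrete, here is a minimal sketch of annealing-based placement of operators onto a small array. It is not Kress's actual mapper: the cost function (total Manhattan wire length), the move set (swapping two cells), the cooling schedule, and the names `wire_length` and `anneal_placement` are all assumptions chosen for illustration.

```python
import math
import random

def wire_length(placement, edges):
    """Sum of Manhattan distances between connected operators."""
    return sum(abs(placement[a][0] - placement[b][0]) +
               abs(placement[a][1] - placement[b][1]) for a, b in edges)

def anneal_placement(nodes, edges, rows, cols, steps=5000,
                     t_start=5.0, t_end=0.05, seed=0):
    """Assign each node its own cell of a rows x cols array (hypothetical helper)."""
    assert len(nodes) <= rows * cols, "array too small"
    rng = random.Random(seed)
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    rng.shuffle(cells)
    # Pad with None so that moving a node to an empty cell is an ordinary swap.
    occupants = list(nodes) + [None] * (len(cells) - len(nodes))
    placement = {n: cell for n, cell in zip(occupants, cells) if n is not None}
    cost = wire_length(placement, edges)
    best, best_cost = dict(placement), cost
    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / steps)  # geometric cooling
        i, j = rng.sample(range(len(cells)), 2)
        occupants[i], occupants[j] = occupants[j], occupants[i]  # trial move
        trial = {n: cell for n, cell in zip(occupants, cells) if n is not None}
        trial_cost = wire_length(trial, edges)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if trial_cost <= cost or rng.random() < math.exp((cost - trial_cost) / temp):
            placement, cost = trial, trial_cost
            if cost < best_cost:
                best, best_cost = dict(placement), cost
        else:
            occupants[i], occupants[j] = occupants[j], occupants[i]  # reject: undo
    return best, best_cost

# Usage: a six-operator pipeline placed on a 3 x 3 array.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]
best, best_cost = anneal_placement(nodes, edges, rows=3, cols=3)
```

Unlike linear projection, this search places no regularity requirement on the data dependencies: any edge list can be given to it.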


151

The counterpart of the von Neumann machine

[Figure: (r)DPA with data streams and twelve attached ASM blocks; ASM block labels: data counter, GAG, RAM]

ASM: Auto-Sequencing Memory

data counters instead of a program counter

data counters: located at memory (not at data path)

coarse-grained
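The role of a data counter can be sketched in a few lines: an address generator sitting at the memory steps through a scan pattern and emits the data stream, with no instruction fetch involved. This is a simplified illustration only; the function `data_counter`, the class `AutoSequencingMemory`, and the row-major window scan are assumptions, not the actual GAG design.

```python
from typing import Iterator, List

def data_counter(base: int, rows: int, cols: int,
                 row_stride: int, col_stride: int = 1) -> Iterator[int]:
    """Yield the addresses of a row-major scan over a 2-D window (hypothetical, simplified)."""
    for r in range(rows):
        for c in range(cols):
            yield base + r * row_stride + c * col_stride

class AutoSequencingMemory:
    """A RAM block that emits a data stream driven by its own data counter."""

    def __init__(self, ram: List[int]):
        self.ram = ram

    def stream(self, addresses: Iterator[int]) -> Iterator[int]:
        # The sequencing happens at the memory, not in the data path.
        for addr in addresses:
            yield self.ram[addr]

# Usage: a 4 x 4 image stored row-major; stream the 2 x 2 window starting at address 5.
ram = list(range(100, 116))
asm = AutoSequencingMemory(ram)
window = list(asm.stream(data_counter(base=5, rows=2, cols=2, row_stride=4)))
# Addresses visited: 5, 6, 9, 10
```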


152

Coarse-grained Reconfigurable Array

SNN filter on (supersystolic) KressArray (mainly a pipe network)

array size: 10 x 16 rDPUs

rDPU: reconfigurable Data Path Unit, 32 bits wide

no CPU

Legend: rDPU not used; used for routing only (route-thru only); operator and routing; port location marker; backbus connect

note: software perspective without instruction streams: pipelining

compiled by Nageldinger's KressArray Xplorer with Juergen Becker's CoDe-X inside

153

Simple KressArray Configuration Example


154

C or FORTRAN ?

Gordon Bell: Computer scientists haven't been interested in programming clusters. If putting the cluster on a chip is what excites them, fine. It will still have to run Fortran!

or C (X-C)

Reiner Hartenstein (conclusion of this talk): Classical programming languages, but with slightly different semantics (data-procedural), are good candidates for parallel programming.

Support tools have been demonstrated by academia

*) like CoDe-X

155

thank you for your patience


156

END


157

Start-ups for Coarse-grained Platforms

• One company has failed

• Several companies have succeeded, but their technology disappeared through acquisition.

• Two companies are still available


158

Multi Core: Just more CPUs ?

Growth in complexity and clock frequency of single-core microprocessors is coming to an end

Without a paradigm shift, just more CPUs on a chip lead to the dead ends known from supercomputing

Multi-core microprocessor chips are emerging: soon 32 cores on an AMD chip, and 80 on an Intel chip

Multi-threading is not the silver bullet

We have to re-think basic assumptions behind computing

159

New Languages for Parallelism ?

Nick Tredennick: Efforts to extend standards-based, serial programming languages with features to describe parallel constructs are likely to fail. What's more likely to succeed are languages that raise the level of abstraction in algorithm description

Mauricio Ayala-Rincón: Term Rewriting Systems may raise the abstraction level up to math formulae

160

C or FORTRAN ?

Gordon Bell: Computer scientists haven't been interested in programming clusters. If putting the cluster on a chip is what excites them, fine. It will still have to run Fortran!

Reiner Hartenstein: Loop transformations from C or Fortran by automatically partitioning software/configware co-compilers* targeting coarse-grained reconfigurable arrays are quite promising.

*) like CoDe-X

161

Xplorer Plot: SNN Filter Example

[Figure: KressArray Xplorer plot; annotations: route-thru-only rDPU; 3 vert. NNports, 32 bit; 2 hor. NNports, 32 bit; operator with two operands and a result; route thru; backbus connect]

Legend: rDPU not used; used for routing only; operator and routing; port location marker; backbus connect

+ [13]

http://kressarray.de

162

Configware Compilation

Configware Engineering

configware compilation fundamentally different from software compilation

[Figure: configware compilation flow; labels: source "program"; configware compiler (mapper, scheduler); configware code; placement & routing; flowware code; data; programming the data counters; rDPA (pipe network); data streams; ASMs (data counter, GAG, RAM)]

ASM: Auto-Sequencing Memories


163

SNN filter on (supersystolic) KressArray (mainly a pipe network)

array size: 10 x 16 = 160 rDPUs

rDPU: reconfigurable Data Path Unit, e.g. 32 bits wide

no CPU

Legend: rDPU not used; used for routing only (route-thru only); operator and routing; port location marker; backbus connect

note: software perspective without instruction streams

Symptom of the von Neumann Syndrome

164

Hybrid Multi Core example

twin paradigm machine

each core can run CPU mode or rDPU mode

64 cores

[Figure: array of cores, most shown in rDPU mode and some in CPU mode]

165

History

• 1987-1990: Xputer – anti machine

• 1990/91: GAG address generator & ASM

• 1994: Compilers for Xputers

• 1995: KressArray (ASP-DAC)

• 1995: Compilation for coarse-grained arrays

166

Loop Transformation Examples

sequential processes: resource parameter driven Co-Compilation

original: loop 1-16 body endloop

loop unrolling: loop 1-8 body body endloop

strip mining: fork; loop 1-8 body endloop; loop 9-16 body endloop; join

[Figure labels: host; reconf. array; loop 1-8 trigger endloop; loop 1-4 trigger endloop; loop 1-2 trigger endloop]
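The two transformations named on this slide can be sketched in ordinary code. The example below is an illustration under assumptions: `body` is a stand-in loop body with no dependencies between iterations, indices run 0-15 instead of 1-16, and the fork/join of strip mining is modeled with a thread pool rather than with a reconfigurable array.

```python
from concurrent.futures import ThreadPoolExecutor

def body(i, data, out):
    """Stand-in loop body (hypothetical); iterations are independent."""
    out[i] = 2 * data[i] + 1

def original(data):
    """loop 1-16 body endloop"""
    out = [0] * 16
    for i in range(16):
        body(i, data, out)
    return out

def unrolled(data):
    """loop 1-8 body body endloop: half the iterations, two bodies each."""
    out = [0] * 16
    for k in range(8):
        body(2 * k, data, out)
        body(2 * k + 1, data, out)
    return out

def strip_mined(data):
    """fork; loop 1-8 body endloop; loop 9-16 body endloop; join."""
    out = [0] * 16

    def strip(lo, hi):
        for i in range(lo, hi):
            body(i, data, out)

    with ThreadPoolExecutor(max_workers=2) as pool:        # fork
        futures = [pool.submit(strip, 0, 8), pool.submit(strip, 8, 16)]
        for f in futures:
            f.result()                                     # join
    return out

data = list(range(16))
reference = original(data)
```

All three variants compute the same result; what changes is how much work is exposed per iteration (unrolling) or per parallel strip (strip mining), which is the degree of freedom a resource-parameter-driven co-compiler can exploit.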


167

Deficiencies of reconfigurable fabrics (FPGA) (fine-grained)

general purpose "simple" FPGA: power guzzler, slow clock, immense area inefficiency

deficiency factor: >10,000

[Figure: density in transistors / microchip (log scale: 10^0, 10^3, 10^6, 10^9) versus year (1980, 1990, 2000, 2010); curves: FPGA physical, FPGA routed, FPGA logical; annotations: reconfigurability overhead, routing congestion, wiring overhead; overhead: >> 10 000]

1st DeHon's Law [1996: Ph.D. thesis, MIT]

168

Software-to-Configware (FPGA) Migration: some published speed-up factors [2003 – 2005]

[Figure: chart of speed-up factors on a log scale (10^0, 10^3, 10^6)]

molecular dynamics simulation: 88

crypto: 1000

DSP and wireless:

• Reed-Solomon Decoding: 2400

• MAC: 1000

• Viterbi Decoding: 400

• FFT: 100

Image processing, Pattern matching, Multimedia:

• real-time face detection: 6000

• video-rate stereo vision: 900

• pattern recognition: 730

• SPIHT wavelet-based image compression: 457

Bioinformatics:

• Smith-Waterman pattern matching: 288

• BLAST: 52

• protein identification: 40

Astrophysics:

• GRAPE: 20


170

Software-to-Configware (FPGA) Migration: some published speed-up factors [2000 – 2008]

[Figure: chart of speed-up factors on a log scale (10^0, 10^3, 10^6); additional chart labels: 3000, 34000, DES breaking, xputer]

molecular dynamics simulation: 88

crypto: 1000

DSP and wireless:

• Reed-Solomon Decoding: 2400

• MAC: 1000

• Viterbi Decoding: 400

• FFT: 100

Image processing, Pattern matching, Multimedia:

• real-time face detection: 6000

• video-rate stereo vision: 900

• pattern recognition: 730

• SPIHT wavelet-based image compression: 457

Bioinformatics:

• Smith-Waterman pattern matching: 288

• BLAST: 52

• protein identification: 40

Astrophysics:

• GRAPE: 20

171

“simple” FPGAs are only the beginning

• Less discrepancy for platform FPGAs and coarse-grained reconfigurable arrays



173

Software-to-Configware (FPGA) Migration: Oil and gas [2005]

[Figure: speed-up factor on a log scale (10^0, 10^3, 10^6); oil and gas: 17]

side effect: slashing the electricity bill by more than an order of magnitude

174

An accidentally discovered side effect

• Software to FPGA migration of an oil and gas application:

• only a speed-up factor of 17

• Electricity bill down to <10%

• Hardware cost down to <10%

• All other publications reporting speed-up did not report energy consumption. This will change.

Saves > $10,000 in electricity bills per year (7¢ / kWh) per 64-processor 19" rack

Herb Riley, R. Associates
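The savings figure above can be turned into an average power figure with a short back-of-the-envelope calculation. This sketch assumes continuous operation (24 hours a day, 365 days a year) and takes the quoted $10,000 as a lower bound; the function name `saved_power_kw` is introduced here for illustration only.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: the rack runs around the clock

def saved_power_kw(savings_usd_per_year: float, price_usd_per_kwh: float) -> float:
    """Average electrical power (kW) that corresponds to a yearly saving."""
    saved_kwh_per_year = savings_usd_per_year / price_usd_per_kwh
    return saved_kwh_per_year / HOURS_PER_YEAR

rack_kw = saved_power_kw(10_000, 0.07)      # per 64-processor rack
per_processor_w = rack_kw * 1000 / 64       # per processor, in watts
print(f"{rack_kw:.1f} kW per rack, {per_processor_w:.0f} W per processor")
```

Under these assumptions the quoted saving corresponds to at least about 16 kW of average power per rack, or roughly 255 W per processor.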


175

What's Really Going On With Oil Prices? [BusinessWeek, Jan. 29, 2007]

$52: price in Feb 2007 [NY Mercantile Exch.: Jan. 17]

$200: minimum oil price in 2010, in a bet by investment banker Matthew Simmons [BusinessWeek]

176

Energy as a strategic issue

• Google's annual electricity bill: $50,000,000

• Amsterdam: 25% goes into server farms

• NY city server farms: 1/4 km² floor area

[Mark P. Mills]

• Predicted for the USA in the year 2020: 30-50% of the entire national electricity consumption goes into cyber infrastructure

177

Energy: an important motivation

platform example                        Energy: W / Gflops   energy factor
MDgrape-3* (domain-specific, 2004)      0.2                  1
Pentium 4                               14                   70
Earth Simulator (supercomputer, 2003)   128                  640

*) feasible also on reconfigurable platforms
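The energy factor column is simply each platform's W / Gflops figure divided by that of the most efficient platform. A minimal sketch that reproduces the column from the slide's numbers (the dictionary and function names are illustrative, not from the talk):

```python
# W / Gflops values as given on the slide.
watts_per_gflops = {
    "MDgrape-3": 0.2,
    "Pentium 4": 14.0,
    "Earth Simulator": 128.0,
}

def energy_factors(table: dict, baseline: str) -> dict:
    """Energy per Gflops relative to the baseline platform."""
    base = table[baseline]
    return {name: value / base for name, value in table.items()}

factors = energy_factors(watts_per_gflops, baseline="MDgrape-3")
for name, factor in factors.items():
    print(f"{name}: {round(factor)}")
```

With MDgrape-3 as the baseline, the Pentium 4 needs 70 times and the Earth Simulator 640 times the energy per Gflops.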


178

Executive Summary doesn't help

We must first understand the nature of the paradigm

Understanding the Paradox ?

von Neumann chickens ?

179

Outline

• The Pervasiveness of FPGAs

• The Reconfigurable Computing Paradox

• The Gordon Moore gap

• The von Neumann syndrome

• The Anti Machine

• We need a twin paradigm approach

• Conclusions & the multicore crisis

180

Moore’s law not applicable to all aspects of VLSI

What is the reason for the paradox ?

The Gordon Moore curve does not indicate performance

The peak clock frequency does not indicate performance

the law of Gates

astronomic code size causes

massive overhead, due to

von Neumann syndrome


181

Rapid Decline of Computational Density

[BWRC, UC Berkeley, 2004]

[Figure: SPECfp2000/MHz/Billion Transistors (0 to 200) versus year (1990, 1995, 2000, 2005); annotations: HP alpha: down by 100 in 6 yrs; IBM: down by 20 in 6 yrs]

stolen from Bob Colwell

memory wall, caches, ...

primary design goal: avoiding a paradigm shift

dramatic demo of the von Neumann Syndrome


182

END
