CPRE 583 Reconfigurable Computing Lecture 10: Wed 9/24/2010 (High-level Acceleration Approaches)


1 - CPRE 583 (Reconfigurable Computing): High-level Acceleration Approaches Iowa State University (Ames)

CPRE 583
Reconfigurable Computing
Lecture 10: Wed 9/24/2010
(High-level Acceleration Approaches)

Instructor: Dr. Phillip Jones ([email protected])
Reconfigurable Computing Laboratory
Iowa State University
Ames, Iowa, USA

http://class.ee.iastate.edu/cpre583/


Announcements/Reminders

• HW2: Due Wed 10/6
  – Problem 2 will have a separate deadline (to be announced)
• MP2: Due Fri 10/1 (you can work in pairs)
  – Make sure to read the README file in the MP2 distribution
    • Contains info on how to fix a Gigabit core licensing issue ISE has
• Start thinking of class projects and forming teams
  – Submit teams and project ideas: Mon 10/11 midnight
  – Project proposal presentations: Wed 10/20


Projects

• Expectations
  – Working system
  – Write up that can potentially be submitted to a conference
    • Will use DAC format as write up guideline
  – 15-20 minute PowerPoint Presentation
• DAC (Design Automation Conference)
  – http://www2.dac.com/
  – Conference papers
    • Due Date: 5pm (MT) Thur 11/18/2010
  – Student Design Contest
    • Due Date: 5pm (MT) Wed 11/24/2010, Cash Prizes!


Project Ideas: Relevant conferences

• FPL
• FPT
• FCCM
• FPGA
• DAC
• ICCAD
• Reconfig
• RTSS
• RTAS
• ISCA
• Micro
• Super Computing
• HPCA
• IPDPS


Initial Project Proposal Slides (5-10 slides)

• Project team list: Name, Responsibility (who is project leader)
• Project idea
  • Motivation (why is this interesting, useful)
  • What will be the end result
  • High-level picture of final product
• High-level Plan
  – Break project into milestones
    • Provide initial schedule: I would initially schedule aggressively to have project complete by Thanksgiving. Issues will pop up to cause the schedule to slip.
  – System block diagrams
  – High-level algorithms (if any)
  – Concerns
    • Implementation
    • Conceptual
• Research papers related to your project idea


Weekly Project Updates

• The current state of your project write up
  – Even in the early stages of the project you should be able to write a rough draft of the Introduction and Motivation section
• The current state of your Final Presentation
  – Your Initial Project proposal presentation (Due Wed 10/20) should make for a starting point for your Final presentation
• What things are working & not working
• What roadblocks are you running into


Projects: Target Timeline

• Teams Formed and Idea: Mon 10/11
  – Project idea in Power Point 3-5 slides
    • Motivation (why is this interesting, useful)
    • What will be the end result
    • High-level picture of final product
  – Project team list: Name, Responsibility
• High-level Plan/Proposal: Wed 10/20
  – Power Point 5-10 slides
    • System block diagrams
    • High-level algorithms (if any)
    • Concerns
      – Implementation
      – Conceptual
    • Related research papers (if any)


Projects: Target Timeline

• Work on projects: 10/22 - 12/8
  – Weekly update reports
    • More information on updates will be given
• Presentations: Last Wed/Fri of class
  – Present / Demo what is done at this point
  – 15-20 minutes (depends on number of projects)
• Final write up and Software/Hardware turned in: Day of final (TBD)


Common Questions


Overview

• First 15 minutes of Google FPGA lecture
• How to run Gprof
• Discuss some high-level approaches for accelerating applications.


What you should learn

• Start to get a feel for approaches for accelerating applications.


Why use Customized Hardware?

• Great talk about the benefits of Heterogeneous Computing
  • http://video.google.com/videoplay?docid=-4969729965240981475#


Profiling Applications

• Finding bottlenecks
• Profiling tools
  – gprof: http://www.cs.nyu.edu/~argyle/tutorial.html
  – Valgrind
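The idea of profiling to find a bottleneck can be illustrated in a few lines. The sketch below is not gprof or Valgrind; it uses Python's built-in cProfile as a stand-in, and the functions `slow_kernel`, `cheap_setup`, and `application` are hypothetical. It reports which function accumulates the most self time, which is the same question a gprof flat profile answers for a C program.

```python
import cProfile
import pstats


def slow_kernel(n):
    """Hypothetical hot loop: the part worth moving to hardware."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def cheap_setup(n):
    """Hypothetical inexpensive setup work."""
    return list(range(n))


def application():
    """Hypothetical application: a little setup, then the kernel."""
    cheap_setup(1_000)
    return slow_kernel(200_000)


def hottest_function(func):
    """Profile func and return the name of the function with the most self time."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) -> (cc, nc, tottime, cumtime, callers)
    (_, _, name), _ = max(stats.stats.items(), key=lambda item: item[1][2])
    return name


if __name__ == "__main__":
    print("Bottleneck:", hottest_function(application))
```

The function that dominates the profile is the natural first candidate for acceleration.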


Pipelining

[Figure: a chain of four 4-LUT/DFF stages taking input vector <A,B,C,D> and producing an output, drawn twice; one copy is annotated "1 DFF delay per output".]

• How many ns to process 100 input vectors? Assume each LUT has a 1 ns delay.
• How many ns to process 100 input vectors? Assume a 1 ns clock.
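One way to work the two questions above, as a sketch under stated assumptions: the four LUTs are taken to be in series; in the unpipelined case a new input vector is applied only after the previous one has passed through all four LUT delays; in the pipelined case a register follows each LUT, so the pipeline fills once and then delivers one result per clock. The function names are illustrative only.

```python
def unpipelined_ns(num_vectors, stages, stage_delay_ns):
    """Each vector must traverse every stage before the next one starts."""
    return num_vectors * stages * stage_delay_ns


def pipelined_ns(num_vectors, stages, clock_ns):
    """Fill the pipeline once (stages clocks), then one result per clock."""
    if num_vectors == 0:
        return 0
    return (stages + num_vectors - 1) * clock_ns


if __name__ == "__main__":
    # Assumed: 4 LUTs in series, 1 ns per LUT, 1 ns clock when pipelined.
    print(unpipelined_ns(100, 4, 1))  # 400
    print(pipelined_ns(100, 4, 1))    # 103
```

Under these assumptions the pipelined version approaches a 4x throughput gain as the number of input vectors grows, since the fill latency is paid only once.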


Pipelining (Systolic Arrays)

Dynamic Programming

1. Start with base case (lower left corner)
2. Formula for computing numbering cells
3. Final result in upper right corner.

Grid fill sequence, one step per slide (rows listed top to bottom; "." marks a cell not yet computed):

Step 1:
. . .
. . .
1 . .

Step 2:
. . .
1 . .
1 1 .

Step 3:
1 . .
1 2 .
1 1 1

Step 4:
1 3 .
1 2 3
1 1 1

Step 5:
1 3 6
1 2 3
1 1 1
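The grid values on these slides are consistent with a simple recurrence: each cell is the sum of the cell to its left and the cell below it, with the lower left corner set to 1. That recurrence is inferred from the numbers shown, not stated on the slides. A minimal sequential sketch under that assumption:

```python
def fill_grid(n):
    """Fill an n x n grid; row 0 is the bottom row, so (0, 0) is the lower left."""
    grid = [[0] * n for _ in range(n)]
    for row in range(n):
        for col in range(n):
            if row == 0 and col == 0:
                grid[row][col] = 1  # base case in the lower left corner
                continue
            left = grid[row][col - 1] if col > 0 else 0
            below = grid[row - 1][col] if row > 0 else 0
            grid[row][col] = left + below  # assumed recurrence
    return grid


if __name__ == "__main__":
    # Print top row first so the output matches the slide layout.
    for row in reversed(fill_grid(3)):
        print(" ".join(str(value) for value in row))
```

For a 3x3 grid this prints the rows 1 3 6, 1 2 3, and 1 1 1, with the final result in the upper right corner.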


How many ns to process if CPU can process one cell per clock (1 ns clock)?


How many ns to process if FPGA can obtain maximum parallelism each clock? (1 ns clock)


What speedup would an FPGA obtain (assuming maximum parallelism) for a 100x100 matrix? (Hint: find a formula for an NxN matrix)
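A sketch of one way to reason about the three questions, assuming each cell depends only on its left and lower neighbors. Under that assumption all cells on the same anti-diagonal are independent, so "maximum parallelism" means one anti-diagonal per clock. A CPU that computes one cell per clock needs N*N clocks, while the parallel version needs 2N - 1 clocks. The function names are illustrative.

```python
def cpu_time_ns(n, clock_ns=1):
    """Sequential processor: one cell per clock, n * n cells."""
    return n * n * clock_ns


def wavefront_steps(n):
    """Count anti-diagonals (cells with equal row + col) in an n x n grid."""
    return len({row + col for row in range(n) for col in range(n)})


def fpga_time_ns(n, clock_ns=1):
    """Parallel hardware: one whole anti-diagonal per clock."""
    return wavefront_steps(n) * clock_ns


def speedup(n):
    return cpu_time_ns(n) / fpga_time_ns(n)


if __name__ == "__main__":
    print(cpu_time_ns(3), fpga_time_ns(3))  # 9 5
    print(round(speedup(100), 2))           # 50.25
```

With these assumptions the speedup is N*N / (2N - 1), which grows roughly as N/2.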


Dr. James Moscola (Example)

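[Figure: example RNA model. Nodes ROOT0, MATP1, MATL2, END3 expand into states S0, IL1, IR2 (ROOT0); MP3, ML4, MR5, D6, IL7, IR8 (MATP1); ML9, D10, IL11 (MATL2); E12 (END3). The model is drawn against residues c, g, a at positions 1, 2, 3.]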


Example RNA Model



Baseline Architecture Pipeline

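[Figure: baseline architecture pipeline. Blocks labeled S0, IL1, IR2, MP3, ML4, MR5, D6, IL7, IR8, ML9, D10, IL11, E12 are grouped under nodes ROOT0, MATP1, MATL2, END3, alongside a residue pipeline carrying the input sequence u g c g g a c a c c c.]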


Processing Elements

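[Figure: processing element for state ML4. Inputs IL7,3,2, IR8,3,2, ML9,3,2 and D10,3,2 are combined with transition scores ML4_t(7) through ML4_t(10) using adder (+) and comparison (=) blocks; the emission scores ML4_e(A), ML4_e(C), ML4_e(G), ML4_e(U) are selected by the input residue xi. The result shown is ML4,3,3 = .22, one cell of a table indexed by j and d (0 to 3) whose other visible entries are .40, .72, .30, .30, .44 and -INF.]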
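The processing-element figure suggests a max-of-sums datapath. The sketch below assumes the element adds each child-state score to the matching transition score, keeps the best sum, and then adds the emission score for the input residue. That reading is an assumption based on the figure labels, and every numeric value here is hypothetical rather than taken from the slide.

```python
NEG_INF = float("-inf")


def ml_processing_element(child_scores, transition_scores, emission_scores, residue):
    """Best (child score + transition score), plus the emission score for residue."""
    best = max(child_scores[state] + transition_scores[state] for state in child_scores)
    return best + emission_scores[residue]


if __name__ == "__main__":
    # Hypothetical scores for the four child states named in the figure.
    children = {"IL7": NEG_INF, "IR8": 0.5, "ML9": 1.25, "D10": 0.25}
    transitions = {"IL7": -2.0, "IR8": -1.0, "ML9": -0.5, "D10": -3.0}
    emissions = {"A": 0.25, "C": -0.5, "G": -0.75, "U": -1.0}
    print(ml_processing_element(children, transitions, emissions, "A"))  # 1.0
```

A child score of -INF stays -INF after the addition, so unreachable cells never win the comparison.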


Baseline Results for Example Model

• Comparison to Infernal software
  – Infernal run on Intel Xeon 2.8GHz
  – Baseline architecture run on Xilinx Virtex-II 4000
    • occupied 88% of logic resources
    • run at 100 MHz
  – Input database of 100 Million residues
• Bulk of time spent on I/O (41.434s)


Expected Speedup on Larger Models

Model    Num PEs  Pipeline  Pipeline  Latency  HW Processing   Total Time with         Infernal Time  Infernal Time    Expected Speedup  Expected Speedup
Name              Width     Depth     (ns)     Time (seconds)  measured I/O (seconds)  (seconds)      (QDB) (seconds)  over Infernal     over Infernal (w/QDB)
RF00001  3539545  39492     195       19500    1.0000195       42.4340195              349492         128443           8236              3027
RF00016  5484002  43256     282       28200    1.0000282       42.4340282              336000         188521           7918              4443
RF00034  3181038  38772     187       18700    1.0000187       42.4340187              314836         87520            7419              2062
RF00041  4243415  44509     206       20600    1.0000206       42.4340206              388156         118692           9147              2797
Example  81       26        6         600      1.0000006       42.4340006              1039           868              25                20

• Speedup estimated ...
  – using 100 MHz clock
  – for processing database of 100 Million residues
• Speedups range from 500x to over 13,000x
  – larger models with more parallelism exhibit greater speedups
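The table columns appear to be related by simple arithmetic. The relationships below are inferred from the numbers and are not stated on the slide: hardware time is one residue per clock at 100 MHz plus the pipeline latency, total time adds the measured 41.434 s of I/O, and expected speedup is the Infernal time divided by the total time. A short sketch that reproduces the RF00001 row under those assumptions:

```python
IO_TIME_S = 41.434    # measured I/O time from the baseline results
CLOCK_HZ = 100e6      # 100 MHz clock
NUM_RESIDUES = 100e6  # database of 100 million residues


def hw_time_s(latency_ns):
    """Assumed: one residue per clock, plus the pipeline latency."""
    return NUM_RESIDUES / CLOCK_HZ + latency_ns * 1e-9


def total_time_s(latency_ns):
    """Hardware processing time plus measured I/O time."""
    return hw_time_s(latency_ns) + IO_TIME_S


def expected_speedup(infernal_s, latency_ns):
    """Software time divided by total accelerated time."""
    return infernal_s / total_time_s(latency_ns)


if __name__ == "__main__":
    # RF00001: latency 19500 ns, Infernal 349492 s, Infernal (QDB) 128443 s
    print(round(hw_time_s(19500), 7))               # 1.0000195
    print(round(expected_speedup(349492, 19500)))   # 8236
    print(round(expected_speedup(128443, 19500)))   # 3027
```

Because the I/O time dominates the total, the expected speedup is set almost entirely by the ratio of the software run time to the I/O time.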


Distributed Memory

[Figure: block diagram with blocks labeled Cache, ALU, BRAM (x4), and PE.]


Next Class

• Models of Computation (Design Patterns)


Questions/Comments/Concerns

• Write down
  – Main point of lecture
  – One thing that’s still not quite clear
  OR
  – If everything is clear, then give an example of how to apply something from lecture