CPRE 583 Reconfigurable Computing Lecture 10: Wed 9/24/2010 (High-level Acceleration Approaches)
1 - CPRE 583 (Reconfigurable Computing): High-level Acceleration Approaches Iowa State University (Ames)
CPRE 583
Reconfigurable Computing
Lecture 10: Wed 9/24/2010
(High-level Acceleration Approaches)
Instructor: Dr. Phillip Jones ([email protected])
Reconfigurable Computing Laboratory
Iowa State University
Ames, Iowa, USA
http://class.ee.iastate.edu/cpre583/
2 - CPRE 583 (Reconfigurable Computing): High-level Acceleration Approaches Iowa State University (Ames)
Announcements/Reminders
• HW2: Due Wed 10/6
  – Problem 2 will have a separate deadline (to be announced)
• MP2: Due Fri 10/1 (you can work in pairs)
  – Make sure to read the README file in the MP2 distribution
    • Contains info on how to fix a Gigabit core licensing issue ISE has
• Start thinking of class projects and forming teams
  – Submit teams and project ideas: Mon 10/11 midnight
  – Project proposal presentations: Wed 10/20
3 - CPRE 583 (Reconfigurable Computing): High-level Acceleration Approaches Iowa State University (Ames)
Projects
• Expectations
  – Working system
  – Write-up that can potentially be submitted to a conference
    • Will use DAC format as write-up guideline
  – 15-20 minute PowerPoint Presentation
• DAC (Design Automation Conference)
  – http://www2.dac.com/
  – Conference papers
    • Due Date: 5pm (MT) Thur 11/18/2010
  – Student Design Contest
    • Due Date: 5pm (MT) Wed 11/24/2010, Cash Prizes!
4 - CPRE 583 (Reconfigurable Computing): High-level Acceleration Approaches Iowa State University (Ames)
Project Ideas: Relevant conferences
• FPL
• FPT
• FCCM
• FPGA
• DAC
• ICCAD
• Reconfig
• RTSS
• RTAS
• ISCA
• Micro
• Super Computing
• HPCA
• IPDPS
5 - CPRE 583 (Reconfigurable Computing): High-level Acceleration Approaches Iowa State University (Ames)
Initial Project Proposal Slides (5-10 slides)
• Project team list: Name, Responsibility (who is project leader)
• Project idea
  – Motivation (why is this interesting, useful)
  – What will be the end result
  – High-level picture of final product
• High-level Plan
  – Break project into milestones
    • Provide initial schedule: I would initially schedule aggressively to have the project complete by Thanksgiving. Issues will pop up that cause the schedule to slip.
  – System block diagrams
  – High-level algorithms (if any)
  – Concerns
    • Implementation
    • Conceptual
• Research papers related to your project idea
6 - CPRE 583 (Reconfigurable Computing): High-level Acceleration Approaches Iowa State University (Ames)
Weekly Project Updates
• The current state of your project write-up
  – Even in the early stages of the project you should be able to write a rough draft of the Introduction and Motivation section
• The current state of your Final Presentation
  – Your Initial Project proposal presentation (Due Wed 10/20) should make for a starting point for your Final presentation
• What things are working & not working
• What roadblocks you are running into
7 - CPRE 583 (Reconfigurable Computing): High-level Acceleration Approaches Iowa State University (Ames)
Projects: Target Timeline
• Teams Formed and Idea: Mon 10/11
  – Project idea in PowerPoint, 3-5 slides
    • Motivation (why is this interesting, useful)
    • What will be the end result
    • High-level picture of final product
  – Project team list: Name, Responsibility
• High-level Plan/Proposal: Wed 10/20
  – PowerPoint, 5-10 slides
    • System block diagrams
    • High-level algorithms (if any)
    • Concerns
      – Implementation
      – Conceptual
    • Related research papers (if any)
8 - CPRE 583 (Reconfigurable Computing): High-level Acceleration Approaches Iowa State University (Ames)
Projects: Target Timeline
• Work on projects: 10/22 - 12/8
  – Weekly update reports
    • More information on updates will be given
• Presentations: Last Wed/Fri of class
  – Present / Demo what is done at this point
  – 15-20 minutes (depends on number of projects)
• Final write-up and Software/Hardware turned in: Day of final (TBD)
9 - CPRE 583 (Reconfigurable Computing): High-level Acceleration Approaches Iowa State University (Ames)
Common Questions
10 - CPRE 583 (Reconfigurable Computing): High-level Acceleration Approaches Iowa State University (Ames)
Overview
• First 15 minutes of Google FPGA lecture
• How to run Gprof
• Discuss some high-level approaches for accelerating applications.
11 - CPRE 583 (Reconfigurable Computing): High-level Acceleration Approaches Iowa State University (Ames)
What you should learn
• Start to get a feel for approaches for accelerating applications.
12 - CPRE 583 (Reconfigurable Computing): High-level Acceleration Approaches Iowa State University (Ames)
Why use Customized Hardware?
• Great talk about the benefits of Heterogeneous Computing
  – http://video.google.com/videoplay?docid=-4969729965240981475#
13 - CPRE 583 (Reconfigurable Computing): High-level Acceleration Approaches Iowa State University (Ames)
Profiling Applications
• Finding bottlenecks
• Profiling tools
  – gprof: http://www.cs.nyu.edu/~argyle/tutorial.html
  – Valgrind
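A minimal sketch of what a profile tells you. gprof and Valgrind operate on compiled programs; the example below swaps in Python's standard cProfile module purely as a stand-in to show how a profile points at the bottleneck. The functions slow_kernel, fast_helper, and run_workload are hypothetical and not part of the course material.

```python
import cProfile
import io
import pstats


def slow_kernel(n):
    """Hypothetical hot spot: an O(n^2) nested loop."""
    total = 0
    for i in range(n):
        for j in range(n):
            total += i ^ j
    return total


def fast_helper(n):
    """Hypothetical cheap routine: a single O(n) pass."""
    return sum(range(n))


def run_workload():
    """Hypothetical application: one expensive call and one cheap call."""
    return slow_kernel(300), fast_helper(300)


def profile_report():
    """Profile run_workload and return the report as text,
    sorted so the most expensive call chains are listed first."""
    profiler = cProfile.Profile()
    profiler.runcall(run_workload)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats()
    return buf.getvalue()


if __name__ == "__main__":
    print(profile_report())
```

In the report, slow_kernel should account for nearly all of the cumulative time, which marks it as the candidate for acceleration.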
14 - CPRE 583 (Reconfigurable Computing): High-level Acceleration Approaches Iowa State University (Ames)
Pipelining
[Figure: two circuit diagrams, each showing input vector <A,B,C,D> feeding a chain of 4-LUTs and DFFs that drives an output; one diagram is annotated "1 DFF delay per output"]
• How many ns to process 100 input vectors? Assume each LUT has a 1 ns delay.
• How many ns to process 100 input vectors? Assume a 1 ns clock.
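One way to work the two questions, as a sketch under stated assumptions: the circuit is taken to be four LUT stages deep (the figure shows four 4-LUT labels per diagram), the unpipelined version handles one vector at a time, and the pipelined version accepts a new vector every clock. The function names are illustrative only.

```python
def unpipelined_time_ns(num_vectors, num_stages, stage_delay_ns):
    """Each vector must pass through every stage before the next one starts."""
    return num_vectors * num_stages * stage_delay_ns


def pipelined_time_ns(num_vectors, num_stages, clock_ns):
    """The first result appears after num_stages clocks (pipeline fill);
    after that, one result completes on every clock."""
    return (num_stages + num_vectors - 1) * clock_ns


if __name__ == "__main__":
    # Assumed: 4 stages, 1 ns per LUT, 1 ns clock, 100 input vectors.
    print(unpipelined_time_ns(100, 4, 1))  # 400
    print(pipelined_time_ns(100, 4, 1))    # 103
```

Under these assumptions the pipeline costs only its fill latency once, so the gap over the unpipelined circuit approaches the number of stages as the number of vectors grows.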
15 - CPRE 583 (Reconfigurable Computing): High-level Acceleration Approaches Iowa State University (Ames)
Pipelining (Systolic Arrays)
Dynamic Programming
1. Start with base case: lower left corner
2. Formula for computing the number in each cell
3. Final result in upper right corner.
16 - CPRE 583 (Reconfigurable Computing): High-level Acceleration Approaches Iowa State University (Ames)
Pipelining (Systolic Arrays)
.  .  .
.  .  .
1  .  .
Dynamic Programming
1. Start with base case: lower left corner
2. Formula for computing the number in each cell
3. Final result in upper right corner.
17 - CPRE 583 (Reconfigurable Computing): High-level Acceleration Approaches Iowa State University (Ames)
Pipelining (Systolic Arrays)
.  .  .
1  .  .
1  1  .
Dynamic Programming
1. Start with base case: lower left corner
2. Formula for computing the number in each cell
3. Final result in upper right corner.
18 - CPRE 583 (Reconfigurable Computing): High-level Acceleration Approaches Iowa State University (Ames)
Pipelining (Systolic Arrays)
1  .  .
1  2  .
1  1  1
Dynamic Programming
1. Start with base case: lower left corner
2. Formula for computing the number in each cell
3. Final result in upper right corner.
19 - CPRE 583 (Reconfigurable Computing): High-level Acceleration Approaches Iowa State University (Ames)
Pipelining (Systolic Arrays)
1  3  .
1  2  3
1  1  1
Dynamic Programming
1. Start with base case: lower left corner
2. Formula for computing the number in each cell
3. Final result in upper right corner.
20 - CPRE 583 (Reconfigurable Computing): High-level Acceleration Approaches Iowa State University (Ames)
Pipelining (Systolic Arrays)
1  3  6
1  2  3
1  1  1
Dynamic Programming
1. Start with base case: lower left corner
2. Formula for computing the number in each cell
3. Final result in upper right corner.
21 - CPRE 583 (Reconfigurable Computing): High-level Acceleration Approaches Iowa State University (Ames)
Pipelining (Systolic Arrays)
1  3  6
1  2  3
1  1  1
Dynamic Programming
1. Start with base case: lower left corner
2. Formula for computing the number in each cell
3. Final result in upper right corner.
How many ns to process if a CPU can process one cell per clock (1 ns clock)?
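A sketch of the sequential (CPU) case. The slides do not state the cell formula; the grid values shown are consistent with each cell being the sum of its left and lower neighbors, with missing neighbors counted as 0, so that recurrence is assumed here. The function name is illustrative.

```python
def fill_sequential(n):
    """Fill an n x n grid one cell per clock and return (grid, clocks).

    grid[r][c] uses r = 0 for the bottom row, so grid[0][0] is the
    lower left corner and grid[n - 1][n - 1] is the upper right corner.
    """
    grid = [[0] * n for _ in range(n)]
    clocks = 0
    for r in range(n):
        for c in range(n):
            if r == 0 and c == 0:
                grid[r][c] = 1                      # base case: lower left corner
            else:
                left = grid[r][c - 1] if c > 0 else 0
                below = grid[r - 1][c] if r > 0 else 0
                grid[r][c] = left + below           # assumed recurrence
            clocks += 1                             # one cell per clock
    return grid, clocks


if __name__ == "__main__":
    grid, clocks = fill_sequential(3)
    for row in reversed(grid):                      # print top row first
        print(row)
    print(clocks, "clocks")                         # 9 clocks, i.e. 9 ns at 1 ns per clock
```

With one cell per clock, the time is simply the number of cells, N x N.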
22 - CPRE 583 (Reconfigurable Computing): High-level Acceleration Approaches Iowa State University (Ames)
Pipelining (Systolic Arrays)
1  3  6
1  2  3
1  1  1
Dynamic Programming
1. Start with base case: lower left corner
2. Formula for computing the number in each cell
3. Final result in upper right corner.
How many ns to process if an FPGA can obtain maximum parallelism each clock (1 ns clock)?
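A sketch of the maximum-parallelism case. It assumes the same recurrence the grid values suggest (each cell is the sum of its left and lower neighbors, missing neighbors counted as 0), which the slides do not state explicitly. Cells on the same anti-diagonal depend only on the previous anti-diagonal, so all of them can be computed in one clock. The function name is illustrative.

```python
def fill_wavefront(n):
    """Fill an n x n grid one anti-diagonal per clock and return (grid, clocks).

    grid[r][c] uses r = 0 for the bottom row, so grid[0][0] is the
    lower left corner and grid[n - 1][n - 1] is the upper right corner.
    """
    grid = [[0] * n for _ in range(n)]
    clocks = 0
    for d in range(2 * n - 1):                      # d = r + c indexes the anti-diagonal
        for r in range(max(0, d - n + 1), min(d, n - 1) + 1):
            c = d - r
            if d == 0:
                grid[r][c] = 1                      # base case: lower left corner
            else:
                left = grid[r][c - 1] if c > 0 else 0
                below = grid[r - 1][c] if r > 0 else 0
                grid[r][c] = left + below           # assumed recurrence
        clocks += 1                                 # whole diagonal in one clock
    return grid, clocks


if __name__ == "__main__":
    grid, clocks = fill_wavefront(3)
    print(grid[2][2], clocks)                       # 6 5
```

For the 3x3 example this gives 5 clocks (5 ns at a 1 ns clock), matching the five wavefront steps in which the grid fills in on the earlier slides.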
23 - CPRE 583 (Reconfigurable Computing): High-level Acceleration Approaches Iowa State University (Ames)
Pipelining (Systolic Arrays)
1  3  6
1  2  3
1  1  1
Dynamic Programming
1. Start with base case: lower left corner
2. Formula for computing the number in each cell
3. Final result in upper right corner.
What speedup would an FPGA obtain (assuming maximum parallelism) for a 100x100 matrix? (Hint: find a formula for an NxN matrix.)
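One possible answer to the hint, as a sketch: assume the CPU computes one cell per clock (N x N clocks) and the FPGA computes one full anti-diagonal per clock (2N - 1 clocks), both at the same clock rate. The function names are illustrative.

```python
from fractions import Fraction


def cpu_clocks(n):
    """One cell per clock: N * N clocks."""
    return n * n


def fpga_clocks(n):
    """One anti-diagonal per clock: an N x N grid has 2N - 1 anti-diagonals."""
    return 2 * n - 1


def speedup(n):
    """Speedup of the wavefront schedule over the sequential one."""
    return Fraction(cpu_clocks(n), fpga_clocks(n))


if __name__ == "__main__":
    print(speedup(3))                # 9/5
    print(float(speedup(100)))       # about 50.25
```

Under these assumptions the speedup is N^2 / (2N - 1), which grows roughly as N/2, so larger matrices benefit more.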
24 - CPRE 583 (Reconfigurable Computing): High-level Acceleration Approaches Iowa State University (Ames)
Dr. James Moscola (Example)
[Figure: model diagram. Nodes: ROOT0, MATP1, MATL2, END3. States: S0, IL1, IR2, MP3, ML4, MR5, D6, IL7, IR8, ML9, D10, IL11, E12. Positions 1, 2, 3 labeled with residues c, g, a.]
25 - CPRE 583 (Reconfigurable Computing): High-level Acceleration Approaches Iowa State University (Ames)
Example RNA Model
[Figure: model diagram. Nodes: ROOT0, MATP1, MATL2, END3. States: S0, IL1, IR2, MP3, ML4, MR5, D6, IL7, IR8, ML9, D10, IL11, E12. Positions 1, 2, 3 labeled with residues c, g, a.]
26 - CPRE 583 (Reconfigurable Computing): High-level Acceleration Approaches Iowa State University (Ames)
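Baseline Architecture Pipeline
[Figure: pipeline containing states D10, ML9, IL11, IR8, IL7, D6, MR5, ML4, MP3, IR2, IL1, S0, E12, grouped under nodes ROOT0, MATP1, MATL2, END3, alongside a residue pipeline carrying u g c g g a c a c c c]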
27 - CPRE 583 (Reconfigurable Computing): High-level Acceleration Approaches Iowa State University (Ames)
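Processing Elements
[Figure: processing element for state ML4, with labels j and d and blocks marked + and =. Inputs: IL7,3,2; IR8,3,2; ML9,3,2; D10,3,2. Transition terms: ML4_t(7), ML4_t(8), ML4_t(9), ML4_t(10). Emission terms: ML4_e(A), ML4_e(C), ML4_e(G), ML4_e(U), with input residue xi. Result: ML4,3,3 = .22. An accompanying table with indices 0-3 on each axis holds the values .40, .22, .72, .30, .30, .44 and -INF entries.]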
28 - CPRE 583 (Reconfigurable Computing): High-level Acceleration Approaches Iowa State University (Ames)
Baseline Results for Example Model
• Comparison to Infernal software
  – Infernal run on Intel Xeon 2.8GHz
  – Baseline architecture run on Xilinx Virtex-II 4000
    • occupied 88% of logic resources
    • run at 100 MHz
  – Input database of 100 Million residues
• Bulk of time spent on I/O (41.434s)
29 - CPRE 583 (Reconfigurable Computing): High-level Acceleration Approaches Iowa State University (Ames)
Expected Speedup on Larger Models

Model     Num PEs   Pipeline  Pipeline  Latency  HW Processing  Total Time with    Infernal  Infernal Time  Expected Speedup  Expected Speedup
Name                Width     Depth     (ns)     Time (s)       measured I/O (s)   Time (s)  (QDB) (s)      over Infernal     over Infernal (w/QDB)
RF00001   3539545   39492     195       19500    1.0000195      42.4340195         349492    128443         8236              3027
RF00016   5484002   43256     282       28200    1.0000282      42.4340282         336000    188521         7918              4443
RF00034   3181038   38772     187       18700    1.0000187      42.4340187         314836    87520          7419              2062
RF00041   4243415   44509     206       20600    1.0000206      42.4340206         388156    118692         9147              2797
Example   81        26        6         600      1.0000006      42.4340006         1039      868            25                20
• Speedup estimated ...
  – using 100 MHz clock
  – for processing database of 100 Million residues
• Speedups range from 500x to over 13,000x
  – larger models with more parallelism exhibit greater speedups
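A sketch of how the table columns appear to relate, checked against the RF00001 row. The formulas are an inference from the numbers, not something stated on the slides: one residue per clock at 100 MHz plus the pipeline latency gives the hardware time, the measured I/O time of 41.434 s from the baseline results is added to get the total, and the speedup is the Infernal time divided by that total. All names below are illustrative.

```python
CLOCK_HZ = 100e6        # 100 MHz clock (from the slide)
NUM_RESIDUES = 100e6    # database of 100 million residues (from the slide)
IO_SECONDS = 41.434     # measured I/O time reported with the baseline results


def hw_time_s(latency_ns):
    """Assumed: one residue per clock, plus the pipeline fill latency."""
    return NUM_RESIDUES / CLOCK_HZ + latency_ns * 1e-9


def total_time_s(latency_ns):
    """Assumed: hardware processing time plus measured I/O time."""
    return hw_time_s(latency_ns) + IO_SECONDS


def expected_speedup(infernal_s, latency_ns):
    """Assumed: Infernal run time divided by total hardware time."""
    return infernal_s / total_time_s(latency_ns)


if __name__ == "__main__":
    # RF00001 row: latency 19500 ns, Infernal 349492 s, Infernal (QDB) 128443 s
    print(hw_time_s(19500))                        # close to 1.0000195
    print(total_time_s(19500))                     # close to 42.4340195
    print(round(expected_speedup(349492, 19500)))  # 8236
    print(round(expected_speedup(128443, 19500)))  # 3027
```

Because I/O dominates the total in every row, the hardware processing time barely affects the estimated speedup.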
30 - CPRE 583 (Reconfigurable Computing): High-level Acceleration Approaches Iowa State University (Ames)
Distributed Memory
[Figure: blocks labeled Cache, ALU, BRAM (x4), and PE]
31 - CPRE 583 (Reconfigurable Computing): High-level Acceleration Approaches Iowa State University (Ames)
Next Class
• Models of Computation (Design Patterns)
32 - CPRE 583 (Reconfigurable Computing): High-level Acceleration Approaches Iowa State University (Ames)
Questions/Comments/Concerns
• Write down
  – Main point of lecture
  – One thing that's still not quite clear
  OR
  – If everything is clear, then give an example of how to apply something from lecture