Computer Generation of IP Cores · 2012-07-06 · © Markus Püschel, 2011 © Markus Püschel...
© Markus Püschel, 2011
© Markus Püschel Computer Science
Peter Milder (ECE, Carnegie Mellon)
James Hoe (ECE, Carnegie Mellon)
Markus Püschel (CS, ETH Zürich)
Computer Generation of IP Cores
addfxp #(16, 1) add15282(.a(a69), .b(a70), .clk(clk), .q(t45));
addfxp #(16, 1) add15297(.a(a71), .b(a72), .clk(clk), .q(t46));
subfxp #(16, 1) sub15311(.a(a69), .b(a70), .clk(clk), .q(t47));
subfxp #(16, 1) sub15325(.a(a71), .b(a72), .clk(clk), .q(t48));
Software | Hardware (FPGA, ASIC)
Libraries | IP Cores
C/C++/Fortran | Verilog/VHDL
Compilation | Synthesis
Performance | Area/Performance
Example: Discrete Fourier transform (DFT), size 64
[Plot: area vs. performance; labels: "best", "Pareto-optimal designs", "6.12 Gigasamples/second"]
Problem: For a given function, how to efficiently obtain the Pareto-optimal designs?
Solution: IP core generator to enumerate design space
• DSL to represent algorithms
• Rewriting on DSL for architectural decisions
• Compiler: DSL to RTL Verilog
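Once a design space has been enumerated, picking out the Pareto-optimal designs is a simple dominance filter. The following is a minimal sketch of that filter only, not the generator's actual method; the name `pareto_front` and the `(area, throughput)` numbers are hypothetical and purely illustrative.

```python
def pareto_front(designs):
    """Keep designs that no other design dominates.

    A design is a hypothetical (area, throughput) pair; lower area and
    higher throughput are better.
    """
    front = []
    for area, thr in designs:
        dominated = any(
            (a2 <= area and t2 >= thr) and (a2 < area or t2 > thr)
            for a2, t2 in designs
        )
        if not dominated:
            front.append((area, thr))
    return sorted(front)


# Made-up candidate designs, for illustration only.
candidates = [(10, 1.0), (20, 2.0), (25, 1.5), (40, 4.0), (45, 3.0)]
front = pareto_front(candidates)
print(front)
```

The quadratic scan is fine for small spaces; for large spaces, sorting by area and sweeping once would be the usual alternative.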
Current IP Core Generators (www.spiral.net) DFT, other transforms, sorting
DFT and FFT: Example n = 4
DFT:
Fast Fourier transform (FFT):
Description with matrix algebra (SPL)
FFT data flow graph
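The slide's formulas did not survive extraction. As a hedged illustration of what an SPL-style description of the FFT for n = 4 looks like, the sketch below checks the standard Cooley-Tukey factorization DFT_4 = (DFT_2 ⊗ I_2) · T · (I_2 ⊗ DFT_2) · L numerically. The sign convention ω = exp(-2πi/n) and all helper names are assumptions made here, not taken from the slides.

```python
import cmath


def dft(n):
    """n x n DFT matrix, assuming the convention w = exp(-2*pi*i/n)."""
    w = cmath.exp(-2j * cmath.pi / n)
    return [[w ** (k * l) for l in range(n)] for k in range(n)]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def kron(a, b):
    """Kronecker (tensor) product of two matrices given as nested lists."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Stride permutation: (x0, x1, x2, x3) -> (x0, x2, x1, x3).
L = [[1, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
# Twiddle matrix: diag(1, 1, 1, w4) with w4 = exp(-2*pi*i/4) = -i.
w4 = cmath.exp(-2j * cmath.pi / 4)
diag = (1, 1, 1, w4)
T = [[diag[i] if i == j else 0 for j in range(4)] for i in range(4)]

F2, I2 = dft(2), identity(2)
fft4 = matmul(matmul(kron(F2, I2), T), matmul(kron(I2, F2), L))

F4 = dft(4)
max_err = max(abs(fft4[i][j] - F4[i][j]) for i in range(4) for j in range(4))
print(max_err < 1e-12)
```

Each factor is sparse and structured, which is what makes the factored form readable as a data flow graph.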
Algorithms
Each one describes a data flow graph and thus could be directly mapped to hardware. But how to obtain a design space?
Design Space: Example 8 Point FFT
All 8 inputs in parallel
Parallelism allows folding
Design Space: Example 8 Point FFT
Two inputs in parallel, streamed over four cycles
Repeated stages allow folding
Design Space: Example 8 Point FFT
Vertical and horizontal folding: Tradeoff between area cost and performance
How to do this formally and systematically?
Parallelism:
• structurally parallel: blocks $A_n$ side by side, $mn$ inputs in one cycle
• streaming, full reuse: $n$ inputs per cycle into $A_n$, $m$ cycles
• streaming, partial reuse: $I_{m/k} \otimes_{sr} (I_k \otimes A_n)$, $kn$ inputs per cycle, $m/k$ cycles
streaming width $w$
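The three options above compute the same result and differ only in how many blocks exist and how many cycles are needed. A minimal behavioral sketch, assuming a hypothetical 2-point butterfly as the block A_n and ignoring pipeline latency (the function names are illustrative, not from the generator):

```python
def a_n(x):
    """Hypothetical stand-in for the block A_n: a 2-point butterfly (n = 2)."""
    return [x[0] + x[1], x[0] - x[1]]


def streamed(x, n, k):
    """Apply A_n to each of the m length-n segments of x using k physical blocks.

    Returns (output, cycles). k = m is structurally parallel,
    k = 1 is streaming with full reuse, anything between is partial reuse.
    """
    m = len(x) // n
    assert len(x) == m * n and m % k == 0
    out, cycles = [], 0
    for start in range(0, len(x), k * n):   # one iteration models one cycle
        chunk = x[start:start + k * n]      # kn inputs arrive this cycle
        for b in range(k):                  # the k blocks work side by side
            out.extend(a_n(chunk[b * n:(b + 1) * n]))
        cycles += 1
    return out, cycles


x = list(range(8))               # m = 4, n = 2
parallel = streamed(x, 2, 4)     # mn = 8 inputs in one cycle
partial = streamed(x, 2, 2)      # kn = 4 inputs per cycle, m/k = 2 cycles
full = streamed(x, 2, 1)         # n = 2 inputs per cycle, m = 4 cycles
print(parallel[1], partial[1], full[1])
```

Area scales with k (number of blocks) while the cycle count scales with m/k, which is the tradeoff the streaming width exposes.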
Repeated stages:
• cascade: $m$ blocks $A_n$
• iterative, full reuse: 1 block reused $m$ times
• iterative, partial reuse: $k$ blocks reused $m/k$ times
depth $d$
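The same idea applies along the depth of the data flow graph. The sketch below models m identical stages computed with k physical blocks that the data passes through m/k times. It assumes, for illustration only, that every stage is the same hypothetical 2-point butterfly; the names `stage` and `repeated` are not from the slides.

```python
def stage(x):
    """Hypothetical stand-in for one stage A_n: a 2-point butterfly."""
    return [x[0] + x[1], x[0] - x[1]]


def repeated(x, m, k):
    """Apply m identical stages using a chain of k physical blocks.

    Returns (output, passes). k = m is a cascade (one pass),
    k = 1 is iterative with full reuse (m passes).
    """
    assert m % k == 0
    passes = 0
    for _ in range(m // k):      # feed the data back through the chain
        for _ in range(k):       # the k blocks in the chain
            x = stage(x)
        passes += 1
    return x, passes


cascade = repeated([1, 2], 4, 4)     # m = 4 blocks, one pass
partial = repeated([1, 2], 4, 2)     # k = 2 blocks reused m/k = 2 times
iterative = repeated([1, 2], 4, 1)   # 1 block reused m = 4 times
print(cascade, partial, iterative)
```

With iterative reuse the block cannot accept new data until the current data set has finished all passes, so throughput drops as area shrinks.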
Previous Example
All 8 inputs in parallel
Parallelism allows for streaming reuse (sr)
Vertical Folding By Rewriting
Repeated stages allow for iterative reuse (ir)
Vertical & Horizontal Folding By Rewriting
How to build streaming permutations?
Parallel permutation. Example: 8 points
8 words in parallel
$P$: just wires
Streaming permutation. Example: 8 points, 2 points per cycle
2 words per cycle, 4 cycles
$\underbrace{P}_{\text{stream}(2)}$: ?
$\underbrace{P}_{\text{stream}(2)}$:
• Data stream must be buffered (using multiple banks of RAMs)
• Need to route data carefully to prevent port conflicts
[J. of the ACM 2009; DATE 2009]
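To see why routing matters, the sketch below counts port conflicts for one concrete case: 8 points, 2 words per cycle, 2 single-ported banks. It assumes P is a stride-2 permutation (the slide does not say which permutation is shown), and the two bank mappings are hand-picked for illustration; this is not the construction from the cited papers.

```python
def stride_order(n, s):
    """Input indices in the order a stride-s permutation emits them."""
    return [i + j for j in range(s) for i in range(0, n, s)]


def count_conflicts(order, bank, w):
    """Count cycles in which two of the w accesses hit the same bank."""
    conflicts = 0
    for c in range(0, len(order), w):
        banks = [bank(i) for i in order[c:c + w]]
        if len(set(banks)) < len(banks):
            conflicts += 1
    return conflicts


def naive(i):
    """Bank chosen by the low index bit."""
    return i % 2


def xor_map(i):
    """Bank chosen by XOR of the two low index bits (hand-picked)."""
    return (i ^ (i >> 1)) & 1


n, w = 8, 2
write_order = list(range(n))        # data arrives in natural order
read_order = stride_order(n, 2)     # data leaves in permuted order

for name, bank in (("naive", naive), ("xor", xor_map)):
    print(name,
          count_conflicts(write_order, bank, w),
          count_conflicts(read_order, bank, w))
```

Under these assumptions the naive mapping writes without conflicts but collides on every read cycle, while the XOR mapping is conflict-free in both directions.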
IP Core Generator (Sketch)
Problem specification
Hardware directives
Algorithm Generation
Algorithm Rewriting
RTL Generation
Synthesizable Verilog
[TODAES 2012; DAC 2008]
FPGA: Area vs. Throughput
Pareto optimal
Similar results for
• other DFT sizes, including non-powers of two
• real DFT
• 2D DFT
• DCT
Verilog
Not shown:
• ASIC results, power/perf tradeoff
• IP Generator for sorters
• Finding Pareto points without exhaustive design space enumeration
www.spiral.net