07 late binding
8/8/2019 07 Late Binding
Introduction Design Evaluation Conclusion
Writing Portable Applications that Dynamically Bind at Run Time to Reconfigurable Hardware
Michael Schilli
Institut für Betriebs- und Dialogsysteme, Lehrstuhl Systemarchitektur, Universität Karlsruhe (TH)
December 12, 2007
Michael Schilli University of Karlsruhe (TH)
Writing Portable Applications that Dynamically Bind at Run Time to Reconfigurable Hardware
Multicomputer Architecture
Combine FPGAs with traditional processors
Range from reconfigurable supercomputers to embedded systems
Can be tailored to specific applications, such as image processing
FPGAs offer potential hardware acceleration
Add more levels of parallelism
Multicomputer Architecture
Example
Cray XD1
Goals
Main Goals
Speed up compute-intensive applications
Provide means to access hardware implementations
Focus on coarse-grained libraries
E.g. libc is too fine-grained (not much to accelerate per call): the overhead to program the hardware and transfer the data is too high
Signal and image processing are good candidates
VSIPL++
Vector Signal Image Processing Library (VSIPL)
[Figure: VSIPL++ software stack: USER PROGRAM on top of VSIPL++, which runs on platform-specific libraries (VSIPL, LIBC, SAL, PPCPERF) on PPC]
Library of commonly used signal and image processing algorithms (C++ API)
Used to develop high-performance applications
Applications are portable from one architecture to another
Library can be optimized on a particular platform
Optimized implementations:
SAL, Mercury Systems (PowerPC)
PPCPERF Libraries (PowerPC 4xx)
VSIPL, NASoftware (G4 and MIPS)
VSIPL++
Problems
Reconfigurable multicomputers pose two main challenges:
Application code and hardware-specific code are intertwined
Code is not easily reusable for other applications
Code is difficult to substitute for different hardware
This complicates code portability
The application coder is required to have expertise in both the hardware and the application domain
Goals
Separate hardware-specific parts from the application
Bind to a hardware or software implementation at run time
Vforce
Vforce (VSIPL++ For Reconfigurable Computing)
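[Figure: the VSIPL++ software stack extended with Vforce: USER PROGRAM on top of Vforce VSIPL++, which adds SPP-specific implementations alongside the PPC libraries (VSIPL, LIBC, SAL, PPCPERF)]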
Vforce extends VSIPL++
Hides the HW implementation beneath a standard API
Supports reconfigurable HW
Supports binding to the HW platform at run time
Provides software and hardware implementations
Separates application code from HW-specific code
Vforce
Vforce Framework
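[Figure: Vforce framework: application code calls the Vforce Processing Object (API), which dispatches either to a Vforce SW function (software, VSIPL++) or to a Generic HW Object; the Generic HW Object interacts with the RTRM and calls the hardware function in a DLSO through the IPI]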
Vforce
Vforce Processing Object
Encapsulates a specific algorithm
Similar to VSIPL++, but contains two implementations per algorithm: a software-only and a generic hardware implementation
The implementation used is determined at run time
Interfaces with generic hardware objects via IPI
Vforce
Generic Hardware Object
Does not contain any hardware specific code
Contains only code to interact with the RTRM
Loads the appropriate DLSO on kernel_init
Vforce
Run Time Resource Manager (RTRM)
Separate process outside the application
Access to processing kernel library and DLSO library
Provides DLSOs to generic hardware object
Only involved during hardware request and initialization
Allows requests from different processing objects
Shares limited reconfigurable resources
Potential reuse of already configured hardware (depends on the kernel library)
Vforce
Dynamically Loaded Shared Objects (DLSO)
Implement IPI calls to reconfigurable hardware library
Bound to a given hardware's API (vendor-specific)
Each object implements a specific algorithm
Gets loaded via the generic hardware object
Vforce
Run Time Example
FFT Benchmark
Experiment 1
Fast Fourier Transform (FFT)
FFT object created for the Cray XD1
Generated with Xilinx Coregen
Supports sizes from 64 to 32768 points
Data is transferred via DMA
FFT Benchmark
Results
Size (points)   Native API   Vforce HW   % Change
64              1.778        1.778        0
256             1.770        1.783        0.73
512             1.778        1.778        0
1024            1.777        1.798        1.18
8192            1.774        1.772       -0.11
32768           1.774        1.777        0.17
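The % Change column is consistent with the relative difference between the Vforce and native measurements in every row (the slides do not state the units of the two timing columns; the symbols t below are introduced only for this formula):

```latex
\%\ \text{Change} = 100 \cdot \frac{t_{\text{Vforce}} - t_{\text{native}}}{t_{\text{native}}},
\qquad \text{e.g. } N = 1024:\quad 100 \cdot \frac{1.798 - 1.777}{1.777} \approx 1.18
```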
Beamforming Benchmark
Results
Exp.   Software (s)   Vforce, Single FPGA (s)   Speedup
4      2296.88        616.59                      3.73
8      4488.48        678.17                      6.62
12     8883.07        812.37                     10.93
16     17726.60       1136.78                    15.59
20     35636.10       2009.42                    17.73
25     343042.90      1383.29                   247.99
Source: Albert Conti, A Hardware/Software System for Adaptive Beamforming, Northeastern University, May 2007
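The Speedup column matches the ratio of software run time to Vforce run time in every row; for the first row:

```latex
\text{Speedup} = \frac{t_{\text{Software}}}{t_{\text{Vforce}}},
\qquad \text{e.g. Exp. } 4:\quad \frac{2296.88\,\text{s}}{616.59\,\text{s}} \approx 3.73
```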
Conclusion
Vforce
Extends standard VSIPL++ library
Allows late binding at run-time
Supports application portability
Disadvantages
Overhead from the extra layers and objects (minimal)
Relies on pre-built libraries for hardware implementations