ENG6530 Reconfigurable
Computing Systems
Paper Review
Summary
ENG6530 RCS 2
Topics
• Paper Review, Topics Covered
• 2014 Conclusion??
• When to use RCS?
• How to use RCS?
• Summary
References
“A Decade of Reconfigurable Computing: A Visionary Retrospective”, R. Hartenstein, 2001.
“Paper Review”, ENG6530/ENG3050 Web Site.
Paper Review
1. AdaBoost-Based Real Time Object Detection
• By Ziad Abouwaimer
2. Accelerating LS-SVM with RTR
• By Yin Li
3. Reconfigurable Computing using Content Addressable Memory
• By Marie, Anderson, Raphael
4. Coarse Grain Reconfigurable Architectures
• By Shane, Zack, Cristian
5. AES Implementation on FPGAs
• By Daniel, Taras, Natalia
6. FPGA Based String Matching for NP
• By Justin, Albert
7. Instance Specific Accelerators for Min Cover
• By Desmond, Matt
8. High Performance Pipelined FPGA GA
• By Grayden, Ganga
Flexible Parallel Hardware Architecture for AdaBoost-Based Real-Time Object Detection
An Article Review by Ziad
• Object detection is an important step in many applications.
• Real-time detection is critical in several domains.
• Parallelism can be easily exploited from the application.
• The proposed architecture is flexible and scalable.
• The proposed architecture can be extended to many other applications.
Accelerating On-line Training of LS-SVM with Run Time Reconfiguration
An Article Review by Yin Li
• Machine learning is an important tool for solving many problems.
• SVMs are among the most successful techniques, with high accuracy.
• LS-SVM is a modified version of SVM (quadratic optimization problem).
• Training machine learning algorithms is a bottleneck.
• Real-time performance is crucial for many applications, such as driver-assistance applications, laser-guided missiles, UAVs, WSNs, ...
• Dynamic Run Time Reconfiguration is not necessary!
Coarse Grain FPGA Architectures
An Article Review by Shane, Zack and Cristian
• Coarse-grain FPGAs were introduced as an alternative to fine-grain FPGAs, providing multi-bit-wide data paths and complex operators.
• Direct functional units instead of LUT implementations.
• Massive reduction of configuration time.
• Drastic complexity reduction of the P&R problem.
• They are suitable for some applications (communication protocols).
• They are not as flexible as fine/medium-grain FPGAs.
Reconfigurable Computing using Content Addressable Memory for Improved Performance and Resource Usage
An Article Review by Marie, Anderson and Raphael
• This approach deviates from conventional FPGA architectures (LUTs) to avoid design overhead.
• The proposed RC architecture is a memory-based methodology that uses CAM as the underlying reconfigurable fabric (speed).
• The use of CAM leads to a significant reduction in memory requirements compared to a LUT-based approach.
• However, CAM suffers from high power consumption! Any commercial implementations? Why?
AES Implementation on FPGAs
An Article Review by Daniel, Taras and Natalia
• In the information age, cryptography has become one of the major methods of protection in all applications.
• Cryptographic algorithms are used in embedded systems, desktops, smart cards, wireless sensor networks, ...
• For superior, real-time performance, many applications require cryptographic algorithms to be realized in hardware.
• Lots of parallelism can be exploited in the AES algorithm.
• Fine-grain operations are performed (fine-grain FPGAs).
FPGA Based String Matching for Network Processing Applications
An Article Review by Justin and Albert
• String matching is a very popular technique used in many applications, such as packet classification, packet inspection, bioinformatics, DNA sequences, search engines, ...
• Building hardware accelerators for string matching is important, since current packet classification and inspection running on general-purpose (GP) processors are slow.
• Content Addressable Memories are fast, but they are very limited and consume lots of power.
Instance-Specific Accelerators for Minimum Covering
An Article Review by Desmond and Matt
• Set covering is an important problem that can be used to solve many other problems: crew scheduling, vertex covering, facility location, SAT.
• Speeding up set covering is crucial (NP-hard problem).
• Significant raw speedup of up to 5 orders of magnitude (small problems!).
• Scalability might be an issue.
A High-Performance, Pipelined FPGA Based Genetic Algorithm Machine
An Article Review by Ganga, Grayden
• Meta-heuristic that is population based rather than single-point based.
• We can exploit parallelism easily from the application.
• In addition to parallelism, pipelining can also be applied (Population Initialization, Selection, Reproduction, Evaluation, Replacement).
• Can be used in many applications, from robot path planning and solving classical optimization problems (set covering) to protein folding.
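The five stages named on this slide can be sketched in software as below. This is a minimal illustration only, not the reviewed paper's design: the OneMax fitness function, binary tournament selection, one-point crossover, elitist replacement, and all names and parameters (`GENOME_BITS`, `run_ga`, the mutation rate) are assumptions chosen for brevity. In software the stages run one after another; a pipelined hardware version would keep each stage busy on a different individual at the same time.

```python
import random

GENOME_BITS = 16  # assumed genome width, for illustration only


def evaluate(genome):
    """Evaluation stage: OneMax fitness (number of 1 bits), an assumed toy problem."""
    return sum(genome)


def initialize(rng, size):
    """Population Initialization stage: random bit strings."""
    return [[rng.randint(0, 1) for _ in range(GENOME_BITS)] for _ in range(size)]


def select(rng, population, fitness):
    """Selection stage: binary tournament between two random individuals."""
    a = rng.randrange(len(population))
    b = rng.randrange(len(population))
    return population[a] if fitness[a] >= fitness[b] else population[b]


def reproduce(rng, parent1, parent2, mutation_rate=0.05):
    """Reproduction stage: one-point crossover followed by bit-flip mutation."""
    cut = rng.randrange(1, GENOME_BITS)
    child = parent1[:cut] + parent2[cut:]
    return [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]


def replace(population, fitness, children, child_fitness):
    """Replacement stage: keep the fittest individuals among parents and children."""
    merged = sorted(
        zip(fitness + child_fitness, population + children),
        key=lambda pair: pair[0],
        reverse=True,
    )[: len(population)]
    return [g for _, g in merged], [f for f, _ in merged]


def run_ga(generations=30, size=20, seed=1):
    """Run the stages in sequence and record the best fitness per generation."""
    rng = random.Random(seed)
    population = initialize(rng, size)
    fitness = [evaluate(g) for g in population]
    best_history = [max(fitness)]
    for _ in range(generations):
        children = [
            reproduce(rng, select(rng, population, fitness),
                      select(rng, population, fitness))
            for _ in range(size)
        ]
        child_fitness = [evaluate(c) for c in children]
        population, fitness = replace(population, fitness, children, child_fitness)
        best_history.append(max(fitness))
    return best_history


history = run_ga()
```

Because replacement is elitist, the best fitness in `history` never decreases from one generation to the next.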
Lessons Learnt
• RCS is a trade-off between traditional hardware (performance) and software (flexibility).
• Orders of magnitude performance improvements over traditional software systems.
• Pipelining and SIMD/MIMD-type architectures can enhance the performance of applications.
• Embedded systems/real-time requirements benefit from RCS.
• Algorithms might need to be modified to be mapped to hardware.
• Scalability is an issue that needs to be addressed.
• Input/output throughput might be an issue even when an application can achieve orders of magnitude of speedup.
• The type of reconfigurable platform (fine/medium/coarse) plays an important role in achieving a specific performance.
• Programming can be achieved at different levels of abstraction (HDL vs. C/Matlab).
Steps to use RCS
Reconfigurable System (Custom Computing Machine)
• Not all applications are suitable for Reconfigurable Computing.
• Applications that involve extensive recursion, for example, are a poor match because the synthesized “hardware” must be of fixed size.
• Applications that have only a small percentage of parallelism (1-5%) will not take advantage of RCS.
• Applications that are I/O bound will also suffer due to memory I/O transfer.
• Applications that require floating-point arithmetic.
• The first requirement in exploiting RC for HPC applications is determining if your application is well suited to acceleration.
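Amdahl's law gives a quick check of whether an application is worth accelerating. The sketch below uses the standard formula, speedup = 1 / ((1 - f) + f / s), where f is the fraction of run time moved to hardware and s is the speedup of that fraction. The function name and the sample values are assumptions for illustration.

```python
def amdahl_speedup(accelerated_fraction, kernel_speedup):
    """Overall speedup when a fraction of run time is accelerated by kernel_speedup."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    serial_part = 1.0 - accelerated_fraction
    return 1.0 / (serial_part + accelerated_fraction / kernel_speedup)


# Only 5% of the run time is parallelizable: even a 100x kernel barely helps.
low_parallelism = amdahl_speedup(0.05, 100.0)

# 90% of the run time is parallelizable: the same kernel pays off,
# but the overall speedup stays below the 1 / (1 - 0.9) = 10x bound.
high_parallelism = amdahl_speedup(0.90, 100.0)
```

With these inputs the 5% case stays below a 1.06x overall speedup, which is why applications with only 1-5% parallelism are a poor match.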
Considerations
Performance:
• Profiling to decide partitioning.
• Which bottlenecks will yield the most performance?
• Amdahl’s law
• Memory access and contention
• I/O bound versus CPU bound applications
• Synchronization between hardware/software
Consumption of hardware resources:
• Serial or fully parallel implementation
• Floating point or fixed point
• Using a single large FPGA (cost)
• Using several FPGAs (partitioning of application)
• Utilizing Run Time Reconfiguration
• Fine Grain, Medium Grain, or Coarse Grain?
Flexibility and Evaluation:
• Hardware Description Languages
• Electronic System Level and Design Exploration
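The floating point versus fixed point choice can be explored in software before committing hardware resources. The sketch below models a fixed-point dot product with an assumed 12 fractional bits; `FRAC_BITS` and the helper names are hypothetical, and a real design would also have to choose integer bits and handle overflow.

```python
FRAC_BITS = 12  # assumed number of fractional bits, for illustration only


def to_fixed(x, frac_bits=FRAC_BITS):
    """Quantize a float to a scaled integer (round to nearest)."""
    return int(round(x * (1 << frac_bits)))


def to_float(q, frac_bits=FRAC_BITS):
    """Convert a scaled integer back to a float."""
    return q / (1 << frac_bits)


def fixed_mul(a, b, frac_bits=FRAC_BITS):
    """Multiply two fixed-point values and rescale.

    Python's >> is an arithmetic shift, so negative products round toward
    negative infinity, as a plain truncating hardware shift would.
    """
    return (a * b) >> frac_bits


def fixed_dot(xs, ws, frac_bits=FRAC_BITS):
    """Dot product using integer multiply-accumulate only."""
    acc = 0
    for x, w in zip(xs, ws):
        acc += fixed_mul(to_fixed(x, frac_bits), to_fixed(w, frac_bits), frac_bits)
    return to_float(acc, frac_bits)


inputs = [0.5, -0.25, 0.75]
weights = [0.125, 0.5, -1.0]
fixed_result = fixed_dot(inputs, weights)
float_result = sum(x * w for x, w in zip(inputs, weights))
quantization_error = abs(to_float(to_fixed(0.1)) - 0.1)
```

Comparing `fixed_result` with `float_result` over representative data is one way to judge whether a narrower fixed-point format is accurate enough for the application.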
Steps of Mapping Applications to RCS
1. Selecting the most appropriate algorithm
o Complexity O(n^2) or O(n log n), NP-Complete, ...
2. Using the most appropriate language (C, C++, Matlab)
o Run the algorithm on a GPP (golden reference model)
3. Profiling (hot spots, bottlenecks, ...)
o Pure hardware or H/S co-design?
o ASIP? Soft core (MicroBlaze), hard core (PowerPC), ARM (Zynq)
o If pure hardware, what type of coupling?
o Using Amdahl’s Law (how much speedup?)
4. Compile Time Reconfiguration or Dynamic Run Time Reconfiguration?
o Local or Global RTR?
5. Using the appropriate HDL (VHDL, Verilog, System Generator, ESL)
6. Using the most appropriate platform (Altera, Xilinx: Spartan 6, Virtex 5, 6, 7, ...)
o Fine Grain, Medium Grain or Coarse Grain platform?
1. Most Appropriate Algorithm?
• Selecting the most appropriate algorithm (complexity O(n^2) or O(n log n), NP-Complete, ...)
• An O(n^2) algorithm might be more appropriate than an O(n log n) one, since you might be able to exploit parallelism from the former.
• It might be easier to map an O(n^2) algorithm into hardware than a more efficient O(n) or O(n log n) one.
• An O(n log n) algorithm might be memory hungry and require lots of memory accesses.
• An O(n) algorithm might require lots of resources that are not available on the target FPGA.
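Odd-even transposition sort is one example of an O(n^2) algorithm that maps well to hardware. It is used here only as an illustration and is not named in the slides. It performs O(n^2) compare-exchange operations in total, but all pairs within a phase touch disjoint elements, so a row of comparators could finish each phase in a single step and the whole sort in n steps. The function below is a sequential software model that counts phases and comparisons.

```python
def odd_even_transposition_sort(values):
    """Sort a list and report how many phases and compare-exchanges were used.

    Each phase compares disjoint neighbouring pairs, so in hardware all the
    comparisons of one phase could run in parallel.
    """
    data = list(values)
    n = len(data)
    phases = 0
    compare_exchanges = 0
    for phase in range(n):
        start = phase % 2  # even phases pair (0,1),(2,3)...; odd phases pair (1,2),(3,4)...
        for i in range(start, n - 1, 2):
            compare_exchanges += 1
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
        phases += 1
    return data, phases, compare_exchanges


sample = [7, 3, 8, 1, 6, 2, 5, 4]
sorted_sample, phase_count, comparison_count = odd_even_transposition_sort(sample)
```

For the eight-element sample, the model uses 8 phases and 28 compare-exchanges; with one comparator per pair, the hardware latency would track the phase count rather than the comparison count.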
2. Most Appropriate Language?
C:
• Appropriate for embedded applications
• Fast
• Compact
C++:
• Reuse
• Portable
• Less used in embedded applications
Matlab:
• Used extensively by hardware designers
• Easy to use
• Toolboxes available
• Speed of implementation
• Slow
3. Profiling and Bottlenecks
• Use an appropriate profiler, preferably one associated with the CAD tool used.
• Are the hotspots and bottlenecks found easily translated to hardware?
• Communication plays an important role.
• What type of coupling (functional unit, co-processor)?
• Type of processor used (soft, hard, external, ...)
• Metrics to be measured (importance?):
o Area
o Power consumption
o Speed
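As a rough illustration of hot-spot profiling on a software reference model, the sketch below uses Python's standard `cProfile` and `pstats` modules. The model functions (`cheap_setup`, `hot_kernel`, `reference_model`) and the helper `find_hot_spot` are hypothetical stand-ins; a real flow would profile the actual golden reference model, ideally with the profiler tied to the CAD tool.

```python
import cProfile
import pstats


def cheap_setup(n):
    """Stand-in for light-weight setup code."""
    return list(range(n))


def hot_kernel(data):
    """Stand-in for the compute-heavy inner loop (a hardware candidate)."""
    acc = 0
    for x in data:
        for y in range(50):
            acc += (x * y) & 0xFF
    return acc


def reference_model(n=2000):
    """Hypothetical golden reference model run on a general-purpose processor."""
    data = cheap_setup(n)
    return hot_kernel(data)


def find_hot_spot(func, names):
    """Profile func and return the named function with the largest own time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    # pstats keys are (file, line, function name); values hold call counts,
    # total own time, cumulative time, and callers.
    stats = pstats.Stats(profiler).stats
    own_time = {}
    for (_file, _line, name), (_cc, _nc, tottime, _ct, _callers) in stats.items():
        if name in names:
            own_time[name] = own_time.get(name, 0.0) + tottime
    return max(own_time, key=own_time.get), own_time


hot_spot, times = find_hot_spot(
    reference_model, {"cheap_setup", "hot_kernel", "reference_model"}
)
```

The function reported as `hot_spot` is the first candidate to examine for hardware mapping; whether it translates easily to hardware still has to be judged separately.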
4. Compile Time or Run Time Reconfiguration
Compile Time Reconfiguration:
• If designers seek an easy, flexible methodology
• If the FPGA can accommodate the design
• If space is an issue, then seek:
o Serial or semi-parallel versus fully parallel
o Use H/S co-design
Run Time Reconfiguration:
• Designers should expect to spend much more time partitioning their applications (benefits: power, area)
o Temporal Partitioning (Static Partial Reconfiguration)
o Spatial Partitioning (Dynamic Partial Reconfiguration)
• Some type of operating system (manager, scheduling) is needed.
• Reconfiguration should be fast (ICAP design and others ...)
• Issues: limited support; no simulation or verification is available
5. Design Entry
Hardware Description Languages (VHDL, Verilog):
• If designers seek a near-optimal design in terms of performance and area, then an HDL is a must!
• Designers should expect to spend quite a bit of time implementing their design.
• Other tools that can help reduce time:
o System Generator
o Core Generator
Electronic System Level (Handel-C, Vivado HLS):
• Allows designers to perform “Design Space Exploration” more easily than HDLs.
• Exploiting parallelism via AutoESL or Handel-C would require verification and adding pragmas or directives.
• Some ESL tools allow for hardware/software co-design.
• Performance will suffer compared to an HDL-based implementation.
6. Platform
Fine/Medium Grain FPGA (Altera, Xilinx):
• The most straightforward way of mapping an application is to use current Xilinx and Altera (fine/medium grain) FPGAs.
• Advantage (CAD tools are available):
o Verification is easy (Xilinx ISim, Mentor Graphics ModelSim)
o Hardware-in-the-loop verification (System Generator)
• Disadvantages:
o Long times to compile and recompile
o Some degree of difficulty in using Dynamic Run Time Reconfiguration
Coarse Grain FPGA:
• If the application requires higher performance and low power consumption, then Coarse Grain FPGAs are the route to go.
• Disadvantages:
o Not too many platforms available
o CAD tools are limited.
Summary
• Not all applications are suitable for Reconfigurable Computing.
• The first requirement in exploiting RC for HPC applications is determining if the application is suited to acceleration.
• Reconfigurable computing can be well suited for several applications such as Artificial Neural Networks, Genetic Algorithms, Cryptography, Image Processing, Simulation, Optimization, etc., due to the parallelism that can be exploited in such applications.
• Several steps have to be followed carefully by the designer to ensure that they achieve their goals from implementing algorithms on Reconfigurable Computing (tedious!).
• There is a need for Design Exploration to shorten the amount of time taken by designers to target RCS.