![Page 1: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/1.jpg)
Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP
Presented by John K. Antonio
University of Oklahoma
Second Annual Review, September 23, 1999
![Page 2: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/2.jpg)
• Program Overview and Introduction (Quad Chart)
• Program Management Status
• Highlights from Year 1
• Highlights from Year 2
• Work to be Completed
Outline
![Page 3: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/3.jpg)
Configuring Combined GPP/DSP/FPGA Systems for Minimal SWAP
University of Oklahoma: John K. Antonio and Sudarshan K. Dhall

Applications
• SAR
• STAP

Requirements
• Throughput
• SWAP

• Combined Technology
• Minimal SWAP Configuration
• Mixed-Mode Operation
• Demonstration

New Ideas
• Systematic determination of minimal SWAP configuration based on proven mathematical programming techniques
• Optimal configuration based on automatic "tuning" of system design parameters
  - number and types of cards used
  - data mapping and communication schemes
  - place and route schemes
• Novel computing techniques based on characteristics of GPP/DSP/FPGA system

Schedule (timeline marks: Jun 97 Start, Jun 98, Jun 99, Jun 00, Dec 00 End)
• Develop optimal configuration techniques
• Construction and integration of GPP/DSP/FPGA system
• Implement and test optimal configurations on GPP/DSP/FPGA system
• Develop practical design methods based on SAR and STAP applications
• Demonstrate advantages of combining technologies

Impact
• Embedded systems requirements for the 21st century can be satisfied with the combined use of GPP, DSP, and FPGA technologies
• Demonstrate use of FPGA boards as co-processors for embedded multiprocessor GPP and DSP systems
• Demonstrate systematic approaches to optimally configure GPP/DSP/FPGA systems for minimal SWAP for embedded applications
![Page 4: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/4.jpg)
• Program Overview and Introduction (Quad Chart)
• Program Management Status
• Highlights from Year 1
• Highlights from Year 2
• Work to be Completed
Outline
![Page 5: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/5.jpg)
Personnel (Program Management Status)
• John K. Antonio, Principal Investigator
• Ph.D., Texas A&M University
• Professor/Director of CS, University of Oklahoma
• Over 70 publications in HPC and related areas
• PI or co-PI of 17 contracts/grants totaling over $2.1M
![Page 6: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/6.jpg)
Personnel (Program Management Status)
• Sudarshan K. Dhall, Co-Principal Investigator
• Ph.D., University of Illinois
• Professor of CS, University of Oklahoma
• Over 80 publications, 2 books, 3rd underway
• PI or co-PI of grants and contracts totaling about $1M
![Page 7: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/7.jpg)
Personnel (Program Management Status)
• Jack West, Research Scholar
  Optimal Mapping, Scheduling, and Configuration Techniques for STAP; Network Simulator; STAP Implementation
• Jeff Muehring, Research Scholar
  Optimal GPP/DSP/FPGA Configuration Techniques for SAR; SAR Implementation
  Intern at IBM/Houston, 8/99 to 1/00
  Research Scholar at OU, 1/00 to 7/00
![Page 8: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/8.jpg)
Personnel (Program Management Status)
• Hongping Li, Research Assistant, Ph.D. Student
  Calibration of Power Prediction Simulator, System Interfacing, SAR Implementation
• Sirirut Vanichayobon, Research Assistant, Ph.D. Student
  FPGA-Based Linear Equation Solver for STAP, System Interfacing, STAP Implementation
• Seok-Hyun Ko, Research Assistant, M.S. Student
  Power Simulator Enhancements
![Page 9: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/9.jpg)
Personnel (Program Management Status)
• Tim Osmulski, Research Assistant, M.S. Student
  Power Prediction Simulator for FPGAs
  Graduated May 1998
• Nikhil Gupta, Research Assistant, M.S. Student
  Algorithms for STAP Weight Calculation
  Mapping Inner Product Computations onto FPGAs
  Graduated August 1998
![Page 10: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/10.jpg)
Personnel (Program Management Status)
• Brian Veale, Research Assistant, M.S. Student
  Space and Power Study for High-Performance Integer and Floating Point Reconfigurable Architectures
  Graduated August 1999
![Page 11: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/11.jpg)
Contacts, Partners, Vendors, and Other Communications
(Program Management Status)
• DARPA: José Muñoz
• Rome Lab: Ralph Kohler
• MIT Lincoln Lab: David Martinez, Jim Ward
• MITRE: Richard Games
• Northrop Grumman: Marc Campbell
• Synplicity, Inc.: Madelyn Miller
• Xilinx: Jason Feinsmith
• Annapolis Micro Systems: Jenny Donaldson, Bill Hulbert, Paul Kowalewski
• ISI: Milissa Benincasa, David Coker
• Mercury Computer: Thomas Einstein, Ed Holstien, Craig Lund, Dave Toms
![Page 12: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/12.jpg)
Equipment Status (Program Management Status)

Mercury
• 20 Slot Hybrid Chassis with SPARC 5V
• Solaris 2.5 with C Compiler
• MC/OS, Cross Assembler, Toolkit
• MPI-Pro for MC/OS
• 9U VME RACE Board
• 1 SHARC Daughtercard (2 CNs, 8MB/CN, 3 SHARCs/CN) = 6 SHARCs
• 3 SHARC Daughtercards (2 CNs, 16MB/CN, 3 SHARCs/CN) = 18 SHARCs
• 4 PowerPC Daughtercards (2 CNs, 16MB/CN, 1 PPC/CN) = 8 PPCs
• RIN-T Input Card
• ROUT-T Output Card

Annapolis Micro Systems
• 4 PCI WILDONE Cards (Xilinx 4028/4036)
• 4 PCI WILDFORCE Array Cards (5 Xilinx 4085s)
• Interfacing Cables

Other Vendors
• ModelSim Simulation Software (Model Technology, Inc.)
• Synplify Synthesis Software (Synplicity, Inc.)
• Xilinx Foundation Software (Xilinx, Inc.)
![Page 13: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/13.jpg)
Schedule of Milestones (Program Management Status)
[Figure: Gantt chart. Timeline marks: June 1997, Mar. 1998, Dec. 1998, Sept. 1999, June 2000, Dec. 2000. Key: GPP/DSP Sub-System, FPGA Sub-System, and GPP/DSP/FPGA System, each split into Research/Design and Implement/Test. Bar positions are not recoverable from the extracted text.]
Tasks shown:
• Design STAP Iterative Weight Solver for FPGA
• Inter-GPP/DSP Comm. Simulator for STAP
• Optimal GPP/DSP Config. for SAR
• GPP/DSP/FPGA Platform Construction and Independent Testing of GPP/DSP and FPGA Subsystems
• Implement STAP Iterative Weight Solver on FPGA
• Optimal GPP/DSP Config. for STAP
• Implement SAR Linear Filtering on FPGA
• Optimal GPP/DSP/FPGA Config. for SAR/STAP
• GPP/DSP and FPGA Subsystem Design, Integration and Testing
• Optimal GPP/DSP/FPGA Config. for SAR
• Demonstrate Combined SAR/STAP on GPP/DSP/FPGA Platform
• Implement SAR on GPP/DSP
• Design SAR Linear Filtering for FPGA
• Implement STAP on GPP/DSP
• Implement SAR on GPP/DSP/FPGA Platform
• Optimal GPP/DSP/FPGA Config. for STAP
• Implement STAP on GPP/DSP/FPGA Platform
• Develop FPGA Power Consumption Simulator
• Test FPGA Power Consumption Simulator
![Page 14: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/14.jpg)
Budget Summary (Program Management Status)

| Category | Current Budget | Balance on 8/1/99 | Projected Expenses 8/99-7/00 | Projected Expenses 8/00-12/00 |
|---|---|---|---|---|
| Personnel | 246,223 | 108,635 | 154,024 | 52,123 |
| Fringes | 72,117 | 36,051 | 27,712 | 9,340 |
| Consulting | 40,000 | 37,000 | 0 | 0 |
| Expenses | 9,781 | 6,261 | 10,000 | 5,069 |
| Travel | 17,545 | 4,889 | 12,000 | 7,372 |
| Equipment | 217,670 | 42,652 | 42,652 | 0 |
| Indirect Cost | 181,262 | 90,632 | 87,317 | 31,674 |
| Total | 784,598 | 326,120 | 333,705 | 105,578 |
![Page 15: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/15.jpg)
• Program Overview and Introduction (Quad Chart)
• Program Management Status
• Highlights from Year 1
• Highlights from Year 2
• Work to be Completed
Outline
![Page 16: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/16.jpg)
Highlights from Year 1
• Optimal Configuration of Compute Nodes for SAR Processing
• Network Simulator
• FPGA Inner-Product Co-Processor Designs for STAP Weight Solvers
• FPGA Power Prediction Simulator
![Page 17: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/17.jpg)
Optimal Configuration of Compute Nodes for SAR Processing
(Highlights from Year 1)
• Motivation and SAR Basics
• Parallelization of SAR Processing
• The Optimal Configuration Problem
  • Formulation
  • Numerical Results
• Conclusions
![Page 18: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/18.jpg)
Nominal UAV Payload
“Predator”
![Page 19: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/19.jpg)
Footprint of Aerial Side-Looking SAR
[Figure labels: Targets, Azimuth, Velocity, Range, Footprint]
![Page 20: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/20.jpg)
Offset Overlapping Beams
[Figure labels: v, Rs, Real Azimuth Resolution]
![Page 21: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/21.jpg)
Synthetic Beams
[Figure labels: Azimuth, v, R, Rs, Compressed Resolution]
![Page 22: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/22.jpg)
Optimal Configuration of Compute Nodes for SAR Processing
(Highlights from Year 1)
• Motivation and SAR Basics
• Parallelization of SAR Processing
• The Optimal Configuration Problem
  • Formulation
  • Numerical Results
• Conclusions
![Page 23: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/23.jpg)
Parallelization of SAR Processing
[Figure: Range Processing (shown across 3 range processors), a Distributed Corner-Turn, and Azimuth Processing (shown across 4 azimuth processors). Data axes: Range Samples and Pulse No.; dimensions marked Kr and Sa.]
where Sa is the azimuth section length and Kr is the range reference kernel size
Reference: T. Einstein, "Realtime Synthetic Aperture Radar Processing on the RACE Multicomputer," App. Note 203.0, Mercury Computing Sys., 1996.
![Page 24: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/24.jpg)
Sectioned Convolution
[Figure labels: Kernel, Overlap, Section, Discard, FFT size]
• Large Overlap/Section ratio ⇒ small azimuth memory, large number of azimuth processors
• Small Overlap/Section ratio ⇒ large azimuth memory, small number of azimuth processors
Reference: T. Einstein, "Realtime Synthetic Aperture Radar Processing on the RACE Multicomputer," App. Note 203.0, Mercury Computing Sys., 1996.
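Sectioned fast convolution can be illustrated with a minimal overlap-save sketch. It assumes NumPy; the function name `overlap_save` and its parameters are hypothetical and are not taken from the slides or from the cited application note. Each FFT block holds an overlap of `len(kernel) - 1` old samples plus one section of new samples, and the wrapped (circular) outputs at the start of each block are discarded.

```python
import numpy as np

def overlap_save(x, kernel, fft_size):
    """Linear convolution of x with kernel using overlap-save sectioning."""
    k = len(kernel)
    section = fft_size - (k - 1)        # new output samples produced per FFT block
    if section <= 0:
        raise ValueError("fft_size must be larger than len(kernel) - 1")
    H = np.fft.fft(kernel, fft_size)    # kernel spectrum, computed once
    n_out = len(x) + k - 1              # length of the full linear convolution
    n_blocks = -(-n_out // section)     # ceiling division
    # Prepend k-1 zeros (the first overlap) and pad the tail to whole blocks.
    padded = np.concatenate([
        np.zeros(k - 1, dtype=complex),
        x,
        np.zeros(n_blocks * section - len(x), dtype=complex),
    ])
    out = np.empty(n_blocks * section, dtype=complex)
    for b in range(n_blocks):
        block = padded[b * section : b * section + fft_size]
        y = np.fft.ifft(np.fft.fft(block) * H)
        # Discard the first k-1 samples, which are corrupted by circular wrap.
        out[b * section : (b + 1) * section] = y[k - 1:]
    return out[:n_out]
```

For a fixed kernel, a larger `fft_size` means fewer discarded samples per block but more data held per processor, which is the memory versus processor-count trade-off stated in the bullets above.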
![Page 25: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/25.jpg)
System Parameters
• radar-dependent: R (range), Rs (range swath), and λ (wavelength)
• application-dependent: δ (desired resolution) and v (platform velocity)
• processor-dependent: αr and αa (non-fast-convolution range and azimuth loading) and γ (fast convolution throughput)
• software-dependent: Sa (azimuth convolution section length), Fa (azimuth FFT length), and Fr (range FFT length)
![Page 26: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/26.jpg)
Derivations for Memory and Processor Requirements
[Equations not recoverable from the extracted text: expressions for the processor requirements Pr and Pa and the memory requirements Mr and Ma in terms of the system parameters listed on the previous slide.]
![Page 27: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/27.jpg)
Optimal Configuration of Compute Nodes for SAR Processing
(Highlights from Year 1)
• Motivation and SAR Basics
• Parallelization of SAR Processing
• The Optimal Configuration Problem
  • Formulation
  • Numerical Results
• Conclusions
![Page 28: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/28.jpg)
Optimal Configuration Formulation
• Objective: Determine configurations for the CNs, number of CNs of each configuration, and section size, to satisfy processor and memory requirements and minimize power consumption
• Notation and Definitions:
  • CN Configuration: Specifies the daughtercard type and number of range and azimuth CEs (per configured CN)
  • X, Y: The two possible CN configurations
  • XT, YT: Daughtercard type for each CN configuration
![Page 29: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/29.jpg)
Optimal Configuration Formulation
• Notation and Definitions:
  • Xr, Yr: Number of range processors per CN (for each configuration)
  • Xa, Ya: Number of azimuth processors per CN (for each configuration)
  • NX, NY: Number of CNs of configurations X and Y
  • ΠCN(•): Power per CN as a function of daughtercard type
  • MCN(•): Memory per CN as a function of daughtercard type
  • PCN(•): Processors per CN as a function of daughtercard type
![Page 30: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/30.jpg)
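Optimal Configuration Formulation

Minimize:

$$Z = N_X\,\Pi_{CN}(X_T) + N_Y\,\Pi_{CN}(Y_T)$$

Subject to:

$$
\begin{aligned}
P_r &\le N_X X_r + N_Y Y_r \\
P_a(S_a) &\le N_X X_a + N_Y Y_a \\
M_{CN}(X_T) &\ge X_r \frac{M_r}{P_r} + X_a \frac{M_a(S_a)}{P_a(S_a)} \\
M_{CN}(Y_T) &\ge Y_r \frac{M_r}{P_r} + Y_a \frac{M_a(S_a)}{P_a(S_a)} \\
X_r + X_a &\le P_{CN}(X_T), \qquad Y_r + Y_a \le P_{CN}(Y_T) \\
F_a &= 2^k \ge S_a + K_a, \qquad k = 1, 2, \ldots \\
N_X, N_Y, X_r, X_a, Y_r, Y_a &\ge 0, \qquad S_a \ge 1
\end{aligned}
$$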
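As a rough illustration of the formulation, the sketch below searches exhaustively for the minimum-power pair of CN configurations at one fixed section size, so the requirements Pr, Pa, Mr, and Ma enter as plain numbers. Everything concrete here is an assumption for illustration: the `CARDS` table (power, memory, processors per CN), the function names, and the brute-force search itself, which stands in for whatever mathematical programming method the authors used. The full problem would also vary the section size Sa and the FFT length Fa.

```python
from itertools import product

# Hypothetical per-CN daughtercard data (illustrative values, not from the slides).
CARDS = {
    1: {"power": 10.0, "mem": 8.0, "procs": 3},
    2: {"power": 12.0, "mem": 16.0, "procs": 3},
}

def feasible_nodes(P_r, P_a, M_r, M_a, cards):
    """Yield (type, range CEs, azimuth CEs) that fit one CN's processors and memory."""
    for t, c in cards.items():
        for r in range(c["procs"] + 1):
            for a in range(c["procs"] + 1 - r):
                # Per-CN memory constraint: each CE needs its share of total memory.
                if c["mem"] >= r * M_r / P_r + a * M_a / P_a:
                    yield t, r, a

def min_power_config(P_r, P_a, M_r, M_a, cards=CARDS, max_cns=12):
    """Exhaustively minimize Z = N_X * Pi(X_T) + N_Y * Pi(Y_T) for fixed S_a."""
    best = None
    nodes = list(feasible_nodes(P_r, P_a, M_r, M_a, cards))
    for (xt, xr, xa), (yt, yr, ya) in product(nodes, repeat=2):
        for nx, ny in product(range(max_cns + 1), repeat=2):
            if nx * xr + ny * yr < P_r or nx * xa + ny * ya < P_a:
                continue  # not enough range or azimuth processors
            z = nx * cards[xt]["power"] + ny * cards[yt]["power"]
            if best is None or z < best["Z"]:
                best = {"Z": z, "X": (xt, xr, xa), "N_X": nx,
                        "Y": (yt, yr, ya), "N_Y": ny}
    return best
```

Calling `min_power_config(P_r=2, P_a=4, M_r=4.0, M_a=32.0)` returns the lowest-power feasible combination under these made-up card values; an exhaustive search is only practical because the per-CN processor counts are small.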
![Page 31: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/31.jpg)
Optimal Configuration of Compute Nodes for SAR Processing
(Highlights from Year 1)
• Motivation and SAR Basics
• Parallelization of SAR Processing
• The Optimal Configuration Problem
  • Formulation
  • Numerical Results
• Conclusions
![Page 32: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/32.jpg)
Minimum Power
![Page 33: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/33.jpg)
Azimuth FFT Size
![Page 34: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/34.jpg)
Optimal Azimuth Section Size
![Page 35: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/35.jpg)
Optimal Ratio of Kernel Size to Section Size
![Page 36: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/36.jpg)
Percentage of Power Usage by Card Type 1
![Page 37: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/37.jpg)
Optimal CN Configurations
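[Figure: regions of the Resolution (ticks 0.5 to 2) and Velocity (ticks 50 to 400) plane, each labeled with its optimal configuration (XT, Xr, Xa, YT, Yr, Ya). Labels that survive extraction include (1, 1, 2, 2, 0, 1), (1, 2, 1, 2, 0, 2), (1, 3, 0, 2, 0, 2), (1, 3, 0, 2, 1, 1), (2, 0, 2, 2, 1, 1), (1, 1, 2, 2, 1, 1), (2, 1, 1, 2, 2, 0), and (1, 1, 2, 2, 0, 2).]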
![Page 38: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/38.jpg)
Optimal Configuration of Compute Nodes for SAR Processing
(Highlights from Year 1)
• Motivation and SAR Basics
• Parallelization of SAR Processing
• The Optimal Configuration Problem
  • Formulation
  • Numerical Results
• Conclusions
![Page 39: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/39.jpg)
Conclusions
• A method for optimally configuring CN-based parallel systems for SAR processing was introduced.
• The method provides detailed HW and SW design and implementation information about how best to utilize system resources for given values of application parameters.
• The numerical studies show that the optimal ratio of daughtercard types can be relatively constant over regions of the application parameter space.
• For a fixed hardware configuration, the CNs can be re-configured (via software re-configuration) to achieve optimal power consumption over specified regions.
![Page 40: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/40.jpg)
Highlights from Year 1
• Optimal Configuration of Compute Nodes for SAR Processing
• Network Simulator
• FPGA Inner-Product Co-Processor Designs for STAP Weight Solvers
• FPGA Power Prediction Simulator
![Page 41: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/41.jpg)
Network Simulator (Highlights from Year 1)
• Parallel STAP: The Motivation behind the Network Simulator
• Overview of the Network Simulator
• Numerical Studies
• Conclusions
![Page 42: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/42.jpg)
STAP Processing Flow
[Figure: block diagram with blocks Input Data, Pulse Compress, Rotate, Doppler Filter, Rotate, QR Decomposition, and Beamform. Other labels: Steering Vectors, Weights, Beam Outputs. The data cube is drawn at each stage with axes Channels, Pulses, and Range.]
![Page 43: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/43.jpg)
Sub-Cube Bar Partitioning Methodology
1. Partition STAP data cube over a 2-D process set.
2. Process the contiguous dimension.
3. Re-partition the data cube before processing the next dimension.
4. Rotate the newly distributed data to make the next dimension sequential in memory.
5. Repeat steps 1 through 4 before each processing phase.
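The partition and rotate steps above can be sketched with NumPy on a single machine. This is an illustrative sketch only: the function `partition`, the (channels, pulses, range) axis order, and the cube sizes are assumptions, and no actual interprocessor communication is modeled. Each process in a rows x cols process set receives a "bar" that keeps one dimension whole, and that dimension is then made the last (contiguous) axis in memory.

```python
import numpy as np

def partition(cube, rows, cols, whole_axis):
    """Split a (channels, pulses, range) cube over a rows x cols process set,
    keeping `whole_axis` (1 = pulses, 2 = range) undivided on every process."""
    split_axis = 2 if whole_axis == 1 else 1   # the other non-channel axis
    blocks = {}
    for i, chan_bar in enumerate(np.array_split(cube, rows, axis=0)):
        for j, bar in enumerate(np.array_split(chan_bar, cols, axis=split_axis)):
            # "Rotate": move the whole dimension last so it is sequential in memory.
            rotated = np.moveaxis(bar, whole_axis, -1)
            blocks[(i, j)] = np.ascontiguousarray(rotated)
    return blocks

# Example: 6 channels x 8 pulses x 12 range gates on a 3 x 4 process set.
cube = np.arange(6 * 8 * 12).reshape(6, 8, 12)
pulse_compression = partition(cube, 3, 4, whole_axis=2)   # range dimension whole
doppler_filtering = partition(cube, 3, 4, whole_axis=1)   # pulses dimension whole
```

Because both partitionings split the channel axis the same way, the data a process needs after re-partitioning lives only on processes in its own row of the process set.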
![Page 44: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/44.jpg)
STAP Data Cube Partitioning Examples
[Figure: a data cube with axes Channels, Pulses, and Range mapped onto a 3 x 4 process set (processes 1 to 12), shown for two cases:
• Pulse Compression: partitioning with the range dimension whole.
• Doppler Filtering: partitioning with the pulses dimension whole.]
![Page 45: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/45.jpg)
STAP Data Cube Repartitioning
• Re-partitioning involves exchanging data with the next whole dimension.
• Interprocessor communication is required between processors in the same row.
[Figure: the 3 x 4 process set (processes 1 to 12) shown twice, once with the range dimension contiguous and once with the pulse dimension contiguous. Cube axes: Channels, Pulses, Range.]
![Page 46: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/46.jpg)
STAP Data Cube Repartitioning: Data Re-distribution Mapping
[Figure: required data transfers between the Pulse Compression partitioning and the Doppler Filtering partitioning (processes 1 to 12; axes Channels, Pulses, Range), shown alongside the network interconnection configuration of CNs attached to 6-port crossbars, with IPC marked.]
![Page 47: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/47.jpg)
Network Simulator (Highlights from Year 1)
• Parallel STAP: The Motivation behind the Network Simulator
• Overview of the Network Simulator
• Numerical Studies
• Conclusions
![Page 48: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/48.jpg)
Some RACE Network Features
1. 40 MHz clock, 32-bit data paths, 2048-byte circuit-switched packets.
2. Contention resolved using priorities.
   a. User-programmable message priority
   b. Hardware priority assigned at each crossbar along a path (based on complex connection rules)
3. A packet with higher priority preempts (suspends) a lower-priority packet (active or inactive) to gain control of a crossbar port.
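The clock rate and path width above imply a simple upper bound on link throughput. The sketch below works through that arithmetic under one labeled assumption that is not stated on the slide: one 32-bit word moves per clock cycle. The names are hypothetical, and the bound ignores path setup, contention, and preemption.

```python
CLOCK_HZ = 40e6          # 40 MHz network clock (from the slide)
PATH_BYTES = 32 // 8     # 32-bit data path = 4 bytes per transfer
PACKET_BYTES = 2048      # packet size (from the slide)

def peak_bandwidth():
    """Peak bytes per second, ASSUMING one word is transferred per clock."""
    return CLOCK_HZ * PATH_BYTES

def ideal_transfer_time(n_bytes):
    """Lower bound on transfer time in seconds; no setup or contention modeled."""
    return n_bytes / peak_bandwidth()

packet_time = ideal_transfer_time(PACKET_BYTES)   # time for one full packet
```

Under that assumption the peak rate is 160 MB/s and a full 2048-byte packet occupies a link for 512 clock cycles, or 12.8 microseconds.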
![Page 49: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/49.jpg)
RACE Network Interconnect: Fat-Tree Topology
[Figure: a fat tree built from 6-port crossbars with CNs attached to the terminal crossbars. A message path is traced from the message source CN to the message destination CN.]
![Page 50: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/50.jpg)
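Standard Crossbar Priority Arbitration Algorithm Table

Transaction status is given by the column groups: Active; Not Yet Active with Port E Involved; Not Yet Active with Port E Not Involved.

| Hardware Priority | Active: Entry Port | Active: Exit Port | Port E Involved: Entry Port | Port E Involved: Exit Port | Port E Not Involved: Entry Port | Port E Not Involved: Exit Port |
|---|---|---|---|---|---|---|
| 7 | F | A,B,C,D,E | F | A,B,C,D,E | F | A,B,C,D |
| 6 | E | F | E | F | A,B,C,D\* | A,B,C,D\* |
| 5 | A,B,C,D | F | A,B,C,D | F | A,B,C,D | F |
| 4 | E | A,B,C,D | E | A,B,C,D | - | - |
| 3 | \*A,B,C,D | \*A,B,C,D,E | A,B,C,D\* | A,B,C,D\* | - | - |
| 2 | - | - | A,B,C,D | E | - | - |
| 1 | - | - | - | - | - | - |

\* Peer Kill Rules Apply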
![Page 51: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/51.jpg)
Network Class Details
Network
• Crossbar
• Compute Node: Processor Information; Outgoing and Received Message Queues; Outgoing and Received Packet Stack
• Link
• Random Scan: Generates Pseudo-Random CN Scan Ordering
• Clock: Based on Network Clock Frequency (factor of 5); Data Transfer Rate Equates to Effective Network Bandwidth
Network Methods
• Dynamic Network Construction
• Dynamic Routing Table Creation
• Dynamic CN and CE Message Traffic Generation
• Simulates Packet Traffic
![Page 52: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/52.jpg)
Crossbar Class Details

• Crossbar: Two Parent Port Connections; Four Child Port Connections; Internal Switch Connections; Four CN Connections for Terminal Crossbars
• Link: Connects Crossbar Objects; Link Status: Occupied or Free
• Crossbar Methods:
  • Implements Hardware Priority Arbitration (Top-Level Algorithm, Standard Algorithm)
  • Query Port Status
  • Routes Packets to Next Location
  • Allocates and Frees Internal Port Connections and Connected Link Objects
  • Transmits Packet Data
![Page 53: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/53.jpg)
Compute Node Class Details

• Compute Node: Processor Information; Outgoing and Received Message Queues; Outgoing and Received Packet Stack
• Compute Node Methods:
  • Manages Outgoing and Received Message Queues
  • Manages Outgoing and Received Packet Stack
  • Explodes the Top Outgoing Message into Packets of Size 2048 or Less
  • Handles DMA Chaining of Packets
  • Establishes Path Through Network and Transmits Packet Data
• Packets are self-routing

[Figure: the top message of the Outgoing Message Queue (Message 1, Message 2, Message 3, ...) is exploded (EXPLODE) into the Packet Stack (Packet 1, Packet 2, Packet 3, Packet 4, ...).]
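The "explode" step listed for the compute node can be sketched as follows. This is a minimal illustration, not the simulator's code: the function name `explode`, the use of a `deque` for the outgoing message queue, and the treatment of sizes as plain integers (the slide does not state the unit of the 2048 limit) are all assumptions.

```python
from collections import deque

MAX_PACKET_SIZE = 2048  # "Packets of Size 2048 or Less" (unit not stated on the slide)


def explode(message_size, max_packet=MAX_PACKET_SIZE):
    """Split one message into packet sizes, each at most max_packet."""
    if message_size < 0:
        raise ValueError("message size must be non-negative")
    full, remainder = divmod(message_size, max_packet)
    return [max_packet] * full + ([remainder] if remainder else [])


# Hypothetical outgoing message queue holding three message sizes.
outgoing_messages = deque([5000, 2048, 100])

# Explode the top outgoing message into the packet stack.
packet_stack = explode(outgoing_messages.popleft())
print(packet_stack)  # [2048, 2048, 904]
```

Only the last packet of a message is shorter than the limit, so a message of size m produces ceil(m / 2048) packets.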
![Page 54: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/54.jpg)
Simulator UML Sequence Diagram

[Figure: UML sequence diagram. Participants: User (actor), Data Cube, Process Set, Network, Crossbar, CN, Clock. Labels: Pass 1; Pass 2; Increment Simulation Clock; Build Messages; Connection/Data Transfer; Clean Up; Message Matrices; X, Y, Mapping Matrices; Messages; Time; Simulation Time = 2 ms. Example inputs: R:200, P:22, C:16; CEs:48; X:6, Y:8; Routing:F; CN Traffic; Phase 1 DMA:Y. * iterative process]
![Page 55: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/55.jpg)
Packet UML Statechart: Simulation Pass 1 and Pass 2

[Figure: statechart of the Simulation Pass Subsystem. States: Start Up, Ready, Active, Blocked, Suspended, Waiting for Kill, Completed. Labels: Pass 1, Pass 2.]
![Page 56: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/56.jpg)
Network Simulator (Highlights from Year 1)
• Parallel STAP: The Motivation behind the Network Simulator
• Overview of the Network Simulator
• Numerical Studies
• Conclusions
![Page 57: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/57.jpg)
Process Set Performance Metric: Communication Phase 1

[Figure: Process Set - Phase 1 (CN:12, R:200, P:22, C:16, Routing:F). Count (0 to 50) versus Time (0.5 to 2 ms) for process sets CN 12 (12x3), CN 12 (9x4), CN 12 (6x6), and CN 12 (4x9).]
![Page 58: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/58.jpg)
Process Set Performance Metric: Communication Phase 2

[Figure: Process Set - Phase 2 (CN:12, R:200, P:22, C:16, Routing:F). Count (0 to 10) versus Time (3 to 6 ms) for process sets CN 12 (12x3), CN 12 (9x4), CN 12 (6x6), and CN 12 (4x9).]
![Page 59: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/59.jpg)
Message Traffic Performance Metric: Communication Phase 1

[Figure: Message Traffic - Phase 1 (CN:16, X:12, Y:4, R:400, P:22, C:16, Routing:EF). Count (0 to 9) versus Time (2 to 2.5 ms) for CN Traffic and CE Traffic.]
![Page 60: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/60.jpg)
Message Traffic Performance Metric: Communication Phase 2

[Figure: Message Traffic - Phase 2 (CN:16, X:12, Y:4, R:400, P:22, C:16, Routing:EF). Count (0 to 8) versus Time (10 to 25 ms) for CN Traffic and CE Traffic.]
![Page 61: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/61.jpg)
DMA Chaining Performance Metric: Communication Phase 1

[Figure: DMA Chaining - Phase 1 (CE:24, X:8, Y:3, R:800, P:32, C:22, Routing:F). Count (0 to 9) versus Time (14 to 22 ms) for Chaining and No Chaining.]
![Page 62: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/62.jpg)
DMA Chaining Performance Metric: Communication Phase 2

[Figure: DMA Chaining - Phase 2 (CE:24, X:8, Y:3, R:800, P:32, C:22, Routing:F). Count (0 to 8) versus Time (21 to 27 ms) for Chaining and No Chaining.]
![Page 63: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/63.jpg)
Network Simulator (Highlights from Year 1)
• Parallel STAP: The Motivation behind the Network Simulator
• Overview of the Network Simulator
• Numerical Studies
• Conclusions
![Page 64: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/64.jpg)
Conclusions

1. Designed and implemented a platform-independent simulator.
2. The simulator demonstrates that the Process Set, the CN or CE Message Traffic, DMA chaining, adaptive routing, and the scheduling of messages all affect performance.
3. Allows users to experiment with possible current and future configurations.
4. The communication pattern is implemented for STAP but may be used for other applications with a phased communication pattern.
![Page 65: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/65.jpg)
Highlights from Year 1
• Optimal Configuration of Compute Nodes for SAR Processing
• Network Simulator
• FPGA Inner-Product Co-Processor Designs for STAP Weight Solvers
• FPGA Power Prediction Simulator
![Page 66: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/66.jpg)
FPGA Inner-Product Co-Processor Designs for STAP Weight Solvers
(Highlights from Year 1)
• Overview of STAP Weight Calculation
• Two FPGA Inner-Product Circuit Designs
• Numerical Accuracy Studies
• Conclusions
![Page 67: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/67.jpg)
Typical STAP Processing Flow

[Figure: block diagram. Blocks: Input Data, Pulse Compress, Data Cube (axes: pulses, range), Doppler Filter, Data Cube (axes: Doppler, range), Covariance Matrix, Steering Vector, Weight Computation, Weight Application, Threshold Detection, Target Decision. Percentages shown on the diagram: 8%, 91.5%, 0.5%.]
![Page 68: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/68.jpg)
Space-Time Adaptive Processing
• Effective partially adaptive STAP technique
• The architecture consists of
• Doppler processing across all pulse repetition intervals
• Adaptive filtering across
  • all channels and
  • K adjacent Doppler bins
Kth-Order Doppler Factored STAP
![Page 69: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/69.jpg)
Kth-Order Doppler Factored STAP

$$x(k,r):\; KN \times 1 \;\hat{=}\; 3N \times 1$$

$$\psi(k,b) = \frac{1}{L_r} \sum_{r=(b-1)L_r+1}^{bL_r} x(k,r)\, x^H(k,r)$$

[Figure: data cube with axes N channels, Doppler, and range, highlighting Doppler bins (k - 1), k, and (k + 1) over the bth range segment (with L_r cells).]

Data matrix needed for calculating covariance matrix for kth Doppler bin and bth range segment using Kth-Order Doppler Factored STAP with K = 3
![Page 70: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/70.jpg)
Matrix-Based Derivation of $\psi(k,b)$

$$X(k,b):\; KN \times L_r \;\hat{=}\; 3N \times L_r$$

$$\psi(k,b) = \frac{1}{L_r} \sum_{r=(b-1)L_r+1}^{bL_r} x(k,r)\, x^H(k,r) = \frac{1}{L_r}\, X(k,b)\, X^H(k,b)$$

The Weight Equation:

$$\psi(k,b)\, w(k,b) = s$$
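The matrix form of the covariance estimate can be illustrated with a small sketch. It assumes that the columns of X(k,b) are the L_r snapshots x(k,r) of the range segment, which is what the equality of the sum form and the matrix form requires. The function name `sample_covariance` and the tiny 2 x 2 example are hypothetical.

```python
def sample_covariance(X):
    """Return psi = (1/L_r) X X^H for X given as a list of rows.

    X has one row per space-time element and one column per range cell,
    so len(X[0]) plays the role of L_r.
    """
    n_rows = len(X)
    n_cells = len(X[0])
    return [
        [
            sum(X[i][r] * X[j][r].conjugate() for r in range(n_cells)) / n_cells
            for j in range(n_rows)
        ]
        for i in range(n_rows)
    ]


# Hypothetical data: 2 elements, L_r = 2 range cells.
X = [[1 + 1j, 2], [0, 1j]]
psi = sample_covariance(X)
print(psi)
```

The result is Hermitian by construction (psi[i][j] is the complex conjugate of psi[j][i]), which is the property the weight solvers rely on.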
![Page 71: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/71.jpg)
STAP Weight Calculation

Using QR-decomposition Method to solve the Weight Equation: $\psi w = s$

$$\psi(k,b)\, w(k,b) = s$$

$$\frac{1}{L_r}\, X(k,b)\, X^H(k,b)\, w(k,b) = s$$

Take QR Decomposition: $X^T(k,b) = QR$

$$\frac{1}{L_r}\, R^T Q^T Q^* R^*\, w(k,b) = s$$

$$\frac{1}{L_r}\, R^T R^*\, w(k,b) = s$$

Note that $R^T = [\, R_1^T \;\; 0 \,]$

$$R_1^T R_1^*\, w(k,b) = L_r\, s$$

Let $R_1^*\, w(k,b) = p$

Solve for $p$ using forward elimination: $R_1^T\, p = L_r\, s$

Solve for $w(k,b)$ using backward substitution: $R_1^*\, w(k,b) = p$
![Page 72: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/72.jpg)
STAP Weight Calculation

Using Conjugate Gradient Method to solve the Weight Equation: $\psi w = s$

Initialization

Choose $w_0$, set $g_0 = \psi w_0 - s$, $d_0 = -g_0$

Iteration

$$w_{i+1} = w_i - \frac{g_i^T d_i}{d_i^T \psi\, d_i}\, d_i$$

$$g_{i+1} = \psi\, w_{i+1} - s$$

$$d_{i+1} = -g_{i+1} + \frac{g_{i+1}^T \psi\, d_i}{d_i^T \psi\, d_i}\, d_i$$
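The conjugate gradient iteration can be sketched in a few lines. This is an illustration under stated assumptions rather than the solver used in the study: it works on a real symmetric positive definite matrix (the STAP covariance is complex Hermitian), starts from w_0 = 0, and stops when the norm of the gradient g falls below a tolerance. The names `cg_solve`, `matvec`, and `dot` are hypothetical.

```python
def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cg_solve(psi, s, tol=1e-10, max_iter=None):
    """Solve psi w = s; returns (w, iterations performed)."""
    n = len(s)
    w = [0.0] * n                                    # w_0 (assumed zero start)
    g = [a - b for a, b in zip(matvec(psi, w), s)]   # g_0 = psi w_0 - s
    d = [-x for x in g]                              # d_0 = -g_0
    iterations = 0
    for _ in range(max_iter or n):
        if dot(g, g) ** 0.5 <= tol:
            break
        psi_d = matvec(psi, d)
        denom = dot(d, psi_d)                        # d_i^T psi d_i
        alpha = -dot(g, d) / denom
        w = [wi + alpha * di for wi, di in zip(w, d)]
        g = [a - b for a, b in zip(matvec(psi, w), s)]
        beta = dot(g, psi_d) / denom
        d = [-gi + beta * di for gi, di in zip(g, d)]
        iterations += 1
    return w, iterations


# Hypothetical 2 x 2 system with exact solution w = [1/11, 7/11].
w, iterations = cg_solve([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print(w, iterations)
```

A looser tolerance lets the loop exit after fewer iterations, which is the accuracy-versus-flop-count tradeoff the numerical studies compare against the fixed cost of QR.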
![Page 73: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/73.jpg)
Numerical Studies

[Figure: two panels, L_r = 125 and L_r = 250. Flop Count (10^8 to 10^10) versus Tolerance (10^-8 to 10^-1) for the CG and QR weight solvers.]
![Page 74: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/74.jpg)
FPGA Inner-Product Co-Processor Designs for STAP Weight Solvers
(Highlights from Year 1)
• Overview of STAP Weight Calculation
• Two FPGA Inner-Product Circuit Designs
• Numerical Accuracy Studies
• Conclusions
![Page 75: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/75.jpg)
FPGA Inner Product Co-Processor: Design 1

• Multiply-Accumulate Pipe
• Reads two block floating point operands per cycle
• Performs two operations per cycle
• Performs exponent normalization prior to accumulation
• 2 N-vectors reduced to a constant number of partial sums

[Figure: block diagram. Labels: HOST PROCESSOR, INTERCONNECTION BUS, FPGA BOARD, BUFFER (two), operands a and b, multiplier (X), Sign + 16 bit mantissa, 1's comp/register, Normalizing unit, adder (+), Output Register.]
![Page 76: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/76.jpg)
FPGA Inner Product Co-Processor: Design 2

• Multiply-Add Reduction Pipe
• Reads four operands per cycle
• Performs three operations per cycle
• No normalization required
• 2 N-vectors reduced to N/2 partial sums
• Basic Tradeoff: First design has lower throughput, but can perform more work

[Figure: block diagram. Labels: HOST PROCESSOR, INTERCONNECTION BUS, FPGA BOARD, BUFFER (two), Data for First Multiplier, Data for Second Multiplier, two multipliers (X), Sign a, Sign b, 1's comp/register, adder (+), Sign + 16 bit mantissa, 2 ff, "Unit clocked here".]
![Page 77: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/77.jpg)
FPGA Inner-Product Co-Processor Designs for STAP Weight Solvers
(Highlights from Year 1)
• Overview of STAP Weight Calculation
• Two FPGA Inner-Product Circuit Designs
• Numerical Accuracy Studies
• Conclusions
![Page 78: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/78.jpg)
Two Orders of Magnitude Experiment

[Figure: four histograms, each with Frequency on the vertical axis. Data Histogram (data values 0 to 110); Exponent Histogram (exponents 119 to 145); Accuracy Histogram, Design 1 (accuracy 0.999893 to 0.999927); Accuracy Histogram, Design 2 (accuracy 0.99399 to 0.99999).]
![Page 79: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/79.jpg)
Five Orders of Magnitude Experiment

[Figure: four histograms, each with Frequency on the vertical axis. Data Value Histogram (data values 0 to 103008); Exponent Histogram (exponents 119 to 143); Accuracy Histogram, Design 1 (accuracy 0.999912 to 0.999998); Accuracy Histogram, Design 2 (accuracy 0.00000 to 0.99998).]
![Page 80: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/80.jpg)
“Outlier” Experiment

[Figure: four histograms, each with Frequency on the vertical axis. Data Value Histogram (data values 0.0009 to 1000.0000, with the outlier marked); Exponent Histogram (exponents 114 to 138); Accuracy Histogram, Design 1 (accuracy 0.593067 to 0.78369); Accuracy Histogram, Design 2 (accuracy 0.00 to 0.92).]
![Page 81: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/81.jpg)
FPGA Inner-Product Co-Processor Designs for STAP Weight Solvers
(Highlights from Year 1)
• Overview of STAP Weight Calculation
• Two FPGA Inner-Product Circuit Designs
• Numerical Accuracy Studies
• Conclusions
![Page 82: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/82.jpg)
Conclusions
• CG weight solver provides a tradeoff between accuracy and required FLOPs (compared to the QR weight solver)
• Tradeoff between two FPGA designs: Design 1 (Mult & Accum) has lower peak throughput, but can perform more total work than Design 2
• Block floating point provides acceptable accuracy for uniformly distributed data over reasonable dynamic ranges
• Block floating point accuracy breaks down when there are a few large outliers in the data set
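The outlier effect can be reproduced with a simple software model of block floating point. This is a sketch under assumptions, not a model of either FPGA design: each vector shares one exponent taken from its largest element, mantissas carry 16 magnitude bits plus sign, products are accumulated exactly as integers, and "accuracy" is defined here as one minus the relative error against a double-precision inner product. All function names and the test data are hypothetical.

```python
import math
import random

MANTISSA_BITS = 16  # magnitude bits; the slides show "Sign + 16 bit mantissa"


def to_block_float(values):
    """Quantize a vector to integer mantissas with one shared exponent."""
    peak = max(abs(v) for v in values)
    exponent = math.frexp(peak)[1] if peak else 0   # peak < 2**exponent
    scale = 2.0 ** (MANTISSA_BITS - exponent)
    limit = 2 ** MANTISSA_BITS - 1
    mantissas = [max(-limit, min(limit, round(v * scale))) for v in values]
    return mantissas, exponent


def bfp_inner_product(a, b):
    ma, ea = to_block_float(a)
    mb, eb = to_block_float(b)
    acc = sum(x * y for x, y in zip(ma, mb))        # exact integer accumulation
    return acc * 2.0 ** (ea + eb - 2 * MANTISSA_BITS)


def accuracy(a, b):
    """1 - relative error versus the double-precision inner product."""
    exact = sum(x * y for x, y in zip(a, b))
    return 1.0 - abs(bfp_inner_product(a, b) - exact) / abs(exact)


rng = random.Random(0)

# Data spread over about two orders of magnitude.
uniform_a = [rng.uniform(1, 100) for _ in range(1000)]
uniform_b = [rng.uniform(1, 100) for _ in range(1000)]

# Small data with one large outlier per vector, at different positions.
small_a = [rng.uniform(0.001, 0.005) for _ in range(1000)]
small_b = [rng.uniform(0.001, 0.005) for _ in range(1000)]
small_a[0] = 1000.0
small_b[1] = 1000.0

print(accuracy(uniform_a, uniform_b))
print(accuracy(small_a, small_b))
```

In the outlier case the shared exponent is set by the value 1000, so every small element falls below half a quantization step and rounds to a zero mantissa; the products that matter are lost entirely.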
![Page 83: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/83.jpg)
Highlights from Year 1
• Optimal Configuration of Compute Nodes for SAR Processing
• Network Simulator
• FPGA Inner-Product Co-Processor Designs for STAP Weight Solvers
• FPGA Power Prediction Simulator
![Page 84: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/84.jpg)
FPGA Power Prediction Simulator
(Highlights from Year 1)
• CMOS Power Consumption and Past Research
• Design and Implementation of the Power Prediction Simulator
• Conclusions and Demo
![Page 85: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/85.jpg)
Power Dissipation in CMOS

• Leakage Current
• Dynamic Capacitance Charging Current
• Transient Current

Annotations on the diagram: Most important for CMOS; Dependent on clock frequency; Dependent on signal activity
![Page 86: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/86.jpg)
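Power Equations

Equivalent model of a transistor’s gate...

$$v_c(t) = V\left(1 - e^{-t/RC}\right)$$

$$v_R(t) = V e^{-t/RC}$$

$$p_R(t) = \frac{V^2 e^{-2t/RC}}{R}$$

$$p_{avg} = \frac{1}{\tau}\int_0^{\tau} \frac{V^2 e^{-2t/RC}}{R}\, dt = \frac{CV^2}{2\tau}\int_0^{\tau} \frac{2}{RC}\, e^{-2t/RC}\, dt$$

$$p_{avg} = \frac{CV^2}{2\tau}\left[-e^{-2t/RC}\right]_0^{\tau} \approx \frac{CV^2}{2\tau}$$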
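The final approximation drops the factor (1 - e^{-2τ/RC}), which is close to 1 whenever the averaging interval τ is several RC time constants long. A quick numeric check, using hypothetical component values (R = 1 kΩ, C = 1 pF, V = 5 V) and hypothetical function names:

```python
import math


def avg_power_exact(C, V, R, tau):
    """Closed form of (1/tau) * integral of V^2 e^(-2t/RC) / R over [0, tau]."""
    return C * V * V / (2.0 * tau) * (1.0 - math.exp(-2.0 * tau / (R * C)))


def avg_power_approx(C, V, tau):
    """The C V^2 / (2 tau) approximation."""
    return C * V * V / (2.0 * tau)


R, C, V = 1e3, 1e-12, 5.0        # RC = 1 ns (hypothetical values)
for tau in (1e-9, 1e-8):         # tau = RC and tau = 10 RC
    ratio = avg_power_exact(C, V, R, tau) / avg_power_approx(C, V, tau)
    print(tau, ratio)
```

With τ = RC the approximation overestimates by roughly 14 percent (the ratio is 1 - e^{-2}); with τ = 10 RC the two agree to better than one part in a million.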
![Page 87: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/87.jpg)
Probabilistic Modeling

p(s): the probability that signal s attains a logical value of true at any given clock cycle.

A(s): the probability that signal s transitions at any given clock cycle.

• clock: p = 0.50, A = 1.0
• x1: p = 0.88, A = 0.10
• x2: p = 0.29, A = 0.17
• x3: p = 0.69, A = 0.27
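The two quantities can be estimated from a sampled waveform by counting. The sketch below assumes one sample per clock cycle for data signals; the function name `signal_stats` and the sample sequences are hypothetical.

```python
def signal_stats(bits):
    """Estimate (p, A) from a sequence of 0/1 samples.

    p: fraction of samples that are 1.
    A: fraction of consecutive sample pairs whose values differ.
    """
    if len(bits) < 2:
        raise ValueError("need at least two samples")
    p = sum(bits) / len(bits)
    transitions = sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    return p, transitions / (len(bits) - 1)


# A signal that toggles on every sample reproduces the clock figures
# quoted above: p = 0.5, A = 1.0.
print(signal_stats([0, 1] * 8))   # (0.5, 1.0)

# A mostly-high signal with few transitions.
print(signal_stats([1, 1, 1, 1, 1, 1, 0, 1, 1, 1]))
```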
![Page 88: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/88.jpg)
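Probabilistic Modeling

[Figure: logic circuit with inputs x1, x2, x3 and output y, shown with the signal waveforms listed below.]

• x1(t): p=0.88, A=0.10
• x2(t): p=0.29, A=0.17
• x3(t): p=0.69, A=0.27
• x1x2(t): p=0.83, A=0.17
• x1x2x3(t): p=0.10, A=0.13

Calculation of average power:

$$P_{avg} = \frac{1}{2} \sum_{g \,\in\, \text{all gates}} C_g V^2 A_g$$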
![Page 89: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/89.jpg)
Probabilistic Equations

Signal probability transformations...*

[Equation: signal probability p(y) of an output y = f expressed through its product terms π_i, i = 1, ..., k.]

Signal activity transformations...†

[Equation: signal activity A(y) expressed as a sum over input states X weighted by P(X), with terms for one, two, three, ..., n simultaneously switching inputs.]

* Probabilistic Treatment of General Combinatorial Networks
† Estimation of Circuit Activity Considering Signal Correlations and Simultaneous Switching
![Page 90: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/90.jpg)
FPGA Power Prediction Simulator
(Highlights from Year 1)
• CMOS Power Consumption and Past Research
• Design and Implementation of the Power Prediction Simulator
• Conclusions and Demo
![Page 91: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/91.jpg)
FPGA Design
FPGA internal structure design...
CLB
IOB BUF
![Page 92: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/92.jpg)
Routing Fabric Design
Example routings...
Xilinx 4000 series routing fabric is very intricate.
Xilinx synthesis tools use shortest path routing where possible.
The distance the signal travels is the metric considered in this model.
![Page 93: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/93.jpg)
Signal Design

• Symbolic Probability
• Numeric Probability
• Numeric Activity
• Signal Reference
• Manhattan Distance

[Figure: two CLBs with a Local Signal (L) and a Remote Signal (R).]
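The signal record listed on this slide could be represented along the following lines. This is a hypothetical sketch, not the simulator's data structure: the field types, the use of CLB grid coordinates to derive the Manhattan distance, and the rule that a signal is "local" when source and sink share a CLB are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Signal:
    symbolic_probability: str          # e.g. a Boolean expression in the inputs
    numeric_probability: float         # p(s)
    numeric_activity: float            # A(s)
    source_clb: Tuple[int, int]        # (row, column) of the driving CLB (assumed)
    sink_clb: Tuple[int, int]          # (row, column) of the receiving CLB (assumed)
    reference: Optional[str] = None    # name of a referenced signal (assumed meaning)

    @property
    def manhattan_distance(self) -> int:
        """Routing length metric: |row difference| + |column difference|."""
        return (abs(self.source_clb[0] - self.sink_clb[0])
                + abs(self.source_clb[1] - self.sink_clb[1]))

    @property
    def is_local(self) -> bool:
        """Assumption: a local signal never leaves its CLB."""
        return self.manhattan_distance == 0


remote = Signal("x1*x2", 0.83, 0.17, source_clb=(2, 3), sink_clb=(5, 1))
local = Signal("x3", 0.69, 0.27, source_clb=(4, 4), sink_clb=(4, 4))
print(remote.manhattan_distance, remote.is_local)   # 5 False
print(local.manhattan_distance, local.is_local)     # 0 True
```

Using the Manhattan distance as the length metric matches the earlier statement that the distance a signal travels is the quantity this model considers.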
![Page 94: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/94.jpg)
Routing Example

[Figure: six LUTs connected through an Interconnection block; two connections are labeled 4.]
![Page 95: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/95.jpg)
Routing Signal Connections

[Figure: six LUTs with their Local (L) and Remote (R) signal connections.]
![Page 96: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/96.jpg)
FPGA Power Prediction Simulator
(Highlights from Year 1)
• CMOS Power Consumption and Past Research
• Design and Implementation of the Power Prediction Simulator
• Conclusions
![Page 97: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/97.jpg)
Conclusions
• Designed and Implemented power prediction simulator for Xilinx 4000 series FPGAs.
• Inputs to simulator:
  • Place & Route bit stream (from Xilinx Tool)
  • Activity and Probability factors for pin signals
• Simulator calculates probabilities and activities for all internal signals
• Tool outputs power consumption of FPGA chip
• Currently calibrating/tuning simulator using both heat and DC current measurement cross-calibration methods
![Page 98: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/98.jpg)
• Program Overview and Introduction (Quad Chart)
• Program Management Status
• Highlights from Year 1
• Highlights from Year 2
• Work to be Completed
Outline
![Page 99: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/99.jpg)
Highlights from Year 2
• Efforts to Calibrate the FPGA Power Prediction Simulator
• Comparison of Integer and Floating Point Computations on FPGAs
• Architecture of Prototype System for SAR and STAP Processing
• Integration of Reconfigurable Computing into SAR
• Configuration Technique for STAP
![Page 100: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/100.jpg)
Basic Approach to Calibration
• N x N array of CLBs (configurable logic blocks)
• Programmable interconnect
• Let S denote the set of all internal signals for a configuration and S_i denote all signals of length i
• Let A_i denote the sum of activities for all signals of length i
• 2N + 1 distinct capacitances (C) dependent on signal length

$$P_{avg} = \frac{1}{2} V^2 f \sum_{s \in S} C_{d(s)} A_s$$

$$P_{avg} = \frac{1}{2} V^2 f \left( C_0 \sum_{s \in S_0} A_s + C_1 \sum_{s \in S_1} A_s + \cdots + C_{2N} \sum_{s \in S_{2N}} A_s \right)$$
![Page 101: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/101.jpg)
Basic Approach to Calibration
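$$\frac{1}{2} V^2 f \begin{bmatrix} A_{0,0} & A_{0,1} & \cdots & A_{0,2N} \\ A_{1,0} & A_{1,1} & \cdots & A_{1,2N} \\ \vdots & \vdots & \ddots & \vdots \\ A_{2N,0} & A_{2N,1} & \cdots & A_{2N,2N} \end{bmatrix} \begin{bmatrix} C_0 \\ C_1 \\ \vdots \\ C_{2N} \end{bmatrix} = \begin{bmatrix} P_0 \\ P_1 \\ \vdots \\ P_{2N} \end{bmatrix}$$

• For the j-th design/data set combination:
  • let P_j denote the measured power
  • let A_{j,k} denote the aggregate activity of all signals of length k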
• For each design/data set combination, the simulator provides the values for one row of the above matrix
• Given 2N + 1 measured values for Pj, the unknown capacitance values are then determined. This is how the simulator is calibrated.
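The calibration step amounts to solving a square linear system for the capacitances. The sketch below shows the idea on a 3 x 3 system rather than the full (2N + 1) x (2N + 1) case; the Gaussian elimination routine, the function names, and all numeric values (activities, capacitances, V, f) are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("activity matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def predicted_power(activity_row, caps, V, f):
    """P_j = (1/2) V^2 f * sum_k A_{j,k} C_k."""
    return 0.5 * V * V * f * sum(a * c for a, c in zip(activity_row, caps))


def calibrate(activities, powers, V, f):
    """Recover the capacitances C_k from measured powers P_j."""
    scale = 0.5 * V * V * f
    return solve_linear(activities, [p / scale for p in powers])


# Hypothetical example: three signal lengths, three design/data set combinations.
V, f = 5.0, 1e6
true_caps = [1e-12, 2e-12, 3e-12]
activities = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 2.0, 5.0]]
powers = [predicted_power(row, true_caps, V, f) for row in activities]
print(calibrate(activities, powers, V, f))
```

The system is only solvable when the rows are linearly independent, so the design/data set combinations need sufficiently different mixes of signal lengths; with noisy measurements, more than 2N + 1 combinations and a least-squares fit would be a natural variation.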
![Page 102: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/102.jpg)
Efforts to Calibrate the Simulator
• For the Xilinx 4036 family of parts, N = 36
• Generated a total of 73 (= 2N + 1) design/data set combinations
• Created a utility for generating data sets with specified statistics
• Created a utility for computing statistics associated with a given data set
• Attempts at Measuring Consumed Power
  • Heat
  • Current
![Page 103: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/103.jpg)
Heat Measurement Approach
![Page 104: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/104.jpg)
Heat Measurement Approach (continued)
![Page 105: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/105.jpg)
Current Measurement Approach
![Page 106: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/106.jpg)
Highlights from Year 2
• Efforts to Calibrate the FPGA Power Prediction Simulator
• Comparison of Integer and Floating Point Computations on FPGAs
• Architecture of Prototype System for SAR and STAP Processing
• Integration of Reconfigurable Computing into SAR
• Configuration Technique for STAP
![Page 107: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/107.jpg)
Comparison of Integer and Floating Point Computations on FPGAs
(Highlights from Year 2)
• Integer Pipelined Multiplier
• Floating Point Pipelined Multiplier
• Floating Point Pipelined Adder
• Comparison of Two Inner-Product Designs
• Conclusions
![Page 108: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/108.jpg)
Array-Based Integer Multiplier
[Figure: array-based integer multiplier. Partial products b0A through b11A feed a chain of carry-save adders (CSA 0 through CSA 9); the sum and carry outputs of the last CSA feed a propagate adder.]
![Page 109: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/109.jpg)
Carry-Save Adders in a 5-bit Multiplier
[Figure: 5-bit array multiplier. Partial-product bits b0a0…b0a4 through b4a0…b4a4 feed three carry-save adder rows (CSA 0, CSA 1, CSA 2) built from half adders and full adders, followed by a propagate adder.]
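The carry-save structure in the figures can be sketched at the word level as follows. This is an illustrative model only, assuming unsigned operands; the function names are hypothetical and the code does not reproduce the bit-level adder placement of the actual FPGA design.

```python
def carry_save_add(x, y, z):
    """3:2 compression: return (sum, carry) such that x + y + z == sum + carry."""
    partial_sum = x ^ y ^ z
    carry = ((x & y) | (x & z) | (y & z)) << 1
    return partial_sum, carry


def array_multiply(a, b, width):
    """Multiply two unsigned width-bit integers using a chain of carry-save adders."""
    # Partial product i is A shifted left by i when bit i of B is set.
    partials = [(a << i) if (b >> i) & 1 else 0 for i in range(width)]
    if width < 3:
        return sum(partials)
    # The first CSA absorbs three partial products; each later CSA absorbs one more.
    total, carry = carry_save_add(partials[0], partials[1], partials[2])
    for partial in partials[3:]:
        total, carry = carry_save_add(total, carry, partial)
    # The propagate adder resolves the final sum/carry pair.
    return total + carry
```

For `width = 5` the chain uses three carry-save stages and for `width = 12` it uses ten, which matches the CSA 0 to CSA 2 and CSA 0 to CSA 9 labels in the figures.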
![Page 110: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/110.jpg)
[Figure: final stage of the array-based multiplier. The sum and carry outputs of CSA 9 feed the propagate adder (one half adder followed by full adders), which produces the upper 13 bits of the product.]
![Page 111: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/111.jpg)
Pipelining Results for Array-Based Integer Multiplier
• The Wild-One system runs at a maximum speed of 50 MHz
• The 4036xla has more routing resources than the 4028ex
• Table shows maximum achieved clock rate as a function of the number of pipelined stages employed

| # of stages | 4028ex speed (MHz) | 4036xla speed (MHz) |
|---|---|---|
| 1 | 14 | 28 |
| 2 | 19 | 25 |
| 3 | 21 | N/A |
| 4 | 22 | 27 |
| 5 | 29 | 28 |
| 6 | 39 | 28 |
| 7 | 22 | 29 |
| 8 | 33 | 50 |
![Page 112: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/112.jpg)
Comparison of Integer and Floating Point Computations on FPGAs
(Highlights from Year 2)
• Integer Pipelined Multiplier
• Floating Point Pipelined Multiplier
• Floating Point Pipelined Adder
• Comparison of Two Inner-Product Designs
• Conclusions
![Page 113: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/113.jpg)
16-bit Floating-Point Format
• The floating point format chosen is a 16-bit format supported by the ADSP-2106x family of SHARC DSP processors
• The exponent is represented in excess-7 notation
• Range: ±1.5625×10^-2 to ±2.559375×10^2
Short Word Floating-Point Format: bit 15 = sign (s); bits 14–11 = exponent (e3 … e0); bits 10–0 = fraction (f10 … f0), with an implied leading "1." ahead of the fraction
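As a sanity check on the stated range, the format can be decoded in a few lines. This is a minimal sketch, assuming exponent code 0 denotes zero and code 15 denotes infinity (consistent with the special cases shown for the multiplier in this deck); the helper names are hypothetical.

```python
BIAS = 7  # excess-7 exponent


def unpack(word):
    """Split a 16-bit word into (sign, exponent, fraction) fields."""
    sign = (word >> 15) & 0x1
    exponent = (word >> 11) & 0xF
    fraction = word & 0x7FF
    return sign, exponent, fraction


def decode(word):
    """Return the real value of a 16-bit short-word float."""
    sign, exponent, fraction = unpack(word)
    if exponent == 0:          # assumed encoding of zero
        return 0.0
    if exponent == 15:         # assumed encoding of infinity
        return float("-inf") if sign else float("inf")
    mantissa = 1.0 + fraction / 2048.0   # implied leading 1 plus 11 fraction bits
    value = mantissa * 2.0 ** (exponent - BIAS)
    return -value if sign else value
```

The smallest normalized magnitude is `decode(0x0800)` (exponent 1, fraction 0) and the largest is `decode(0x77FF)` (exponent 14, all fraction bits set), which correspond to the two ends of the range quoted on the slide.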
![Page 114: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/114.jpg)
Floating Point Multiplier
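[Figure: floating point multiplier datapath. Recoverable annotations:]
– Mantissa path: 1.m1 and 1.m2 feed the 12 bit Array-Based Multiplier, which outputs the upper 13 bits of the product. If the msb = 1, take the bits msb-1…msb-11. If the msb = 0, take the bits msb-2…msb-11.
– Exponent path: e1 and e2 (bits e1(3)…e1(0) and e2(3)…e2(0)) feed an excess-7 adder with exponent adjust/select logic and underflow (unf) and overflow (ovf) outputs.
– If underflow = 1, set exponent = 0. If overflow = 1, set exponent = 15 (representing infinity).
– If e1 or e2 = 0, set exponent = 0. If e1 or e2 = 15, set exponent = 15.
– Sign path: s1 and s2 determine the sign of the result.
– Output word: sign (1 bit), exponent (4 bits), mantissa (11 bits).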
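The multiplier datapath can be modeled behaviorally as below. This is a sketch under stated assumptions, not the FPGA implementation: results are truncated rather than rounded, a zero operand takes precedence over an infinite one, and the function name `fp16_mul` is hypothetical.

```python
BIAS = 7  # excess-7 exponent


def fp16_mul(x, y):
    """Multiply two 16-bit words (1 sign, 4 exponent, 11 fraction bits)."""
    s1, e1, f1 = (x >> 15) & 1, (x >> 11) & 0xF, x & 0x7FF
    s2, e2, f2 = (y >> 15) & 1, (y >> 11) & 0xF, y & 0x7FF
    sign = s1 ^ s2
    if e1 == 0 or e2 == 0:            # either operand is zero
        return sign << 15
    if e1 == 15 or e2 == 15:          # either operand is infinity
        return (sign << 15) | (15 << 11)
    # 12-bit mantissas with the implied leading one restored.
    product = (0x800 | f1) * (0x800 | f2)     # 24-bit product
    exponent = e1 + e2 - BIAS                 # excess-7 addition
    if product >> 23:                         # msb set: product lies in [2, 4)
        fraction = (product >> 12) & 0x7FF
        exponent += 1
    else:                                     # product lies in [1, 2)
        fraction = (product >> 11) & 0x7FF
    if exponent <= 0:                         # underflow: flush to zero
        return sign << 15
    if exponent >= 15:                        # overflow: infinity
        return (sign << 15) | (15 << 11)
    return (sign << 15) | (exponent << 11) | fraction
```

For example, `0x3C00` encodes 1.5 and `0x4000` encodes 2.0 in this format, so their product should be the encoding of 3.0.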
![Page 115: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/115.jpg)
Comparison of Integer and Floating Point Computations on FPGAs
(Highlights from Year 2)
• Integer Pipelined Multiplier
• Floating Point Pipelined Multiplier
• Floating Point Pipelined Adder
• Comparison of Two Inner-Product Designs
• Conclusions
![Page 116: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/116.jpg)
Floating Point Adder
[Figure: pipelined floating point adder. Inputs: e1, e2, s1, s2, 1.m1, 1.m2. Processing blocks, separated by registers:]
– Check for Absolute Zero and Infinity and Add Phantom Bit
– Compare Exponents by Subtraction (outputs: difference, pos./neg.)
– Choose Exponent
– Align Mantissas
– Add/Subtract Mantissas
– Normalize Mantissa and Adjust Exponent
Outputs: exponent, mantissa, sign
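The adder blocks named on this slide can be sketched behaviorally for the 16-bit format. The code is an illustration under assumptions that the slides do not state: exponent code 0 is zero, code 15 is infinity, aligned bits are truncated rather than rounded, and an infinite operand simply propagates. The function names are hypothetical.

```python
def pack(sign, exponent, fraction):
    """Assemble a 16-bit word from its fields."""
    return (sign << 15) | (exponent << 11) | fraction


def fp16_add(x, y):
    """Add two 16-bit words (1 sign, 4 exponent, 11 fraction bits)."""
    s1, e1, f1 = (x >> 15) & 1, (x >> 11) & 0xF, x & 0x7FF
    s2, e2, f2 = (y >> 15) & 1, (y >> 11) & 0xF, y & 0x7FF
    # Check for absolute zero and infinity, then add the phantom bit.
    if e1 == 15:
        return pack(s1, 15, 0)
    if e2 == 15:
        return pack(s2, 15, 0)
    if e1 == 0:
        return y if e2 != 0 else 0
    if e2 == 0:
        return x
    m1, m2 = 0x800 | f1, 0x800 | f2
    # Compare exponents by subtraction, choose the larger, align the other mantissa.
    difference = e1 - e2
    if difference >= 0:
        exponent, m2 = e1, m2 >> difference
    else:
        exponent, m1 = e2, m1 >> -difference
    # Add or subtract the mantissas according to the operand signs.
    total = (-m1 if s1 else m1) + (-m2 if s2 else m2)
    if total == 0:
        return 0
    sign, magnitude = (1, -total) if total < 0 else (0, total)
    # Normalize the mantissa and adjust the exponent.
    while magnitude >= 0x1000:
        magnitude >>= 1
        exponent += 1
    while magnitude < 0x800:
        magnitude <<= 1
        exponent -= 1
    if exponent >= 15:
        return pack(sign, 15, 0)
    if exponent <= 0:
        return 0
    return pack(sign, exponent, magnitude & 0x7FF)
```

In this encoding `0x3C00` is 1.5, `0x4000` is 2.0, and `0xBC00` is -1.5, so the sums 3.5 and 0.5 should come out as `0x4600` and `0x3000`.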
![Page 117: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/117.jpg)
Comparison of Integer and Floating Point Computations on FPGAs
(Highlights from Year 2)
• Integer Pipelined Multiplier
• Floating Point Pipelined Multiplier
• Floating Point Pipelined Adder
• Comparison of Two Inner-Product Designs
• Conclusions
![Page 118: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/118.jpg)
Inner Product Co-processor Designs
[Figure: block diagrams of the two inner-product co-processor designs.]
– Multiply-Add Scheme: Input Buffer, two Pipeline Multipliers, Pipeline Adder, Output Buffer
– Multiply-Accumulate Scheme: Input Buffer, Pipeline Multiplier, Pipeline Adder, Output Buffer
![Page 119: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/119.jpg)
Performance

| Co-Processor Type | Speed (MHz) | # of CLBs | # of Flip-Flops | # of 3-Input LUTs | # of 4-Input LUTs | Equivalent Gate Count | Estimated Power Consumption |
|---|---|---|---|---|---|---|---|
| Integer Multiply-Accumulate | 50 | 622 | 720 | 180 | 794 | 10076 | N/A |
| Integer Multiply-Add | 43 | 1013 | 1148 | 423 | 1421 | 16809 | 415 |
| F.P. Multiply-Accumulate | 38 | 437 | 414 | 154 | 742 | 8072 | 454 |
| F.P. Multiply-Add | 34 | 716 | 654 | 254 | 1082 | 11766 | 390 |

Notes:
1. Integer co-processors implemented with 16-bit integer multipliers and 32-bit integer adders
2. The estimated power consumption is calculated from the power simulator based on simplified (non-calibrated) constants:

$$\text{Estimated Power} = 1 \sum_{s \in S_0} A_s + 2 \sum_{s \in S_1} A_s + \cdots + 2N \sum_{s \in S_{2N-1}} A_s + (2N+1) \sum_{s \in S_{2N}} A_s$$
![Page 120: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/120.jpg)
F.P. Multiply-Add vs F.P. Multiply-Accumulate Non-Weighted Activity Values
[Figure: plot of Activity Value (axis 0 to 3.5) versus Interconnection Length (axis 0 to 46) for the Multiply-Add and Multiply-Accumulate designs.]
![Page 121: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/121.jpg)
F.P. Multiply-Add vs F.P. Multiply-Accumulate Linearly-Weighted Activity Values
[Figure: plot of Weighted Activity (axis 0 to 70) versus Interconnection Length (axis 0 to 46) for the Multiply-Add and Multiply-Accumulate designs.]
![Page 122: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/122.jpg)
Comparison of Integer and Floating Point Computations on FPGAs
(Highlights from Year 2)
• Integer Pipelined Multiplier
• Floating Point Pipelined Multiplier
• Floating Point Pipelined Adder
• Comparison of Two Inner-Product Designs
• Conclusions
![Page 123: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/123.jpg)
Conclusions
• Developed libraries of efficient integer and floating point pipelined multipliers and adders
• Discovered that increasing the degree of pipelining increases required hardware
• Discovered that increasing the degree of pipelining generally increases maximum clock rate
• 16-bit F.P. inner-product designs require less hardware than the integer inner-product designs, which employ 16-bit multiplier(s) and a 32-bit adder
• Multiply-accumulate designs consume more power (estimated) than multiply-add designs due to the requirement for long feedback paths
• Developed a 50-page User’s Manual for the Annapolis System
![Page 124: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/124.jpg)
Highlights from Year 2
• Efforts to Calibrate the FPGA Power Prediction Simulator
• Comparison of Integer and Floating Point Computations on FPGAs
• Architecture of Prototype System for SAR and STAP Processing
• Integration of Reconfigurable Computing into SAR
• Configuration Technique for STAP
![Page 125: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/125.jpg)
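Architecture of Prototype System
[Figure: block diagram of the prototype system. A Data Source feeds a Reconfigurable Subsystem (Annapolis System with PEs, hosted in a PC over PCI), a DSP/GPP Subsystem (Mercury System with CNs, hosted by a SPARC over VME), and a second Reconfigurable Subsystem (Annapolis System with PEs, hosted in a PC over PCI), ending at a Data Sink. The links between blocks are labeled Custom and 120 MB/sec.]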
![Page 126: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/126.jpg)
SAR Processing Flow
[Figure: SAR processing flow. Stages: Range Compression and Azimuth Processing, with a Data Transfer step between them. Data axes: Range and Azimuth.]
![Page 127: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/127.jpg)
STAP Processing Flow
[Figure: STAP processing flow. Stages: Range Compression, Doppler Filtering, and Weight Computation, with Data Transfer steps between stages. Data cube axes: Range, Doppler, and Channel.]
![Page 128: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/128.jpg)
Refer to Poster for Physical View of Architecture
![Page 129: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/129.jpg)
Highlights from Year 2
• Efforts to Calibrate the FPGA Power Prediction Simulator
• Comparison of Integer and Floating Point Computations on FPGAs
• Architecture of Prototype System for SAR and STAP Processing
• Integration of Reconfigurable Computing into SAR
• Configuration Technique for STAP
![Page 130: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/130.jpg)
Integration of Reconfigurable Computing into SAR
(Highlights from Year 2)
• The SAR Benchmark
• Comparison of Two FIR Filter Designs
• Including FPGAs in the SAR Optimization Formulation
![Page 131: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/131.jpg)
The SAR Benchmark
• Retrieved Benchmark from http://www.rl.af.mil/programs/hpcbench/
• Developed under the ARPA/Tri-Service Rapid Prototyping of Application Specific Signal Processors (RASSP) program
• Two main programs
– Synthetic SAR data generator (400 lines of code)
– Serial SAR processor (1600 lines of code)
• The SAR algorithm is stripmap mode; it currently processes 4 frames of HH polarization data
![Page 132: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/132.jpg)
• The SAR Benchmark
• Comparison of Two FIR Filter Designs
• Including FPGAs in the SAR Optimization Formulation
Integration of Reconfigurable Computing into SAR
(Highlights from Year 2)
![Page 133: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/133.jpg)
Comparison of Two FIR Filter Designs
[Figure: two four-tap FIR filter structures with coefficients k0 to k3, each built from registers (D Q), multipliers, and adders on n-bit data.]
Serial-Multiply/Parallel Add
• Ease of routing
• Poor modularity
Parallel-Multiply/Serial Add
• Poor routing
• Good modularity
![Page 134: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/134.jpg)
Comparison of Two FIR Filter Designs
• Both designs implemented using fixed-point complex data (16-bit fixed-point real and imaginary components)
• Both designs make use of constant coefficient multipliers (from core generator)
• Four tap serial-multiply/parallel-add filter fit onto one 4036xla part
• Three tap parallel-multiply/serial-add filter fit onto one 4036xla part (insufficient routing resources for four taps)
• Four tap parallel-multiply/serial-add filter implemented across two parts on one board (one 4036 and one 4013)
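The two filter structures can be compared with a short behavioral model. The sketch below reads serial-multiply/parallel-add as a tapped delay line on the input with the products summed together, and parallel-multiply/serial-add as an input broadcast to all multipliers followed by a registered chain of adders; that reading of the figure, the function names, and the use of plain integers instead of 16-bit complex fixed-point data are all assumptions made for illustration.

```python
def fir_serial_multiply_parallel_add(samples, coeffs):
    """Input shifts through a register chain; all products are summed each cycle."""
    delay_line = [0] * len(coeffs)
    outputs = []
    for sample in samples:
        delay_line = [sample] + delay_line[:-1]
        outputs.append(sum(k * d for k, d in zip(coeffs, delay_line)))
    return outputs


def fir_parallel_multiply_serial_add(samples, coeffs):
    """Input is broadcast to every multiplier; partial sums ripple through registers."""
    taps = len(coeffs)
    registers = [0] * taps      # registers[i] feeds the adder of tap i
    outputs = []
    for sample in samples:
        products = [k * sample for k in coeffs]
        outputs.append(products[0] + registers[0])
        # Each tap adds its product to the partial sum held by the next tap.
        registers = [products[i + 1] + registers[i + 1] for i in range(taps - 1)] + [0]
    return outputs
```

Both functions compute the same convolution, so they differ only in structure: the second repeats an identical multiplier/adder/register cell per tap (modular) but needs the input routed to every cell, which is consistent with the routing limits reported above.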
![Page 135: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/135.jpg)
• The SAR Benchmark
• Comparison of Two FIR Filter Designs
• Including FPGAs in the SAR Optimization Formulation
Integration of Reconfigurable Computing into SAR
(Highlights from Year 2)
![Page 136: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/136.jpg)
Including FPGAs in the SAR Optimization Formulation
• Power estimates must be determined for a range of kernel sizes for both filter designs
• Hybrid designs may exist for multi-chip implementations that yield desired features of both modularity and routability
• Binary optimization variable defines whether entry-FPGA or DSP/GPP subsystems perform range compression
• Real optimization variable defines fraction of azimuth processing divided among GPP/DSP and exit-FPGA subsystems
![Page 137: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/137.jpg)
Highlights from Year 2
• Efforts to Calibrate the FPGA Power Prediction Simulator
• Comparison of Integer and Floating Point Computations on FPGAs
• Architecture of Prototype System for SAR and STAP Processing
• Integration of Reconfigurable Computing into SAR
• Configuration Technique for STAP
![Page 138: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/138.jpg)
Configuration Technique for STAP
• Incorporate New Features into the Network Simulator
• Testing and Calibration of the Network Simulator
• Build and Execute RT_STAP Benchmark on Mercury RACE® Computer
• Optimization Problem
• Computational Investigation
![Page 139: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/139.jpg)
NEW FEATURES FOR THE NETWORK SIMULATOR
• Incorporate Software Overhead Times in the Simulation Model
– Currently, the simulator performs hardware switch-level modeling (i.e., packet level simulation at the crossbar level).
– Modify the Network Simulator to include software overhead times for two communication protocols.
– Empirical analysis will be utilized to capture software overhead times for the communication protocols.
• Provide Additional Timing Information from Simulation Runs
– Currently, the simulator outputs completion times after each corner turn of the STAP data cube.
– Modify the Network Simulator to output message queue completion times for each Compute Node (CN) sending messages.
– Message queue completion times will become vital input into the optimization algorithm.
• Add PowerPC Compute Node Configuration to the Simulator
![Page 140: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/140.jpg)
INCORPORATE SOFTWARE OVERHEAD TIMES
• Communication Time for a Message:

$$T_C = T_{O(Software)} + T_{O(Hardware)} + \frac{M}{B}$$

where:
T_C = Completion Time
T_O(Software) = Software Overhead Time
T_O(Hardware) = Hardware Overhead Time
M = Message Size
B = Network Bandwidth

Slide annotations: "Modeled by Simulator" (the hardware overhead and M/B terms) and "Include Software Overhead Time in the Simulation Model" (the software overhead term).
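The communication-time model is simple enough to evaluate directly. A minimal sketch follows; the function name, the units (bytes and seconds), and the example overhead values are hypothetical, with only the 120 MB/sec link rate taken from the architecture slide.

```python
def communication_time(message_bytes, bandwidth_bytes_per_s,
                       software_overhead_s, hardware_overhead_s):
    """T_C = T_O(Software) + T_O(Hardware) + M / B."""
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    transfer_time = message_bytes / bandwidth_bytes_per_s
    return software_overhead_s + hardware_overhead_s + transfer_time


# Hypothetical example: a 1.2 MB message over a 120 MB/sec link.
EXAMPLE_TIME = communication_time(1_200_000, 120e6,
                                  software_overhead_s=50e-6,
                                  hardware_overhead_s=5e-6)
```

For a zero-length message the transfer term vanishes, so the result is just the sum of the two overheads.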
![Page 141: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/141.jpg)
SOFTWARE PROTOCOLS
• Two Communication Protocol Times will be added to the Simulation Model
– DMA MC/OS Communication Times (DMA Transfers between CNs)
– MPI (Message Passing Interface) Software Layer Communication Times
• Incorporating Software Overhead Times into the Simulation Model will be accomplished through Empirical Analysis.
– For each of the two software protocols, zero length messages will be sent through the network. Their resulting communication times will be measured.
– After analysis of multiple runs, the simulator will be calibrated to include both DMA transfer overhead and MPI software overhead.
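One way to turn zero-length message timings into per-protocol overhead constants is to average the measured times and subtract the hardware overhead the simulator already models. The sketch below assumes that averaging approach, which the slides do not specify; the function name and all timing values are hypothetical placeholders, not measurements.

```python
from statistics import mean


def estimate_software_overheads(zero_length_times_s, hardware_overhead_s):
    """Map each protocol to mean(zero-length message time) minus hardware overhead."""
    overheads = {}
    for protocol, times in zero_length_times_s.items():
        if not times:
            raise ValueError(f"no measurements for protocol {protocol!r}")
        # With M = 0 the M / B term vanishes, leaving only the two overheads.
        overheads[protocol] = mean(times) - hardware_overhead_s
    return overheads


# Hypothetical timings in seconds for the two protocols named on the slide.
TIMINGS = {
    "DMA": [12e-6, 14e-6, 13e-6],
    "MPI": [40e-6, 44e-6, 42e-6],
}
OVERHEADS = estimate_software_overheads(TIMINGS, hardware_overhead_s=3e-6)
```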
![Page 142: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/142.jpg)
SOFTWARE COMPONENTS
[Figure: layered diagram of the software components on a compute node. Labeled blocks: User Application; MPI Software Layer; MC/OS Runtime Environment; Interprocessor Communication System (ICS); POSIX API; MCexec; Loadable Device Drivers; 'DX' Data Transfer Facility; HARDWARE ABSTRACTION LAYER; DMA Controller; CN ASIC Registers, Interrupts, Timers, etc.; CPU Registers.]
![Page 143: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/143.jpg)
PROPOSED WORK
• Incorporate New Features into the Network Simulator
• Testing and Calibration of the Network Simulator
• Build and Execute RT_STAP Benchmark on Mercury RACE® Computer
• Optimization Problem
• Computational Investigation
![Page 144: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/144.jpg)
TESTING AND CALIBRATION OF THE NETWORK SIMULATOR
• Test Specific Communication Patterns to Verify Accuracy of the Network Simulator
– Implement a Communication Task on the Mercury RACE® Computer
– Replicate the Communication Task on the Network Simulator
– Compare the Resultant Completion Times
– If Necessary, Fine-Tune the Network Simulator
• Two Types of Communication Patterns will be used to Test and Calibrate the Network Simulator
– Simple Test Patterns (Hand-Calculated Verification)
– Complex Test Patterns (Empirical Verification)
![Page 145: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/145.jpg)
TESTING AND CALIBRATION WITH SPECIFIC TEST PATTERNS
• Simple Test Patterns (Hand-Calculated Verification)
– Implement simple test patterns between CNs to verify the accuracy and assist in fine-tuning of the Network Simulator. The test pattern communication time can be hand-calculated for comparison to the simulated result.
  • Single Source Message Tests
  • Two Source Message Tests (Non-Contending Paths)
  • Two Source Message Tests (Contending Paths)
  • N Source Message Tests (Non-Contending Paths)
  • N Source Message Tests (Contending Paths)
• Complex Test Patterns (Empirical Verification)
– Implement more complex basic communication patterns to test the validity of the simulator. Empirical data from the Mercury Computer implementing the same test pattern will be used to calibrate the Network Simulator.
  • All-to-All Personalized Communication Test
  • Randomized Message Queue Communication Test
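The two complex test patterns can be generated programmatically. The sketch below uses a common rotation schedule for all-to-all personalized communication and a seeded random generator for the message queues; both choices, and the function names, are assumptions for illustration rather than the schedules used by the project.

```python
import random


def all_to_all_schedule(num_nodes):
    """Return steps of (source, destination) pairs; in step k node i sends to i + k."""
    return [
        [(src, (src + step) % num_nodes) for src in range(num_nodes)]
        for step in range(1, num_nodes)
    ]


def randomized_message_queues(num_nodes, messages_per_node, seed=0):
    """Return {source: [destinations]} with random destinations other than the source."""
    if num_nodes < 2:
        raise ValueError("need at least two compute nodes")
    rng = random.Random(seed)   # seeded so a test run can be repeated exactly
    queues = {}
    for src in range(num_nodes):
        others = [node for node in range(num_nodes) if node != src]
        queues[src] = [rng.choice(others) for _ in range(messages_per_node)]
    return queues
```

In the rotation schedule every step is a permutation, so each CN sends one message and receives one message per step, and every ordered pair of distinct CNs appears exactly once over the whole schedule.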
![Page 146: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/146.jpg)
SIMPLE TEST PATTERNS
Single Source Message Tests
• Test Plan Development Diagram
[Figure: test plan flow from START to RUN TEST, choosing one option at each level:]
– Messages: Single Message, Two Messages, or 3..N Messages
– Packets: Single Packet / Message, Two Packets / Message, or 3..P Packets / Message
– Crossbars: Single Crossbar or 3..C Crossbars
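The test plan diagram describes a small cross product of options, which can be enumerated mechanically. A minimal sketch, with the option labels copied from the diagram and the function name chosen for illustration:

```python
from itertools import product

MESSAGE_OPTIONS = ["Single Message", "Two Messages", "3..N Messages"]
PACKET_OPTIONS = ["Single Packet / Message", "Two Packets / Message",
                  "3..P Packets / Message"]
CROSSBAR_OPTIONS = ["Single Crossbar", "3..C Crossbars"]


def enumerate_test_cases():
    """Return every (messages, packets, crossbars) combination in the test plan."""
    return list(product(MESSAGE_OPTIONS, PACKET_OPTIONS, CROSSBAR_OPTIONS))


TEST_CASES = enumerate_test_cases()
```

Each tuple corresponds to one path from START to RUN TEST in the diagram.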
![Page 147: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/147.jpg)
SIMPLE TEST PATTERNS
Two Source Message Tests (*Non-Contending Paths)
• Test Plan Development Diagram (For Each Source)
[Figure: test plan flow from START to RUN TEST, choosing one option at each level:]
– Messages: Single Message / CN, Two Messages / CN, or 3..N Messages / CN
– Packets: Single Packet / Message, Two Packets / Message, or 3..P Packets / Message
– Crossbars: Single Crossbar (Non-Contending) or 3..C Crossbars (Non-Contending)
![Page 148: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/148.jpg)
SIMPLE TEST PATTERNS
Two Source Message Tests (*Contending Paths)
• Test Plan Development Diagram (For Each Source)
[Figure: test plan flow from START to RUN TEST, choosing one option at each level:]
– Messages: Single Message / CN, Two Messages / CN, or 3..N Messages / CN
– Packets: Single Packet / Message, Two Packets / Message, or 3..P Packets / Message
– Crossbars: Single Crossbar (Contending) or 3..C Crossbars (Contending)
![Page 149: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/149.jpg)
Configuration Technique for STAP
• Incorporate New Features into the Network Simulator
• Testing and Calibration of the Network Simulator
• Build and Execute RT_STAP Benchmark on Mercury RACE® Computer
• Optimization Problem
• Computational Investigation
![Page 150: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/150.jpg)
MERCURY RACE® COMPUTER CONFIGURATION
[Figure: Mercury RACE® computer configuration. A multi-level network of crossbars connects the compute nodes (CNs), a VME Port, and I/O. Legend of CN types:]
– PPC 603e, 16Mb, 100MHz
– 3 SHARC DSPs, 8Mb, 40MHz
– 3 SHARC DSPs, 16Mb, 40MHz
![Page 151: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/151.jpg)
STAP IMPLEMENTATION ON MERCURY RACE® COMPUTER
• Implementation of STAP on the Mercury RACE® Computer involves the following tasks:
– Build the RT_STAP¹ benchmark designed and developed by MITRE (requires MPI software).
– Successfully install and build MPI Software Technology, Inc.’s message passing interface software (MPI/PRO™) for the Mercury Computer (used by the RT_STAP benchmark).
– Build both the sequential host and parallel Mercury Computer versions of the benchmark.
• After successfully building and executing the RT_STAP benchmark on the 8 node PowerPC Mercury RACE® computer, perform the following tasks:
– Analyze the RT_STAP benchmark source code to determine the partitioning of the data (i.e., the mapping) and the scheduling of the messages. Replicate the data partitioning and message scheduling on the Network Simulator.
– Verify the reported communication times from the RT_STAP benchmark with the Network Simulator.
– Modify the RT_STAP source code to allow for specification of mapping and ordering of the data distribution. Verify results with the Network Simulator.
¹ Cain, K.C., Torres, J.A., and Williams, R.T. RT_STAP: Real-Time Space-Time Adaptive Processing Benchmark. MITRE Technical Report MTR 96B0000021, February 1997.
![Page 152: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/152.jpg)
MPI/PRO™ BUILD FOR MERCURY RACE® COMPUTER
• MPI/PRO™ for RACE® is a Commercial Off-the-Shelf Standards-Based Message-Passing Middleware.
• Provides robust messaging and implements the Message Passing Interface (MPI) standard defined by the Message Passing Interface Forum.
• MPI/PRO™ supports MPI 1.2 extensions.
• Currently supports RACE® PowerPC and i860 CNs.
• MPI/PRO™ is layered on Mercury’s MC/OS development and runtime environment.
![Page 153: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/153.jpg)
RT_STAP BENCHMARK ON MERCURY RACE® COMPUTER
• The RT_STAP benchmark, developed by MITRE, was designed to evaluate the application of scalable, high performance computers to the real time implementation of STAP techniques.
• The benchmark has the capability to vary the sophistication and computational complexity of the adaptive algorithms employed.
• The goal is to build and execute the MITRE RT_STAP benchmark software on an 8-node PPC 603e Mercury Computer (MC/OS 4.4.2) using MPI Software Technology, Inc.’s MPI/PRO™.
• The RT_STAP benchmark software employs a QR-decomposition algorithm as a component of the space-time adaptive processing. A QRD benchmark is also provided to characterize a single processor’s performance on QR-decompositions.
![Page 154: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/154.jpg)
Configuration Technique for STAP
• Incorporate New Features into the Network Simulator
• Testing and Calibration of the Network Simulator
• Build and Execute RT_STAP Benchmark on Mercury RACE® Computer
• Optimization Problem
• Computational Investigation
![Page 155: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/155.jpg)
OPTIMIZATION PROBLEM
• Overview of the Approach
• Definition of a Class of Mappings for Data Partitioning
• Development of an Objective Function to Evaluate Defined Classes of Mappings
• Implementation of a Genetic Algorithm to Produce Schedules for the Top Mapping Candidates Generated by the Mapping Objective Function
– Use the Simulator to Evaluate the Communication Performance
![Page 156: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/156.jpg)
OVERVIEW OF THE APPROACH
[Figure: flowchart of the optimization approach, with the following blocks]
• STAP Data Cube
• Select # CNs (P) (P = Allocated Compute Nodes)
• Select Fixed or Random Mapping
• Minimize Mapping (Use Objective Function)
• Genetic Algorithm (Determine Optimal Schedule)
• Network Simulator (Estimate Overall Communication Time)
• OPTIMIZE
• Mercury RACE® (Configured with 1..P CNs)
• Adjust Allocated P
![Page 157: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/157.jpg)
DEFINITION OF A CLASS OF MAPPINGS FOR DATA PARTITIONING
• Let the matrix $T_k$ represent the mapping for the kth processing phase, where $T_k$ is a 2-d process set of size $M \times N$:
$T_1 : M_1 \times N_1, \quad T_2 : M_2 \times N_2, \quad T_3 : M_3 \times N_3$
• Equation for the number of CNs:
$P = M_k \cdot N_k$
• Possible values for M and N:
$(M, N) \in \{(i,j) \mid i \cdot j = P\}$
• Total number of combinations:
$\left|\{(i,j) \mid i \cdot j = P\}\right|^{3}$
For example, assume $P = 12$. The mapping matrices $T_1, T_2, T_3$ could each be defined by any one of the following:
$\{(1 \times 12), (2 \times 6), (3 \times 4), (4 \times 3), (6 \times 2), (12 \times 1)\}$
Assuming the CN assignments within a mapping matrix are raster ordered left to right, the total number of combinations is $6^3 = 6 \cdot 36 = 216$.
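The count for P = 12 can be checked with a short enumeration. The following is a minimal Python sketch, not code from the project; the function names `mapping_shapes` and `raster_mapping` are hypothetical, and the raster ordering follows the slide's stated assumption.

```python
from itertools import product

def mapping_shapes(p):
    """Return every (M, N) pair of positive integers with M * N == p."""
    return [(m, p // m) for m in range(1, p + 1) if p % m == 0]

def raster_mapping(m, n):
    """M x N matrix of CN indices assigned in raster order (left to right, row by row)."""
    return [[r * n + c for c in range(n)] for r in range(m)]

P = 12
shapes = mapping_shapes(P)

# One shape is chosen independently for each of the three phases T1, T2, T3.
combinations = list(product(shapes, repeat=3))

print(shapes)
print(len(combinations))
print(raster_mapping(3, 4))
```

Because the raster ordering fixes the CN assignment once a shape is chosen, only the shape varies per phase, which is why the count is the cube of the number of factor pairs.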
![Page 158: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/158.jpg)
OBJECTIVE FUNCTION DEVELOPMENT: Quality of Mapping
• An objective function can be developed based on the definition of a class of mappings for data partitioning.
Using the following definitions:
$T_1 = [T_1^{r,c}]_{M_1 \times N_1}$, such that each $T_1^{r,c}$ represents the CN where the data vector is distributed.
$T_2 = [T_2^{r,c}]_{M_2 \times N_2}$, such that each $T_2^{r,c}$ represents the CN where the data vector is distributed.
$T_3 = [T_3^{r,c}]_{M_3 \times N_3}$, such that each $T_3^{r,c}$ represents the CN where the data vector is distributed.
$\varepsilon = \{(i,j) \mid \text{CN } i \text{ communicates with CN } j\}$
$m_{ij}$ = message from CN i to CN j
$|m_{ij}|$ = message size of $m_{ij}$
$d_{ij}$ = minimum number of required crossbar connections for message $m_{ij}$
$T_1 \rightarrow T_2$: Corner-Turn Produces Messages
Objective: $\min \sum_{(i,j) \in \varepsilon_1} |m_{ij}| \cdot d_{ij}$
$T_2 \rightarrow T_3$: Corner-Turn Produces Messages
Objective: $\min \sum_{(i,j) \in \varepsilon_2} |m_{ij}| \cdot d_{ij}$
![Page 159: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/159.jpg)
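OBJECTIVE FUNCTION DEVELOPMENT: Quality of Mapping
• An objective function for the communication time:
$k_1 \cdot \left[\min \sum_{(i,j) \in \varepsilon_1} |m_{ij}| \cdot d_{ij}\right] + k_2 \cdot \left[\min \sum_{(i,j) \in \varepsilon_2} |m_{ij}| \cdot d_{ij}\right]$
(first term: First Corner Turn; second term: Second Corner Turn)
• An objective function for STAP processing:
$k_1 \cdot \left[\min \sum_{(i,j) \in \varepsilon_1} |m_{ij}| \cdot d_{ij}\right] + k_2 \cdot \left[\min \sum_{(i,j) \in \varepsilon_2} |m_{ij}| \cdot d_{ij}\right] + k_3 \cdot (\text{Range Computation Time}) + k_4 \cdot (\text{Doppler Computation Time}) + k_5 \cdot (\text{Weight Computation Time})$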
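To make the weighted objective concrete, the following Python sketch evaluates it for one candidate mapping; the minimization in the formula would then be a search over candidates. Everything here is an assumption for illustration: the function names, the weights, the message sizes, and especially `toy_distance`, which stands in for the real crossbar-connection count of the RACE® network.

```python
def corner_turn_cost(messages, distance):
    """Sum of (message size) * d_ij over the communicating CN pairs of one corner turn.

    messages: dict mapping (i, j) -> message size
    distance: function (i, j) -> d_ij, the minimum number of crossbar connections
    """
    return sum(size * distance(i, j) for (i, j), size in messages.items())

def stap_objective(turn1, turn2, distance, compute_times, k):
    """k1*turn1 + k2*turn2 + k3*range + k4*doppler + k5*weight for one candidate."""
    k1, k2, k3, k4, k5 = k
    t_range, t_doppler, t_weight = compute_times
    return (k1 * corner_turn_cost(turn1, distance)
            + k2 * corner_turn_cost(turn2, distance)
            + k3 * t_range + k4 * t_doppler + k5 * t_weight)

# Hypothetical distance model: 1 crossbar within a group of 4 CNs, 3 otherwise.
def toy_distance(i, j):
    return 1 if i // 4 == j // 4 else 3

turn1 = {(0, 1): 100, (0, 4): 100}   # first corner turn, sizes in arbitrary units
turn2 = {(1, 2): 50}                 # second corner turn
value = stap_objective(turn1, turn2, toy_distance,
                       compute_times=(10.0, 20.0, 30.0),
                       k=(1, 1, 1, 1, 1))
print(value)
```

Passing the distance model as a function keeps the objective independent of any particular network topology.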
![Page 160: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/160.jpg)
GENETIC ALGORITHMS
• A genetic algorithm (GA) is a population-based model that uses selection and recombination operators to generate new sample points in a search space.
• A GA encodes a potential solution to a specific problem on a chromosome-like data structure and applies recombination operators to these structures so as to preserve critical information.
• Often, GAs are viewed as function optimizers. As a result, researchers are typically interested in GAs as optimization tools.
• Implementation of a GA begins with a population of chromosomes. Once each chromosome is evaluated, reproduction opportunities are applied in such a way that those chromosomes which represent a better solution to the target problem are given more chances to reproduce than chromosomes with poorer solutions.
• Currently, GAs are a promising heuristic approach to locating near-optimal solutions in large search spaces.
![Page 161: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/161.jpg)
GENETIC ALGORITHMS
• A genetic algorithm is typically composed of two main components that are problem dependent:
– The problem encoding
• The first component involves generating an encoding scheme to represent possible solutions to the optimization problem. Candidate solutions are usually represented as strings of fixed length, like chromosomes, usually coded with a binary character set.
– The evaluation function
• An evaluation function measures the quality of a particular solution. In this research, the evaluation of a particular candidate will be accomplished by the Network Simulator. The fitness of the candidate from the population space will be measured based on its simulated performance.
• The objective of a GA search is to locate the chromosome that has the optimal fitness value. For this research, if the chromosome represented the scheduling of messages and the fitness value the completion time of the schedule, the objective of the GA would be to find the smallest value (i.e., shortest completion time).
![Page 162: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/162.jpg)
IMPLEMENTATION OF A GENETIC ALGORITHM HEURISTIC
• Implementation of a GA involves the following steps:¹
– Generate an initial population: This initial population is the first generation, where evolution starts. A random set of chromosomes is often used as the initial population.
– An evaluation using the evaluation or fitness function: Evaluate the quality of each chromosome in the initial population.
– A selection mechanism: In this step, chromosomes are duplicated or eliminated based on their relative quality or fitness. The population size is kept constant.
– A crossover mechanism: Some pairs of chromosomes are selected from the current population, and some of their corresponding components are exchanged to form two valid chromosomes. The new chromosomes may or may not be in the current population.
¹ Wang, L., Siegel, H.J., Roychowdhury, V.P., and Maciejewski, A.A. Task Matching and Scheduling in Heterogeneous Computing Environments Using a Genetic-Algorithm-Based Approach. Journal of Parallel and Distributed Computing, Special Issue on Parallel Evolutionary Computing.
![Page 163: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/163.jpg)
IMPLEMENTATION OF A GENETIC ALGORITHM HEURISTIC
• Implementation of a GA involves the following steps (continued):¹
– A mutation mechanism: After a crossover operation, each string in the population may be mutated with some probability. The mutation process transforms a chromosome into another valid one that may or may not be in the population. The motivation for using mutation is to prevent the algorithm from getting stuck in a local minimum.
– Reevaluation of the population: The new population after selection, crossover, and mutation is reevaluated. The fitness value for each new chromosome is computed.
– A set of stopping criteria: The stopping criteria specify the conditions under which the algorithm terminates. If the stopping criteria have not been met, the new population goes through another cycle of selection, crossover, mutation, and evaluation. This cycle repeats until one of the stopping criteria is met.
¹ Wang, L., Siegel, H.J., Roychowdhury, V.P., and Maciejewski, A.A. Task Matching and Scheduling in Heterogeneous Computing Environments Using a Genetic-Algorithm-Based Approach. Journal of Parallel and Distributed Computing, Special Issue on Parallel Evolutionary Computing.
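The GA steps listed on these two slides can be sketched in Python for a message-scheduling chromosome (a permutation of message indices). This is an illustrative sketch only: `completion_time` is a toy stand-in for the Network Simulator, and the operators chosen here (binary tournament selection, order crossover, swap mutation) and all parameter values are assumptions, not necessarily those used in the project or in the cited paper.

```python
import random

def completion_time(order, messages):
    """Toy fitness (assumed model): a message starts once its source and destination are free."""
    free_at, finish = {}, 0
    for idx in order:
        src, dst, size = messages[idx]
        end = max(free_at.get(src, 0), free_at.get(dst, 0)) + size
        free_at[src] = free_at[dst] = end
        finish = max(finish, end)
    return finish

def order_crossover(a, b, rng):
    """Keep a slice of parent a, fill the rest in parent b's order (always a valid permutation)."""
    lo, hi = sorted(rng.sample(range(len(a) + 1), 2))
    middle = a[lo:hi]
    rest = [g for g in b if g not in middle]
    return rest[:lo] + middle + rest[lo:]

def mutate(chrom, rng, rate):
    """With probability `rate`, swap two positions of the schedule."""
    chrom = chrom[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(chrom)), 2)
        chrom[i], chrom[j] = chrom[j], chrom[i]
    return chrom

def genetic_search(messages, pop_size=30, generations=200, patience=40, rate=0.2, seed=0):
    rng = random.Random(seed)
    n = len(messages)
    fitness = lambda c: completion_time(c, messages)
    # Initial population: random permutations, then evaluation.
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    best, stale = min(population, key=fitness), 0
    for _ in range(generations):                      # stopping criterion: generation limit
        # Selection: binary tournaments keep the population size constant.
        parents = [min(rng.sample(population, 2), key=fitness) for _ in range(pop_size)]
        # Crossover followed by mutation.
        population = [mutate(order_crossover(parents[i], parents[(i + 1) % pop_size], rng),
                             rng, rate) for i in range(pop_size)]
        # Reevaluation of the new population.
        candidate = min(population, key=fitness)
        if fitness(candidate) < fitness(best):
            best, stale = candidate, 0
        else:
            stale += 1
        if stale >= patience:                         # stopping criterion: no improvement
            break
    return best, fitness(best)

# (source CN, destination CN, message size) in arbitrary units
messages = [(0, 1, 4), (0, 2, 3), (1, 2, 2), (3, 4, 5), (2, 3, 1), (1, 4, 2)]
best, t = genetic_search(messages)
print(best, t)
```

In the approach described on the slides, the call to `completion_time` would be replaced by a run of the Network Simulator, which is far more expensive per evaluation and therefore makes the population size and stopping criteria important tuning parameters.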
![Page 164: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/164.jpg)
Configuration Technique for STAP
• Incorporate New Features into the Network Simulator
• Testing and Calibration of the Network Simulator
• Build and Execute RT_STAP Benchmark on Mercury RACE® Computer
• Optimization Problem
• Computational Investigation
![Page 165: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/165.jpg)
COMPUTATIONAL INVESTIGATION
• A QR-D computation is deterministic (i.e., its complexity can be calculated).
• A Conjugate Gradient (CG) computation is not deterministic. Its complexity depends on the initial condition and desired tolerance.
– This work proposes investigating the impact of “intelligent” initial condition values on a CG algorithm.
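The dependence of CG cost on the starting vector can be illustrated with a small Python sketch. It is a textbook conjugate gradient for a real symmetric positive-definite system, which is a simplification of the complex Hermitian systems that arise in STAP; the matrix, right-hand side, and tolerance are made-up values.

```python
def conjugate_gradient(a, b, x0, tol=1e-10, max_iter=1000):
    """Solve a x = b for a symmetric positive-definite matrix a (list of rows).

    Returns (x, iterations); the iteration count depends on x0 and tol.
    """
    def matvec(m, v):
        return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    x = list(x0)
    r = [bi - ai for bi, ai in zip(b, matvec(a, x))]   # initial residual b - a x0
    p = list(r)                                        # first search direction
    rs = dot(r, r)
    iterations = 0
    while rs ** 0.5 > tol and iterations < max_iter:
        ap = matvec(a, p)
        alpha = rs / dot(p, ap)                        # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
        iterations += 1
    return x, iterations

a = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]

cold, cold_iters = conjugate_gradient(a, b, [0.0, 0.0, 0.0])  # uninformed start
warm, warm_iters = conjugate_gradient(a, b, cold)             # start at a known solution
print(cold_iters, warm_iters)
```

Starting from the already-converged solution is the extreme case; a starting vector that is merely close to the solution begins with a smaller residual and would be expected to need fewer iterations than a zero start for the same tolerance.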
![Page 166: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/166.jpg)
CONJUGATE GRADIENT APPROACH: Investigation of Initial Condition Values
[Figure: range bins A, B, C, D]
Solve the following equations:
$\psi_1(A,B,C)\, w_1 = s, \qquad \psi_2(B,C,D)\, w_2 = s$
Where:
$\psi_1(A,B,C) = x_1 \cdot x_1^H, \qquad x_1 = \begin{bmatrix} A \\ B \\ C \end{bmatrix}$
$\psi_2(B,C,D) = x_2 \cdot x_2^H, \qquad x_2 = \begin{bmatrix} B \\ C \\ D \end{bmatrix}$
$w_1$ = weight vector
$s$ = steering vector
![Page 167: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/167.jpg)
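CONJUGATE GRADIENT APPROACH: Investigation of Initial Condition Values
• Expanding $\psi_1(A,B,C)$ and $\psi_2(B,C,D)$ yields the following:
$\psi_1(A,B,C) = x_1 \cdot x_1^H = \begin{bmatrix} A \\ B \\ C \end{bmatrix} \begin{bmatrix} A^H & B^H & C^H \end{bmatrix} = \begin{bmatrix} AA^H & AB^H & AC^H \\ BA^H & BB^H & BC^H \\ CA^H & CB^H & CC^H \end{bmatrix}$
$\psi_2(B,C,D) = x_2 \cdot x_2^H = \begin{bmatrix} B \\ C \\ D \end{bmatrix} \begin{bmatrix} B^H & C^H & D^H \end{bmatrix} = \begin{bmatrix} BB^H & BC^H & BD^H \\ CB^H & CC^H & CD^H \\ DB^H & DC^H & DD^H \end{bmatrix}$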
![Page 168: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/168.jpg)
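CONJUGATE GRADIENT APPROACH: Investigation of Initial Condition Values
• Attempting to solve the following equation for $w_1$:
$\psi_1(A,B,C)\, w_1 = s, \qquad \psi_1(A,B,C) \begin{bmatrix} w_{1,1} \\ w_{1,2} \\ w_{1,3} \end{bmatrix} = \begin{bmatrix} s_1 \\ s_2 \\ s_3 \end{bmatrix}$
Set of Linear Equations:
$AA^H w_{1,1} + AB^H w_{1,2} + AC^H w_{1,3} = s_1$
$BA^H w_{1,1} + BB^H w_{1,2} + BC^H w_{1,3} = s_2$
$CA^H w_{1,1} + CB^H w_{1,2} + CC^H w_{1,3} = s_3$
• Attempting to solve the following equation for $w_2$:
$\psi_2(B,C,D)\, w_2 = s, \qquad \psi_2(B,C,D) \begin{bmatrix} w_{2,1} \\ w_{2,2} \\ w_{2,3} \end{bmatrix} = \begin{bmatrix} s_1 \\ s_2 \\ s_3 \end{bmatrix}$
Set of Linear Equations:
$BB^H w_{2,1} + BC^H w_{2,2} + BD^H w_{2,3} = s_1$
$CB^H w_{2,1} + CC^H w_{2,2} + CD^H w_{2,3} = s_2$
$DB^H w_{2,1} + DC^H w_{2,2} + DD^H w_{2,3} = s_3$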
![Page 169: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/169.jpg)
CONJUGATE GRADIENT APPROACH: Investigation of Initial Condition Values
• Investigation of the two sets of linear equations reveals similarities among the sets of equations:
$AA^H w_{1,1} + AB^H w_{1,2} + AC^H w_{1,3} = s_1$
$BA^H w_{1,1} + BB^H w_{1,2} + BC^H w_{1,3} = s_2$
$CA^H w_{1,1} + CB^H w_{1,2} + CC^H w_{1,3} = s_3$
$BB^H w_{2,1} + BC^H w_{2,2} + BD^H w_{2,3} = s_1$
$CB^H w_{2,1} + CC^H w_{2,2} + CD^H w_{2,3} = s_2$
$DB^H w_{2,1} + DC^H w_{2,2} + DD^H w_{2,3} = s_3$
• The similarities between the equations may provide insight into the selection of the initial condition values. Assuming the steering vector remains the same for each set of linear equations, the initial values could be assigned as follows:
– If range bin D is similar to range bin C, then $w_{2,1} \leftarrow w_{1,2}$, $w_{2,2} \leftarrow w_{1,3}$, $w_{2,3} \leftarrow w_{1,3}$
– If range bin D is similar to range bin A, then $w_{2,1} \leftarrow w_{1,2}$, $w_{2,2} \leftarrow w_{1,3}$, $w_{2,3} \leftarrow w_{1,1}$
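The two assignment rules amount to shifting the solved weights of the first system by one position and filling the last entry according to which range bin D resembles. A minimal Python sketch follows; the function name `initial_guess` and the numeric values are hypothetical, and how similarity between range bins is judged is left open here, as it is on the slide.

```python
def initial_guess(w1, d_similar_to):
    """Build a CG starting vector for w2 from the solved w1 = [w_11, w_12, w_13].

    System 1 uses range bins (A, B, C); system 2 uses (B, C, D).
    The weights for the shared bins B and C are reused; the entry for D is
    copied from the weight of the bin that D is assumed to resemble.
    """
    w11, w12, w13 = w1
    if d_similar_to == "C":
        return [w12, w13, w13]
    if d_similar_to == "A":
        return [w12, w13, w11]
    raise ValueError("expected 'A' or 'C'")

w1 = [0.5, -0.25, 0.75]            # illustrative values only
print(initial_guess(w1, "C"))
print(initial_guess(w1, "A"))
```

The returned vector would be passed as the starting point of the CG solve for the second system in place of a zero or random start.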
![Page 170: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/170.jpg)
• Program Overview and Introduction (Quad Chart)
• Program Management Status
• Highlights from Year 1
• Highlights from Year 2
• Work to be Completed
Outline
![Page 171: Optimal Configuration of Combined GPP/DSP/FPGA …antonio/pubs/p-ann_rev99acs.pdf · Optimal Configuration of Combined GPP/DSP/FPGA Systems for Minimal SWAP Presented by ... Pulse](https://reader031.vdocuments.net/reader031/viewer/2022022008/5ae572257f8b9a8b2b8bdc30/html5/thumbnails/171.jpg)
Work to be Completed
• Interfacing of FPGA and GPP/DSP Subsystems
• Implement Parallel SAR Algorithm on GPP/DSP System
• Integrate FPGA FIR Filters for Range and Azimuth Processing for SAR
• Implement Parallel STAP Algorithm for GPP/DSP System
• Integrate FPGA FIR Filters for Range Processing for STAP
• Implement FPGA-based Linear Equation Solver
• Integrate FPGA-based Linear Equation Solver with STAP