Development of an FPGA-Based Two Transform Pulse Compressor
Perform a Two-Transform Pulse Compression using a Received reflected signal and a Reference signal
Input signals are first phase corrected using a complex phase factor multiply
Range Compression is achieved by cross-correlating the Received signal with the Reference signal, implemented as multiplication of the Received signal by the conjugate of the Reference in the frequency domain
Both input signals are first transformed to the frequency domain using Fast Fourier Transforms (FFTs)
Provisions for a frequency domain correction are included as a complex multiply after the cross-correlation
Following cross-correlation and error correction, an Inverse Fast Fourier Transform (IFFT) is used to obtain the time domain compressed signal
An optional swath selection is used to select a desired portion of the output compressed signal
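The processing chain listed above can be sketched in NumPy as follows. This is a minimal floating-point illustration, not the FPGA implementation: the function name `pulse_compress`, its parameter names, and the test chirp are assumptions made for the example, and the correction factors are left as optional inputs because the source does not specify their values.

```python
import numpy as np

def pulse_compress(rx, ref, rx_phase=None, ref_phase=None,
                   freq_corr=None, swath=None):
    """Sketch of FFT-based pulse compression (hypothetical helper)."""
    rx = np.asarray(rx, dtype=complex)
    ref = np.asarray(ref, dtype=complex)
    if rx.shape != ref.shape:
        raise ValueError("rx and ref must have the same length")
    # 1. Phase correction: complex phase-factor multiply in the time domain.
    if rx_phase is not None:
        rx = rx * np.exp(1j * np.asarray(rx_phase))
    if ref_phase is not None:
        ref = ref * np.exp(1j * np.asarray(ref_phase))
    # 2. Transform both inputs to the frequency domain.
    rx_f = np.fft.fft(rx)
    ref_f = np.fft.fft(ref)
    # 3. Cross-correlation: multiply by the conjugate of the reference.
    xc = rx_f * np.conj(ref_f)
    # 4. Optional frequency domain correction (complex multiply).
    if freq_corr is not None:
        xc = xc * np.asarray(freq_corr)
    # 5. Inverse transform back to the time domain.
    y = np.fft.ifft(xc)
    # 6. Optional swath selection (a slice or index array).
    return y if swath is None else y[swath]

def demo_chirp(n=1024, pulse_len=256):
    """Zero-padded linear FM pulse used only as illustrative test data."""
    k = np.arange(pulse_len)
    ref = np.zeros(n, dtype=complex)
    ref[:pulse_len] = np.exp(1j * np.pi * k**2 / pulse_len)
    return ref

ref = demo_chirp()
rx = np.roll(ref, 100)            # echo delayed by 100 samples (circular)
y = pulse_compress(rx, ref)
print(int(np.argmax(np.abs(y))))  # index of the compressed peak
```

Because the correlation is computed through FFTs it is circular, so in this sketch the pulse is zero-padded within the transform length to keep the delayed echo from wrapping around.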
Create a high-throughput Two Transform Pulse Compressor for use in wideband real-time Radar Signal Processor applications using Commercial Off-The-Shelf (COTS) Field Programmable Gate Array (FPGA) processor boards.
GOALS
Features and Benefits
2 Virtex™ II FPGA Processing Elements: XC2V6000 or XC2V8000
0 to 48 MBytes of Synchronous ZBT SRAM in 6 Memory Banks
0 to 256 MBytes of Synchronous DRAM in 1 Memory Bank
PCI Bus: Rev 2.2 Compliant
5V Board: 32/64 Bit, 33 MHz, 5V or 3.3V Slot
3.3V Board: 32/64 Bit, 33/66 MHz, 3.3V Slot
Automatic 32/64 Bit PCI Bus Recognition
Host Software: NT 4.0 and 2000, Linux, Solaris; API and Device Drivers
VHDL Model of the System for Easy Development
Accepts COTS High Speed WILDSTAR™ I/O Cards: WILDSTAR™ Data Port (WSDP™), FPDP, Myrinet™, 65 MHz A/D, and 1 GHz A/D
12 to 16 Million System Gates
Virtex™ E FPGA is larger, faster, and uses less power than Virtex™ FPGA
150 MHz Board, FPGA and Memory Speed
4.8 GBytes/Sec Memory Band Width
I/O Band Width: 66 MHz PCI, up to theoretical maximum of 512 MBytes/Sec with 64 Bits; WILDSTAR™ PE to I/O Board, 3 GBytes/Sec; LAD Bus, 256 MBytes/Sec at 66 MHz/32 Bits
Supports Internet Reconfiguration
Program from Flash on Power Up
Commercial Off the Shelf Product (COTS)
CONCEPTS
PERFORMANCE
SYSTEM COMPONENTS
DEMONSTRATION HARDWARE
WILDSTAR II™ PCI BOARD
DESIGN ANALYSIS
Integrated Sensors, Inc. (315)798-1377 www.sensors.com
[Figure: Two-Transform Pulse Compressor Algorithm. The Ch1 and Ref inputs each pass through a Phase Multiply Correction (MC), a multiply by a complex phase factor indexed by sample k, and then a 1D FFT. The Ch1 spectrum is multiplied by the conjugate (Conj *) of the Ref spectrum, followed by Error Compensation (MC), a 1D IFFT, and Rng Select (Swath Selection).]
[Figure: WILDSTAR™ II PCI board block diagram. Two processing elements, PE 0 and PE 1 (Virtex™ II XC2V 6000, 8000), each with DDR SDRAM (64 MB), DDR2 SRAM banks (4 MB each), an I/O connector (I/O #0, I/O #1) carrying WSDP / FPGA signals (differential and single ended), Flash, and programmable oscillators; a Master Clock Generator (PCLK, MCLK, ICLK); and a 64-bit, 66/133 MHz PCI bus interface. Link widths are labeled 32, 36, 168, and 172. Copyright 2002 Annapolis Micro Systems, Inc.]
[Figure: 2 Wildstar II Boards Process 4 Simultaneous Pulses. Ch1 and Ref inputs arrive at the Router/Interface as 4 pulses of 64K each at 500 Msps, are split into streams of 2 pulses of 64K each at 250 Msps (Ch1a/Refa, Ch1b/Refb), and then into single 64K pulses at 125 Msps. Each pulse is handled by a processing chain (PRE-PROC, FFT, Xmpy, EC, IFFT, Select) on PE0 and PE1 of WILDSTAR Board #1 and Board #2, connected through WSDP0 and WSDP1. Collected results: 4 processed pulses.]
Fixed Point complex FFT core
Approximately 5:1 size reduction over Floating Point core
Multiply/accumulators not driving factor in size
Can fit ~4 x 8-bit FFT cores in a single V6000 FPGA
• 4:1 hardware improvement over Floating Point
• 64K vector length; 8 bit input; 18 bit max bit width
4 FFT points/clock + latency
• 64K complex FFT @ 150 MHz: 109 µs
• 32K complex FFT @ 150 MHz: 55 µs
Floating Point vs Fixed Point Sizing
Fairly consistent 5:1 ratio
Observed with FFT, complex mpy and add, divide, sqrt cores
Fixed Point Cores offer ~5:1 size advantage over Floating Point cores
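The FFT timing figures above follow from simple arithmetic on the transform length, the clock rate, and the number of points processed per clock. The sketch below reproduces them; the function name `fft_time_us` is an assumption for illustration, and pipeline latency is ignored because the source does not quantify it.

```python
def fft_time_us(n_points, clock_mhz, points_per_clock):
    """Streaming time in microseconds for one FFT, excluding pipeline latency."""
    cycles = n_points / points_per_clock
    return cycles / clock_mhz  # cycles divided by MHz gives microseconds

# Figures quoted for the fixed point core: 4 points per clock at 150 MHz.
t_64k = fft_time_us(64 * 1024, 150.0, 4)
t_32k = fft_time_us(32 * 1024, 150.0, 4)
print(round(t_64k), round(t_32k))

# Same formula applied to the one-point-per-clock, 81 MHz operating point
# reported elsewhere in this document for speed grade -4 parts.
t_64k_current = fft_time_us(64 * 1024, 81.0, 1)
print(round(t_64k_current))
```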
[Figure: Fixed Point FFT SQNRs vs Bit Width & FFT Length (Mode 1). x-axis: FFT Input/Max Bit Width; y-axis: SQNR (dB); curves for 16K, 32K, and 64K FFT lengths.]
Signal to Quantization Noise Ratio (SQNR)
Analyzed using MATLAB FFT models using specified bit widths and truncations
Uses uniformly distributed noise input to FFT
$\mathrm{SQNR} = \sum |X_{\mathrm{float}}|^2 \,/\, \sum |X_{\mathrm{float}} - X_{\mathrm{fixed}}|^2$
3 dB difference for each doubling of FFT length (1/2 bit)
Bit growth through FFT added
Bit growth from 8/10 bit inputs appears to give reasonable SQNRs
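The SQNR metric can be illustrated with a short NumPy sketch. This is not the MATLAB model referred to above: it quantizes only the FFT input and uses a floating-point FFT, so it does not capture internal truncation or bit growth (and therefore does not reproduce the 3 dB per doubling trend). The helper names `quantize` and `sqnr_db` and the input scaling to [-1, 1) are assumptions for the example.

```python
import numpy as np

def quantize(x, bits):
    """Round complex x (components in [-1, 1)) to signed fixed point
    with `bits` bits per real/imaginary component."""
    scale = 2.0 ** (bits - 1)
    def q(v):
        return np.clip(np.round(v * scale), -scale, scale - 1) / scale
    return q(x.real) + 1j * q(x.imag)

def sqnr_db(x_float, x_fixed):
    """SQNR in dB: signal energy over quantization-error energy."""
    signal = np.sum(np.abs(x_float) ** 2)
    noise = np.sum(np.abs(x_float - x_fixed) ** 2)
    return 10.0 * np.log10(signal / noise)

# Uniformly distributed complex noise as the FFT input.
rng = np.random.default_rng(0)
n = 4096
x = rng.uniform(-1, 1, n) + 1j * rng.uniform(-1, 1, n)
x_float = np.fft.fft(x)

for bits in (8, 10):
    x_fixed = np.fft.fft(quantize(x, bits))  # input quantization only
    print(bits, round(sqnr_db(x_float, x_fixed), 1))
```

In this simplified model each extra input bit adds roughly 6 dB of SQNR; a bit-accurate fixed point FFT model would be needed to study the per-stage truncation effects described above.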
[Figure: Xcorr XmitSig IPR Comparison (64K, Mode2). x-axis: Cell; y-axis: Xcorr Signal^2 Power, 10log10 (dB); curves for 20, 18, 16, 14, 12, and 10 bits and floating point.]
ISLR loss alone can be a deceptive metric; factors such as IPR shape must also be considered, since the IPR can show severe truncation effects even with an apparently good ISLR
PULSE COMPRESSOR ARCHITECTURE
FPGA Processors
Offer high throughput, much higher density than DSP processors
Reconfigurable processing
COTS solution
Low cost and much faster alternative to ASICs
Implemented in Annapolis Wildstar COTS Boards
Powerful core design tools and libraries available for fast development and prototyping
Includes high speed WSDP data interconnects
FPGAs offer growth path to improved processors
50 million gate parts
Platform FPGAs including PPC processors, I/O and RAM are currently available
[Figure: FPGA growth path. Xilinx V6000 FPGA, 6M gates (V6000 parts: ECP demo); Xilinx V8000 FPGA, 8M gates (V8000 parts currently available); Xilinx 2VP50 "Platform" FPGA with 4 PPCs (PowerPC processors), FPGA logic, RAM, and high speed I/O (currently available); 2007 technology: Xilinx FPGA, 50M gates, with part integration (4X) and improved PPC speed (2X).]
WILDSTAR II™ ARCHITECTURE
[Figure: System architecture. REF, Ch1, Ch2, and Ch3 inputs each feed an A/D, with RAM Buffers, a Router/Time Align/Interface (Custom) stage, and the Range Compression Processor (~6 boards).]
SYSTEM ARCHITECTURE
[Figure: XmitSig ISLR Loss vs FFT Bit Width (64K, Mode2). x-axis: Mpy Bit Width; y-axis: Xcorr ISLR Loss (dB).]
3 Wildstar II Virtex V6000 Board Nodes
IBM PC servers to host boards
(6) 6 Million gate parts
Status: Operational
3 Wildstar II assemblies complete and operating
1 Data driver and collector board
2 Processing boards
FFT maximum bit widths of 18 bits appear to give less than 0.1 dB of ISLR loss. This corresponds to space-efficient fixed point FPGA FFT core implementations using Xilinx parts with embedded 18x18 multipliers.
Router / Time Alignment / Interface (Custom) Board required
Signal Processor requires WSDP data input interfaces due to high data rates
Time align Ref and Ch1, Ch2, Ch3... channels
Buffer and rate reduce each channel into lower rate channels for WSDP capabilities (800 MB/sec)
Provide WSDP compatible output interfaces
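The need for rate reduction can be illustrated with a rough bandwidth calculation. The sketch assumes each complex sample occupies 2 bytes (8 bits each for the real and imaginary parts); the source states only "up to 8 bits per sample", so this byte count, like the helper names, is an assumption. The 500, 250, and 125 Msps rates and the 800 MB/sec WSDP figure are taken from this document.

```python
import math

WSDP_MB_PER_S = 800.0  # WSDP capability quoted in this document

def rate_mb_per_s(msps, bytes_per_sample=2):
    """Data rate in MB/s for a stream of complex samples.
    bytes_per_sample=2 assumes 8-bit I plus 8-bit Q (an assumption)."""
    return msps * bytes_per_sample

def min_split(msps, bytes_per_sample=2, link_mb_per_s=WSDP_MB_PER_S):
    """Smallest number of equal-rate streams that each fit on one link."""
    return math.ceil(rate_mb_per_s(msps, bytes_per_sample) / link_mb_per_s)

for msps in (500, 250, 125):
    rate = rate_mb_per_s(msps)
    print(msps, rate, rate <= WSDP_MB_PER_S)

print(min_split(500))
```

Under these assumptions a full-rate 500 Msps channel exceeds a single WSDP link, while the 250 Msps and 125 Msps streams fit with margin.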
FPGA Node configuration
Processing node of 2 FPGA boards performs complete Range Compression on 2 range pulses using combination of V2000E Xilinx FPGAs on the WSDP I/O cards and V6000 FPGAs on the base cards
Pass through concept: each iteration node strips off the 1st 64K pulse samples to process and passes the remaining pulse data on to succeeding iteration nodes
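The pass through concept can be modeled in software as each node taking the first 64K samples from the stream and forwarding the rest. The sketch below is only a behavioral illustration with hypothetical function names; it says nothing about how the FPGA nodes buffer or time the data.

```python
import numpy as np

PULSE_LEN = 64 * 1024  # samples per pulse, per the 64K pulse length above

def node_pass_through(stream, pulse_len=PULSE_LEN):
    """One node: keep the first pulse, pass the remainder downstream."""
    return stream[:pulse_len], stream[pulse_len:]

def distribute(stream, n_nodes, pulse_len=PULSE_LEN):
    """Chain n_nodes nodes; return the pulse each node kept and any leftover."""
    kept = []
    for _ in range(n_nodes):
        pulse, stream = node_pass_through(stream, pulse_len)
        kept.append(pulse)
    return kept, stream

# Four back-to-back pulses, labeled by sample index for easy checking.
stream = np.arange(4 * PULSE_LEN)
kept, leftover = distribute(stream, n_nodes=4)
print([int(p[0]) // PULSE_LEN for p in kept], len(leftover))
```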
FPGA IMPLEMENTATION via COREFIRE™
SYSTEM DESIGN GOALS
Perform Pulse Compressions on input data in real time
Up to 64K (16K, 32K, 64K) sample input pulses
Up to 500 MHz data sample rate
Data samples are complex, up to 8 bits per sample
Corefire™
Annapolis Micro Systems design tool
Allows fast development of FPGA core designs using libraries of functional blocks
Interface board design illustrated
Receives data pulses from upstream splitter
Performs FFT pre-processing
FFT designs analyzed using metrics
Bit widths and growth specified in models
Performance considered using synthetic bandlimited pulse
Sidelobe Ratio
Energy in peak compared to energy in sidelobes
Degradation in pulse compression will manifest itself with higher sidelobe levels
Impulse Response
Shape and quantization effects considered for compressed pulses
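A sidelobe ratio of this kind can be computed from a compressed pulse as sketched below. The source does not give the exact definition used, so the choices here are assumptions: the ratio is expressed as sidelobe energy over mainlobe energy in dB, and the mainlobe is taken as a fixed window of `mainlobe_halfwidth` cells around the peak (a hypothetical parameter).

```python
import numpy as np

def islr_db(y, mainlobe_halfwidth):
    """Integrated sidelobe ratio in dB: energy outside the mainlobe window
    divided by energy inside it (one common convention, assumed here)."""
    power = np.abs(np.asarray(y)) ** 2
    peak = int(np.argmax(power))
    lo = max(peak - mainlobe_halfwidth, 0)
    hi = min(peak + mainlobe_halfwidth + 1, len(power))
    main = power[lo:hi].sum()
    side = power.sum() - main
    return 10.0 * np.log10(side / main)

# Toy compressed pulse: one strong peak and two small sidelobes.
y = np.zeros(64, dtype=complex)
y[32] = 10.0
y[10] = 1.0
y[50] = 1.0
print(round(islr_db(y, mainlobe_halfwidth=2), 2))
```

Comparing this value between a floating point and a fixed point compression of the same pulse gives one way to express an ISLR loss; as noted elsewhere in this document, the impulse response shape should be inspected as well.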
[Figure: Rngcomp MATLAB vs Wildstar (scaling: 1051.1). y-axis: dB; MATLAB: blue, Wildstar: red.]
Multiple pulse design currently running
4 simultaneous pulses processed on 2 Wildstar II boards
FFT Throughput Performance
One point per clock
Clock currently running at 81 MHz on speed grade -4 parts
Anticipate speeds up to 133 MHz on speed grade -6 parts
Single Wildstar II Board provides up to 16 Million FPGA gates and 4.8 GBytes/sec I/O on WSDP ports
Four node parallel processor implemented and operating; investigation continues for faster operating clock and larger parts for increased bit widths and data precision
Three input channel architecture illustrated
FPGA growth path includes increased gate density and increased features for smaller designs with
improved precision and capabilities