0839-118-1 mit lincoln laboratory radar pulse compression using the nvidia cuda sdk stephen bash,...

0839-118-1

MIT Lincoln Laboratory

Radar Pulse Compression Using the NVIDIA CUDA SDK

Stephen Bash, David Carpman, and David Holl

HPEC 2008

September 23-25, 2008

This work is sponsored by the Air Force Research Laboratory under Air Force contract FA8721-05-C-0002. Opinions, interpretations, conclusions and recommendations are those of the author and not necessarily endorsed by the United States Government.

MIT Lincoln Laboratory0839-118-2

NVIDIA Compute Unified Device Architecture SDK

• Create custom kernels that run on GPU

• Extension of C language

• Provides driver- and runtime-level APIs

• Includes numerical libraries– CUFFT– CUBLAS

• $/GFLOP GPU=$1.27 CPU=$29.18

Benchmarking the NVIDIA 8800GTX with the CUDA Development Platform

Michael McGraw-Herdeg, MIT

Douglas P. Enright, The Aerospace Corporation

B. Scott Michel, The Aerospace Corporation

©2007 The Aerospace Corporation

HPEC 07:


NVIDIA Compute Unified Device Architecture SDK

• Create custom kernels that run on GPU

• Extension of C language

• Provides driver- and runtime-level APIs

• Includes numerical libraries– CUFFT– CUBLAS

• $/GFLOP GPU=$1.27 CPU=$29.18

Benchmarking the NVIDIA 8800GTX with the CUDA Development Platform

Michael McGraw-Herdeg, MIT

Douglas P. Enright, The Aerospace Corporation

B. Scott Michel, The Aerospace Corporation

©2007 The Aerospace Corporation

HPEC 07:


Radar Pulse Compression

• Waveform design and processing to achieve higher range resolution and sensitivity*

• Processing consists of convolution with FIR filter

– Doppler tolerant (top): traditional frequency domain convolution

– Doppler intolerant (bottom): additional FFT and Doppler correction required

* Skolnik, Radar Handbook, Second Edition. McGraw Hill Publishing, Boston, MA, 1990.

Replica

Replica

Fast Time IFFT

Fast Time FFT

Fast Time FFT

Slow Time FFT

Fast Time IFFT

Fast Time IFFT

Doppler Correction


GPU vs. CPU Comparison

• CPU vs GPU comparison in real-world conditions

– 2 GHz dual quad-core AMD Opterons vs eVGA eGeForce 8800 Ultra

– Memory transfer to and from GPU included in timing

0

10

20

30

40

50

60

Processing Time Per Stage500

480

490CPUGPU

Sta

ge

Tim

e (

ms

)

Batch Size

1D FFT

1 4 16 64 256

1024

2048

4096

8192

16384

32768

65536

1000

1960

4725

10368

27000

FF

T S

ize

3

2

1

0.5

0.3

GP

U S

pe

ed

up

Extract Range Region

Fast Time IFFT

Multiply Replica 3


Fast Time IFFT

Multiply Replica 2


Doppler Correction

Doppler FFT

Doppler Window

Fast Time IFFT

Multiply Replica 1

Fast Time FFT


Backups


Reference: $/GFLOP

As of July 2007, these products represent the top of the line consumer CPU and graphics card according to floating point computational power:

1. Kentsfield Core 2 Extreme QX680037.7 GFLOPS – fastest CPU as of 7/16/2007

http://www.tomshardware.com/2007/07/16/cpu_charts_2007/page36.html

$1100 – price as of March 10, 2008 http://www.google.com/products?q=Kentsfield+Core+2+Extreme+QX6800

$/GFLOPS = $29.18Notes: Price excludes motherboard + power supply + memory + GPU

2. EVGA GeForce 8800 Ultra Superclocked (NVIDIA)576 GFLOPS – theoretical peak

http://en.wikipedia.org/wiki/GeForce_8_Series

$730 – price as of March 10, 2008

http://www.google.com/products?q=768-P2-N887-AR&scoring=p

$/GFLOPS = $1.27Notes: Price includes 768 MB GDDR3 memory, but excludes: motherboard + power supply + CPU

0839-118-1 mit lincoln laboratory radar pulse compression using the nvidia cuda sdk stephen bash,...

Documents