reconfigurable computing: current status and potential for spacecraft computing systems

32
Barto 127-MAPLD2005 Slide 1 Reconfigurable Computing: Current Status and Potential for Spacecraft Computing Systems Rod Barto NASA/GSFC Office of Logic Design Spacecraft Digital Electronics 3312 Moonlight El Paso, Texas 79904

Upload: borna

Post on 20-Mar-2016

68 views

Category:

Documents


2 download

DESCRIPTION

Reconfigurable Computing: Current Status and Potential for Spacecraft Computing Systems. Rod Barto NASA/GSFC Office of Logic Design Spacecraft Digital Electronics 3312 Moonlight El Paso, Texas 79904. Reconfigurable Computing is…. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Reconfigurable Computing: Current Status and Potential for Spacecraft Computing Systems

Barto 127-MAPLD2005Slide 1

Reconfigurable Computing:Current Status and Potential

for Spacecraft Computing Systems

Rod BartoNASA/GSFC Office of Logic Design

Spacecraft Digital Electronics3312 Moonlight

El Paso, Texas 79904

Page 2: Reconfigurable Computing: Current Status and Potential for Spacecraft Computing Systems

Barto 127-MAPLD2005Slide 2

Reconfigurable Computing is…• A design methodology by which

computational components can be arranged in several ways to perform various computing tasks

• Two types of reconfigurable computing:– Static, i.e., the computing system is

configured before launch– Dynamic, i.e., the computing system can be

reconfigured after launch

Page 3: Reconfigurable Computing: Current Status and Potential for Spacecraft Computing Systems

Barto 127-MAPLD2005Slide 3

Static Reconfigurability

• Several examples exist, e.g., Cray• Typically processing modules connected

by an intercommunication mechanism, e.g., Ethernet

• Goals are– To reduce system development costs– To provide higher performance computing

Page 4: Reconfigurable Computing: Current Status and Potential for Spacecraft Computing Systems

Barto 127-MAPLD2005Slide 4

Dynamic Reconfigurability (DR)• Processing modules that can be

reconfigured in flight• Goal is to provide processing support for

algorithms that do not map well onto general purpose computers using reduced amounts of hardware

Page 5: Reconfigurable Computing: Current Status and Potential for Spacecraft Computing Systems

Barto 127-MAPLD2005Slide 5

Outline of Paper1. Discuss the computation of a series of algorithms on

general purpose, special purpose, and DR computers2. Calculate the execution time of an image processing

algorithm on a concept DR computer3. Compare the reconfiguration time of a Xilinx FPGA

with the algorithm execution time calculated in section 2.

4. Obtain an extremely rough estimate of image processing algorithm execution time on a flight computer

5. Conclude that the DR computer described offers higher performance than does the flight computer

Page 6: Reconfigurable Computing: Current Status and Potential for Spacecraft Computing Systems

Barto 127-MAPLD2005Slide 6

Section 1:Algorithm Execution on General Purpose (GP),

Special Purpose (SP), and DR Computers

Page 7: Reconfigurable Computing: Current Status and Potential for Spacecraft Computing Systems

Barto 127-MAPLD2005Slide 7

Processing example

• A computing function is the composition of n algorithms executed serially

• Can be executed on a general purpose computer (GP) or a special purpose computer (SP)

Outputf1 f2 fnInput

Page 8: Reconfigurable Computing: Current Status and Potential for Spacecraft Computing Systems

Barto 127-MAPLD2005Slide 8

Execution on a GP Computer

n

i

ti1

Outputf1 f2 fnInput

Processing time of each stage = ti, i=1..n

Total processing time =

Latency time =

n

i

ti1

GP computer must execute processing stages sequentially, and cannot exploit parallelism in overall computing function

Page 9: Reconfigurable Computing: Current Status and Potential for Spacecraft Computing Systems

Barto 127-MAPLD2005Slide 9

Processing on an SP ProcessorOutputf1 f2 fnInput

Each stage is an independently operating processor designed specifically for the algorithm it executes

Processing time of each stage = ti, i=1..n

Results appear at rate of one per max(ti), 1=1..n

Latency time = n

i

ti1

Performance increase comes from two factors:

• Pipelining of constituent algorithms exploiting parallelism

• Processors being designed specifically for their algorithms

Page 10: Reconfigurable Computing: Current Status and Potential for Spacecraft Computing Systems

Barto 127-MAPLD2005Slide 10

Processing on a DR Computer

• Two processing elements alternately process and reconfigure, i.e., fodd executes one algorithm while feven reconfigures for the next algorithm, etc.

fodd

feven

InputOutput

Page 11: Reconfigurable Computing: Current Status and Potential for Spacecraft Computing Systems

Barto 127-MAPLD2005Slide 11

DR Computer Processing Flowfodd

feven

f1

R

R

f2

f3

R

R

f4

fn

R

time

n

i

Rti1

),max(

n

i

Rti1

),max(

R = reconfiguration

Results appear at rate of one per

Latency =

Performance increase comes from configuring processors specifically for the algorithm they are executing

Do not get increase from exploiting parallelism.

Page 12: Reconfigurable Computing: Current Status and Potential for Spacecraft Computing Systems

Barto 127-MAPLD2005Slide 12

Section 2:Execution Time of an Image Processing Algorithm on a

Concept DR Computer

Page 13: Reconfigurable Computing: Current Status and Potential for Spacecraft Computing Systems

Barto 127-MAPLD2005Slide 13

DR Computer Concept

• RAM0 is source for FPGA0, destination for FPGFA1, etc.• Processing elements are implemented in FPGAs• FPGA0 and FPGA1 alternately process and reconfigure, as previously

discussed.• Input and output not shown

FPGA0

FPGA1

RAM1RAM0

Page 14: Reconfigurable Computing: Current Status and Potential for Spacecraft Computing Systems

Barto 127-MAPLD2005Slide 14

AlgorithmExample: 3x3 Image Convolution

• Shifting in 1 row at a time pixel-serial, and parallel shifting into the upper 3 row registers, the rows are shifted around through the convolution processor. All the row registers and processing is inside the FPGA. The results are written to the destination RAM after a latency of 3 row reads.

Image width in pixels

row i-1row irow i+1

Parallel shift rows uprow i+2

Circular shift rows through convolution processor

3x3 convolution processor

Destination RAM

Source

RAM

one pixel

Page 15: Reconfigurable Computing: Current Status and Potential for Spacecraft Computing Systems

Barto 127-MAPLD2005Slide 15

Convolution Operationi-1j-1

i-1j

i-1j+1

ij-1

row Icolumn j

ij+1

i+1j-1

i+1j

i+1j+1

Used, for example, to compute the intensity gradient (derivative) at pixel (i,j)

Result = P(i-1,j-1)*m11+P(i-1,j)*m12+P(i-1,j-1)*m13+…+P(i+1,j+1)*m33

m11 m12 m13

m21 m22 m23

m31 m32 m33

Pixel array Convolution mask

Page 16: Reconfigurable Computing: Current Status and Potential for Spacecraft Computing Systems

Barto 127-MAPLD2005Slide 16

Convolution Calculation

• Arithmetic processing may require some pipelining

*P(i-1,j-1)

m11

*P(i-1,j)

m12

*P(i-1,j+1)

m13

*P(i,j-1)

m21

*P(i,j)

m22

*P(i,j+1)

m23

*P(i+1,j-1)

m31

*P(i+1,j)

m32

*P(i+1,j+1)

m33 Result(I,j)

Page 17: Reconfigurable Computing: Current Status and Potential for Spacecraft Computing Systems

Barto 127-MAPLD2005Slide 17

Convolution Timing• Total time = latency+processing = 20.971 msec

– This assumes we can get pixels into the FPGA at a 20 nsec/pixel rate

– Latency = time to read 3 rows: • 1024 pixels *3 rows * 20 nsec/pixel = 61 usec

– Processing = time to stream remaining 1021 rows through and process:

• 1024 * 1021 * 20 nsec = 20.910 msec• Larger convolutions (e.g., 7x7) have longer

latencies, but same computation time• Calculation is for a mono image, stereo image

would take twice as long.

Page 18: Reconfigurable Computing: Current Status and Potential for Spacecraft Computing Systems

Barto 127-MAPLD2005Slide 18

Section 3:Comparing the Reconfiguration Time of a Xilinx FPGA With the

Algorithm Execution Time Calculated in Section 2.

Page 19: Reconfigurable Computing: Current Status and Potential for Spacecraft Computing Systems

Barto 127-MAPLD2005Slide 19

DR Computer Processing Element:

Virtex-4 LX FPGA• Eight versions:

– XC4VLX15, -25, -40, -60, -80, -100, -160, -200• Logic hierarchically arranged:

– 2 flip-flops per slice– 4 slices per CLB

Device CLBs FFs-15 64x24 12,288

-200 192x116 178,176

Page 20: Reconfigurable Computing: Current Status and Potential for Spacecraft Computing Systems

Barto 127-MAPLD2005Slide 20

Time to Configure FPGA

• FPGA Configuration SequencePROG_B

INIT_B

CCLK

DONE

Tpl Tconfig

Total Configuration Time

Page 21: Reconfigurable Computing: Current Status and Potential for Spacecraft Computing Systems

Barto 127-MAPLD2005Slide 21

Configuration Timing: Tpl

• Tpl = 0.5 usec/frame• “frame” is a unit of configuration RAM• Tpl period clears configuration RAM

Device Frames Tpl

-15 3740 1.87 msec

-200 40108 20.1 msec

Page 22: Reconfigurable Computing: Current Status and Potential for Spacecraft Computing Systems

Barto 127-MAPLD2005Slide 22

Configuration Timing: Tconfig• FPGA programmed by bitstream• CCLK (programming CLK) can run at 100 MHz• Parallel mode loads 8 bits per CCLK

Device Bitstream length

Parallel loads

Tconfig

-15 4,765,138 595,648 5.956 msec

-200 48,722,432 6,090,304 60.903 msec

Page 23: Reconfigurable Computing: Current Status and Potential for Spacecraft Computing Systems

Barto 127-MAPLD2005Slide 23

Total Configuration Time

• Plus some extra time amounting to a few CCLK cycles (@ 10 nsec each)

Device Tpl, msec

Tconfig, msec

Total Configration time, msec

-15 1.87 5.956 7.826

-200 20.054 60.903 80.957

Page 24: Reconfigurable Computing: Current Status and Potential for Spacecraft Computing Systems

Barto 127-MAPLD2005Slide 24

Processing and Reconfiguration Time Comparison

• Convolution execution is faster than reconfiguration – Convolution = 21 msec mono, 42 msec stereo– Reconfiguration = 81 msec– Assuming -200 device

• Processing shown is well within FPGA’s capabilities• More complex algorithms may require use of FPGA

performance features– Much higher internal clock rates– Large internal RAM– Dedicated arithmetic support in –SX series

• What this shows is that it’s reasonable to consider alternating execution and reconfiguration of two FPGAs

Page 25: Reconfigurable Computing: Current Status and Potential for Spacecraft Computing Systems

Barto 127-MAPLD2005Slide 25

Section 4: An Extremely Rough Estimate of

Image Processing Algorithm Execution Time on a Flight

Computer

ROUGH ESTIMATE

Page 26: Reconfigurable Computing: Current Status and Potential for Spacecraft Computing Systems

Barto 127-MAPLD2005Slide 26

GP Computing Performance Estimate

• DANGER: really rough estimate!• Based on data from this paper:

– “Stereo Vision and Rover Navigation Software for Planetary Exploration”, Steven B. Goldberg, Indelible Systems; Mark Maimone, Larry Matthies, JPL; 2002 IEEE Aerospace Conference

– Available at robotics.jpl.nasa.gov/people/mwm/visnavsw/aero.pdf

– Describes processing and algorithms to be used on 2004 Rover missions, and Rover requirements.

ROUGH ESTIMATE

Page 27: Reconfigurable Computing: Current Status and Potential for Spacecraft Computing Systems

Barto 127-MAPLD2005Slide 27

Published Vision Algorithm Timing• Timed on Pentium III 700 MHz CPU, 32K L1 cache,

256K L2 cache, 512M RAM, Win2K• algorithms explicitly timed (names from paper):

Algorithm C code VectorDifference of Gaussian and Decimate

4229 msec 2047 msec

Prepare Next Row 5559 msec 1163 msecInner Loop 4360 msec 2928 msecCompute Sub-Pixel 667 msec 871 msec

• The Gaussian and most vision algorithms involve neighborhood operations that are comparable to an image convolution of some size

ROUGH ESTIMATE

Page 28: Reconfigurable Computing: Current Status and Potential for Spacecraft Computing Systems

Barto 127-MAPLD2005Slide 28

Flight Computer Performance• Flight processor is RAD6000• GESTALT Navigation algorithm timed on 3 processors:

Processor Execution Time Multiplier

Pentium III, 500 MHz, Linux 1

Sparc 300 MHz, Solaris 3.0-3.5

RAD6000, 20 MHz, VxWorks 7.7-8.7

Assume that the RAD6000 takes 7 times as long as the 500 MHz Pentium

ROUGH ESTIMATE

Page 29: Reconfigurable Computing: Current Status and Potential for Spacecraft Computing Systems

Barto 127-MAPLD2005Slide 29

Final Peformance Estimate• Assume RAD6000 time = 7 times the 500 MHz Pentium time• Assume 500 MHz Pentium time = 7/5=1.4 times the 700 MHz Pentium

time• Then, RAD6000 time is 1.4*7=9.8 times the 700 MHz Pentium time• Vision algorithm timing can be estimated as follows:

Algorithm Fastest time RAD6000 timeDifference of Gaussian and Decimate

2047 msec 20,060 msec

Prepare Next Row 1163 msec 11,397 msecInner Loop 2928 msec 28,694 msecCompute Sub-Pixel 667 msec 6536 msec

Remember: This is a really rough estimate!!

ROUGH ESTIMATE

Page 30: Reconfigurable Computing: Current Status and Potential for Spacecraft Computing Systems

Barto 127-MAPLD2005Slide 30

Section 5: Conclusions

Page 31: Reconfigurable Computing: Current Status and Potential for Spacecraft Computing Systems

Barto 127-MAPLD2005Slide 31

What We Have Shown• We have shown that the concept DR computer

presented executes a 3x3 neighborhood-type algorithm “a lot” faster than it appears that a RAD6000 executes what are probably a bunch of neighborhood algorithms.

• The reader is cautioned to not try to quantify what “a lot” means based on the data given here.

• But, it’s a good enough estimate to tell us that this is worth looking into in more detail.

ROUGH ESTIMATE

Page 32: Reconfigurable Computing: Current Status and Potential for Spacecraft Computing Systems

Barto 127-MAPLD2005Slide 32

Conclusions

• Xilinx-based DR computer shows promise for performance enhancement of a vision system

• By extension, the DR computer shows promise for the performance enhancement of other algorithms.