Distortion Correction ECE 6276 Project Review Team 5: Basit Memon Foti Kacani Jason Haedt Jin Joo Lee Peter Karasev

Page 1: Distortion Correction ECE 6276 Project Review

Distortion Correction

ECE 6276 Project Review

Team 5: Basit Memon, Foti Kacani, Jason Haedt, Jin Joo Lee, Peter Karasev

Page 2: Distortion Correction ECE 6276 Project Review

Initial Results

Image Size (Pixels)  Optimization Parameters  Optimization (Design Goal)  Area Score  Latency Cycles / Throughput Cycles  Maximum Delay (ns)  Slack (ns)
8x8                  No Optimization          Area                        14713       911/1250                            11.41               -1.41
8x8                  No sqrt                  Area                        3523        1625/2154                           9.27                0.73
8x8                  No sqrt                  Latency                     16125       591/5910                            11.41               -1.41
32x32                No sqrt                  Area                        4002.33     25697/31906                         12.18               -2.18
64x64                No sqrt                  Area                        4077.14     102593/127298                       12.18               -2.18
256x256              No sqrt                  Area                        3906.16     1639169/2098434                     8.94                1.06
640x480              No sqrt                  Area                        4392.45     7681921/9526242                     9.49                0.51

Page 3: Distortion Correction ECE 6276 Project Review

Problems

• Old code was very slow

• MATLAB was ported line-by-line

– Redundant computations

– Loops not nested correctly

– Not able to exploit Catapult C features fully

Page 4: Distortion Correction ECE 6276 Project Review

Target & Test Vectors for Catapult

• Catapult C was targeted for the Stratix III FPGA with a clock frequency of 100 MHz

• The Catapult results that follow were obtained with a 320x240 test image
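The slack values in the Initial Results table appear consistent with a 10 ns clock period, which is what a 100 MHz target implies. The short sketch below reproduces that arithmetic; the function names are illustrative and not part of any Catapult tooling.

```python
def clock_period_ns(freq_mhz):
    """Clock period in nanoseconds for a clock frequency given in MHz."""
    return 1000.0 / freq_mhz


def slack_ns(freq_mhz, max_delay_ns):
    """Timing slack: positive means the longest path fits in one clock period."""
    return clock_period_ns(freq_mhz) - max_delay_ns


# Maximum delays (ns) copied from the Initial Results table.
for delay in (11.41, 9.27, 12.18, 8.94, 9.49):
    print(delay, round(slack_ns(100.0, delay), 2))
```

A negative result means the reported maximum delay is longer than one clock period at the target frequency.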

Page 5: Distortion Correction ECE 6276 Project Review

Test Vectors

• Images distorted in MATLAB so that ground truth exists

• Flattened into binary streams

• Identical format for MATLAB, plain C, and AC Datatypes results
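As a rough illustration of flattening an image into a binary stream, the sketch below assumes 8-bit grayscale pixels stored in row-major order. The slides do not give the team's actual stream layout, so both that layout and the helper names are assumptions.

```python
def flatten_image(pixels):
    """Flatten a 2D list of 8-bit pixel values into a row-major byte stream."""
    return bytes(value for row in pixels for value in row)


def unflatten_image(stream, width, height):
    """Rebuild the 2D pixel list from a row-major byte stream."""
    if len(stream) != width * height:
        raise ValueError("stream length does not match image size")
    return [list(stream[r * width:(r + 1) * width]) for r in range(height)]


# A tiny 3x2 test image (hypothetical values).
image = [[0, 128, 255],
         [10, 20, 30]]
stream = flatten_image(image)
restored = unflatten_image(stream, width=3, height=2)
print(restored == image)
```

Using one shared layout lets the same file be compared byte-for-byte across the different implementations.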

Page 6: Distortion Correction ECE 6276 Project Review

Optimizations after CDR

• Look Up Tables

• Optimal fixed point bit sizes

• Algorithmic changes

– Streamlined loops (allows for optimal pipelining/unrolling)

– Math optimizations
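Choosing fixed-point bit sizes usually means trading word length against quantization error. The sketch below is a generic illustration, not the team's procedure: it quantizes sample values with a chosen number of fractional bits and reports the worst-case error. The helper names and sample values are hypothetical.

```python
def to_fixed(x, frac_bits):
    """Quantize x to an integer with frac_bits fractional bits (round to nearest)."""
    return int(round(x * (1 << frac_bits)))


def from_fixed(q, frac_bits):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def max_quantization_error(values, frac_bits):
    """Worst absolute error introduced by quantizing the given values."""
    return max(abs(v - from_fixed(to_fixed(v, frac_bits), frac_bits))
               for v in values)


samples = [i / 97.0 for i in range(97)]  # arbitrary values in [0, 1)
for bits in (4, 8, 12):
    print(bits, max_quantization_error(samples, bits))
```

With round-to-nearest, the error is bounded by half of one least significant bit, so each extra fractional bit halves the bound.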

Page 7: Distortion Correction ECE 6276 Project Review

1. Original Power Series with AC types div()

• Area: 11734

• Throughput Cycles: 5,145,841 (67 per pixel)

• AC Datatypes div() function uses only bit operations and additions
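To show how a divider can be built from shifts, comparisons, and add/subtract alone, here is a minimal sketch of textbook restoring division. It is not the AC Datatypes source code, and the function name and `width` parameter are assumptions made for illustration.

```python
def restoring_divide(numerator, denominator, width=16):
    """Unsigned division one quotient bit per iteration.

    Uses only shifts, bit masks, comparisons and subtraction.
    Requires 0 <= numerator < 2**width and denominator > 0.
    """
    if denominator <= 0:
        raise ZeroDivisionError("denominator must be positive")
    quotient = 0
    remainder = 0
    for bit in range(width - 1, -1, -1):
        # Bring down the next numerator bit.
        remainder = (remainder << 1) | ((numerator >> bit) & 1)
        if remainder >= denominator:
            remainder -= denominator
            quotient |= 1 << bit
    return quotient, remainder


print(restoring_divide(1000, 7))
```

The loop runs once per result bit, which is one reason a divider of this style costs many cycles per pixel when it is not pipelined.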

Page 8: Distortion Correction ECE 6276 Project Review

2. Use of Fast division (iterative Newton’s method)

• Area: 12851.12

• Throughput Cycles: 3,763,441 (49 per pixel)

– Initial was 5,145,841

• Requires multiplier elements
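The sketch below illustrates Newton's method for division in floating point: it refines an estimate of 1/d with the update x = x * (2 - d * x), then multiplies by the numerator. The range normalization, the linear initial guess, and the iteration count are common textbook choices assumed here, not details taken from the slides. A hardware version would use fixed-point types.

```python
def newton_reciprocal(d, iterations=4):
    """Approximate 1/d for d in [0.5, 1) using Newton's iteration."""
    x = 48.0 / 17.0 - (32.0 / 17.0) * d  # textbook linear initial guess
    for _ in range(iterations):
        x = x * (2.0 - d * x)  # two multiplications per iteration
    return x


def newton_divide(n, d, iterations=4):
    """Approximate n / d for d > 0 by scaling d into [0.5, 1)."""
    if d <= 0:
        raise ValueError("d must be positive")
    # Scaling both operands by the same power of two keeps n / d unchanged.
    while d >= 1.0:
        d /= 2.0
        n /= 2.0
    while d < 0.5:
        d *= 2.0
        n *= 2.0
    return n * newton_reciprocal(d, iterations)


print(newton_divide(7.0, 3.0))
```

Each iteration roughly doubles the number of correct bits, so few iterations are needed, but every iteration consumes multipliers.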

Page 9: Distortion Correction ECE 6276 Project Review

3. Combined Power Series and Division

• Area: 17705

• Throughput Cycles: 2,765,041 (36 per pixel)

– Initial was 5,145,841

• Appears to be an example of loop shrinking using properties of add and multiply

• Found by writing out the sums and substituting the power series result as a sum into the div() iterative loop.

Page 10: Distortion Correction ECE 6276 Project Review

4. Add approximate square root (Taylor Series sum)

• Area:

• Throughput Cycles: 1,843,441 (24 per pixel)

– Initial was 5,145,841

• Throughput cycles reduced by a factor of about 2.79 overall (5,145,841 / 1,843,441)

• Impractical total increase in area for this solution: the ROM is huge

• Not able to meet timing with fast square root
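The per-pixel figures and the overall gain quoted across the four steps can be checked from the throughput cycle counts, assuming the 320x240 test image stated on the "Target & Test Vectors" slide. The dictionary labels below are shorthand for the slide titles.

```python
PIXELS = 320 * 240  # test image size stated on the "Target & Test Vectors" slide

throughput_cycles = {
    "1. power series with div()": 5_145_841,
    "2. fast division": 3_763_441,
    "3. combined power series and division": 2_765_041,
    "4. approximate square root": 1_843_441,
}


def cycles_per_pixel(cycles, pixels=PIXELS):
    """Whole cycles spent per pixel, ignoring any fixed remainder."""
    return cycles // pixels


def speedup(initial, final):
    """Ratio of initial to final cycle counts."""
    return initial / final


for name, cycles in throughput_cycles.items():
    print(name, cycles_per_pixel(cycles), cycles % PIXELS)
print(round(speedup(5_145_841, 1_843_441), 2))
```

The remainder after dividing by the pixel count is the same in all four cases, which may indicate a fixed startup cost, although the slides do not say so.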

Page 11: Distortion Correction ECE 6276 Project Review

Why the approximate sqrt ROM is difficult

• If an equal step size in the variable is used, a 256-entry ROM works everywhere except near the center

• Getting enough precision with an equal step size requires too many entries (8192)

Smaller ROM fails: circle artifact in the middle
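One plausible reason for the failure near the center is that the square root is steepest near zero, so a table with equal steps loses the most accuracy there. The sketch below assumes the ROM is indexed by a normalized input in [0, 1) and uses a simple truncating lookup. Both are assumptions for illustration, not the team's actual table design.

```python
import math


def build_sqrt_rom(entries):
    """Table of sqrt values sampled at equal steps over [0, 1)."""
    return [math.sqrt(i / entries) for i in range(entries)]


def rom_sqrt(rom, x):
    """Look up sqrt(x) for x in [0, 1) by truncating to the table index."""
    index = min(int(x * len(rom)), len(rom) - 1)
    return rom[index]


def worst_error(rom, lo, hi, samples=2000):
    """Largest absolute lookup error over samples taken in [lo, hi)."""
    worst = 0.0
    for k in range(samples):
        x = lo + (hi - lo) * k / samples
        worst = max(worst, abs(math.sqrt(x) - rom_sqrt(rom, x)))
    return worst


for entries in (256, 8192):
    rom = build_sqrt_rom(entries)
    print(entries, worst_error(rom, 0.0, 1.0 / 256), worst_error(rom, 0.5, 1.0))
```

Under these assumptions the error in the first bin of a 256-entry table is far larger than the error away from zero, and shrinking it requires many more entries.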

Conclusion: the AC Datatypes sqrt() is quite good. It solves for the output one bit at a time, so only shifts and bit operations are needed. It takes a number of iterations, but if the pixels are pipelined as a large block this does not matter much.
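For readers unfamiliar with bit-at-a-time square roots, here is a sketch of the classic digit-by-digit integer algorithm. It is a generic illustration rather than the AC Datatypes implementation, and unlike the description above it also uses add, subtract, and compare alongside the shifts. The `width` parameter is an assumption.

```python
def isqrt_bitwise(value, width=32):
    """Integer square root, resolving one result bit per iteration.

    Requires 0 <= value < 2**width with width even. No multiplications.
    """
    remainder = value
    root = 0
    bit = 1 << (width - 2)  # highest power of four that fits in width bits
    for _ in range(width // 2):
        if remainder >= root + bit:
            remainder -= root + bit
            root = (root >> 1) + bit
        else:
            root >>= 1
        bit >>= 2
    return root


print(isqrt_bitwise(25), isqrt_bitwise(1_000_000))
```

The loop count is fixed at half the input width, which makes the latency predictable and the loop straightforward to pipeline.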

Page 12: Distortion Correction ECE 6276 Project Review

Memory Size and Storage Optimization

• Change LUT to 256x4 (right side is power of 2 as well), tolerate slightly more error in approximation of inverse distortion function

• Use 2D arrays, get rid of indexing add and multiply

• See line-to-line comparison below; huge area savings!

Before

After
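One way to see why power-of-two dimensions help: with 4 columns, the flat address `row * 4 + col` is the same as concatenating the row and column bits, so no multiplier or adder is needed for indexing. The sketch below demonstrates that equivalence. Whether Catapult generated exactly this addressing is an assumption, and the names are illustrative.

```python
ROWS, COLS = 256, 4   # LUT dimensions from the slide
COL_BITS = 2          # log2(COLS)


def flat_index_mul_add(row, col):
    """Address computed with a multiply and an add."""
    return row * COLS + col


def flat_index_shift_or(row, col):
    """Same address formed by concatenating row and column bits."""
    return (row << COL_BITS) | col


same = all(flat_index_mul_add(r, c) == flat_index_shift_or(r, c)
           for r in range(ROWS) for c in range(COLS))
print(same)
```

The equivalence holds only because every column index fits in `COL_BITS` bits; a width such as 3 or 5 would need a real multiply and add.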

Page 13: Distortion Correction ECE 6276 Project Review

Catapult C Results Summary

Page 14: Distortion Correction ECE 6276 Project Review

Catapult C Results Summary (cont…)

Page 15: Distortion Correction ECE 6276 Project Review

Catapult C Results Summary (cont…)

• Can meet up to 150MHz

• Optimized for 1 clock cycle per pixel

Page 16: Distortion Correction ECE 6276 Project Review

Catapult C Results Summary (cont…)

• Not optimal (@168MHz)

• Notice negative slack

Page 17: Distortion Correction ECE 6276 Project Review

Catapult C Results Summary (cont…)

• Optimal results for various images

• Meet 1280x960 @150MHz with minimal area overhead (according to Catapult)
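As a back-of-the-envelope check, the sketch below estimates the frame rate implied by a 150 MHz clock if the design sustains one clock cycle per pixel, as an earlier results slide states. It ignores pipeline fill, memory stalls, and blanking, so treat the result as an upper bound under those assumptions.

```python
def frames_per_second(width, height, clock_mhz, cycles_per_pixel=1):
    """Ideal frame rate when every pixel costs a fixed number of cycles."""
    cycles_per_frame = width * height * cycles_per_pixel
    return clock_mhz * 1_000_000 / cycles_per_frame


print(round(frames_per_second(1280, 960, 150), 2))  # 122.07
```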

Page 18: Distortion Correction ECE 6276 Project Review

Verification

Page 19: Distortion Correction ECE 6276 Project Review

Conclusions

Page 20: Distortion Correction ECE 6276 Project Review

Future Work

• Parallelize algorithm

– Work in blocks of pixels

• Optimize buffer/memory usages

– Use streaming buffers

• Streamline algorithm

• Allow variable decimation/interpolation to make smoother undistortions

Page 21: Distortion Correction ECE 6276 Project Review

21 ECE 6276 Final Project Team 5

7/14/2009

Questions?
