Neta Peled & Hillel Mendelson Supervisor : Mike Sumszyk Real Time Video Filtering Final Presentation of part B Annual project

Post on 02-Jan-2016


Page 1: Neta Peled & Hillel Mendelson Supervisor: Mike Sumszyk Final Presentation of part B Annual project

Neta Peled & Hillel Mendelson
Supervisor: Mike Sumszyk

Real Time Video Filtering
Final Presentation of Part B

Annual project

Page 2:

The algorithm

Part A overview

Part B challenges

Blocks implementation

Conclusions

Real Time Video Filtering

Page 3:

The algorithm: Nonlinear Diffusion
• Uses an iterative numerical solution to solve the diffusion equation

Why use it for image processing?
• Image noise is smoothed
• Edges remain sharp

Project Recap
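The slides do not give the discretization, so the following is only a minimal sketch of the idea, using the textbook explicit Perona-Malik step on a 1-D signal rather than the project's own scheme. The function names, the conductance function, the contrast parameter `k`, and the small time step are all assumptions chosen for illustration (an explicit step needs a small `dt`, unlike the dt = 30 quoted in the slides).

```python
def conductance(diff, k):
    """Edge-stopping function: near 1 for small differences, near 0 across edges."""
    return 1.0 / (1.0 + (diff / k) ** 2)


def diffuse_step(signal, dt=0.25, k=10.0):
    """One explicit nonlinear diffusion step on a 1-D signal (reflecting borders)."""
    n = len(signal)
    out = []
    for i in range(n):
        left = signal[i - 1] if i > 0 else signal[i]
        right = signal[i + 1] if i < n - 1 else signal[i]
        d_right = right - signal[i]
        d_left = signal[i] - left
        # Flux is strong between similar neighbors and weak across a large jump.
        flux = conductance(d_right, k) * d_right - conductance(d_left, k) * d_left
        out.append(signal[i] + dt * flux)
    return out


def diffuse(signal, iterations=20, dt=0.25, k=10.0):
    """Apply several diffusion steps in a row."""
    for _ in range(iterations):
        signal = diffuse_step(signal, dt, k)
    return signal


# Noisy step edge: about 10 on the left, about 100 on the right, +/-2 of noise.
noisy = [(10 if i < 16 else 100) + (2 if i % 2 else -2) for i in range(32)]
smoothed = diffuse(noisy)
print("noise range before:", max(noisy[:8]) - min(noisy[:8]))
print("noise range after: ", round(max(smoothed[:8]) - min(smoothed[:8]), 3))
print("edge jump after:   ", round(smoothed[16] - smoothed[15], 1))
```

In this toy signal the small alternating noise is flattened while the large jump in the middle survives, which is the behavior the slide describes.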

Page 4:

[Figure: original image]

Page 5:

[Figure: filtered image, dt = 30 (!), one iteration]

• Look at the edges (sharp!)
• Look at the hat (smoothed)

Page 6:

Part A overview

Difficulties with the algorithm:
• Very complex design, which makes real-time operation almost impossible
• Transpose of the entire image
• Reverse-order loop
• Huge memory bandwidth required

So why use this model?
• Good results even after a single iteration (Yoni & Zion needed at least 20 iterations => need for multiple FPGAs)

Page 7:

Part A overview

• Exploring different architecture solutions in Matlab
• Comparing “sub-frame” processing vs. entire-frame processing
• Fixed-point analysis of the algorithm in Matlab
• Learning about memory resources:
  • Internal memory: M-RAM, M4K, M512
  • External memory: DDR
• Analyzing the memory bandwidth requirements of the algorithm
• DVI signal generators
• Implementation of real-time streaming of pixels through DDR double buffering:
  • DVI in => DDR write => DDR read => DVI out

Page 8:

Part B

Transpose image implementation
• First transpose (800x525 => 525x800)
• Second transpose (525x800 => 800x525)
• Each transpose implies synchronization between internal memories and external memories using dedicated controllers and FIFOs

Detection of the first pixel of a frame
• Needed because each transpose block should start operating only at the first pixel of a frame
• Also needed because the pipeline of Sergey & Roman needs a start signal when the first pixel of a frame enters the pipeline

Implementation of frame rate converters
• Down rate converter at the input (60 fps => 15 fps)
• Up rate converter at the output (15 fps => 60 fps)

CORRECT DVI synchronization!
• PLL fixed location at input and output pins
• Registered input/output pins

Fixed-point analysis of the algorithm in Quartus
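The first-pixel detection mentioned on this slide could be sketched as follows. The slides only say that DVI sync is a 3-bit bundle. This sketch assumes that two of those bits behave like an active-high vertical sync (`vsync`) and a data-enable flag (`de`). Both names and the function `first_pixel_flags` are hypothetical.

```python
def first_pixel_flags(samples):
    """Yield True on the clock where a frame's first active pixel arrives.

    `samples` is an iterable of (vsync, de) booleans, one pair per pixel clock.
    """
    armed = False
    for vsync, de in samples:
        if vsync:
            armed = True  # vertical sync seen: the next active pixel starts a frame
        first = armed and de and not vsync
        if first:
            armed = False  # stay quiet until the next vertical sync
        yield first


F, T = False, True
stream = [(F, T), (T, F), (F, F), (F, T), (F, T), (T, F), (F, T)]
flags = list(first_pixel_flags(stream))
print(flags)
```

A flag like this is the kind of start signal that a transpose block or a downstream pipeline could wait for after reset.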

Page 9:

Part A Implementation

[Block diagram. Blocks: DVI IN, DVI OUT, PLL, reset detector, DVI control signals generator, internal memories (two blocks), Gidel’s memory controller (180 MHz), DDR (2 banks), Stratix II. Signals: 24-bit RGB data, 3-bit DVI sync, 25.2 MHz DVI clk, ¼ DVI clk.]

Page 10:

The Final architecture (PART B)

[Block diagram. Blocks: DVI IN, DVI OUT, Freq controller (4F to F), T’ (two blocks), PIPE (two blocks, labeled “columns” and “lines”), Freq Controller + T’ (4F to F), PLL, reset detector, DVI control signals generator, Gidel’s memory controller (180 MHz), DDR (2 banks), Stratix II. Signals: 24-bit RGB data, 3-bit DVI sync, 25.2 MHz DVI clk, ¼ DVI clk.]

Page 11:

The Final architecture (PART B)

[Block diagram. Blocks: DVI IN, DVI OUT, Freq controller (4F to F), T’ (two blocks), PIPE (two blocks, labeled “columns” and “lines”), Freq Controller + T’ (4F to F), PLL, reset detector, DVI control signals generator, Gidel’s memory controller (180 MHz), DDR (8 double buffers), Stratix II. Signals: 24-bit RGB data, 3-bit DVI sync, 25.2 MHz DVI clk, ¼ DVI clk.]

Page 12:

Fundamental DDR controller

[Block diagram of the final Part B architecture (DVI IN, frequency controllers, T’ blocks, pipes, DDR with 8 double buffers behind Gidel’s memory controller, DVI OUT), repeated on this slide.]

Page 13:

Fundamental DDR controller

• There are 4 bidirectional communication channels to/from the DDR
• Each channel requires its own controller, which is a variation of a fundamental controller:
  • Up rate
  • Down rate
  • First transpose (800x525 => 525x800)
  • Second transpose (525x800 => 800x525)
• Each one has asymmetric behavior for read and write

Page 14:

Fundamental DDR controller

[State diagrams: WRITE controller and READ controller, including their synchronization states]

Page 15:

DDR double buffer

[Diagram. Blocks: Pipe, dual-clock FIFO, DDR WR controller, DDR RD controller, dual-clock FIFO, Pipe. Handshake signals between the controllers: “wr fin”, “rd fin”, “continue”.]

When finishing a frame:
• Each controller calculates its new address and waits for the other controller to finish.
• While waiting, the controller keeps sending a “continue” signal to the other controller.
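As a rough software analogy of the double-buffer handshake described above, the sketch below swaps the two buffer addresses only once both sides have finished a frame. It collapses the "wait for the other controller" phase into a single step, and the class `DoubleBuffer` and its method are hypothetical names, not taken from the design.

```python
class DoubleBuffer:
    """Two frame buffers: the writer fills one while the reader drains the other."""

    def __init__(self):
        self.buffers = [None, None]
        self.write_index = 0  # the reader always uses the other buffer

    def run_frame(self, incoming_frame):
        """Process one frame period and return the frame the reader sees."""
        read_index = 1 - self.write_index
        outgoing = self.buffers[read_index]               # RD controller bursts this out
        self.buffers[self.write_index] = incoming_frame   # WR controller bursts this in
        # Both controllers have finished ("wr fin" and "rd fin"): swap addresses.
        self.write_index = read_index
        return outgoing


ddr = DoubleBuffer()
outputs = [ddr.run_frame(frame) for frame in ["F0", "F1", "F2", "F3"]]
print(outputs)
```

The reader always lags the writer by one frame, and neither side ever touches the buffer the other one is using.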

Page 16:

Bloody signals

Flush
• According to Gidel’s manual, the flush signal is used to force writing the data to the memory when the last word is incomplete.
• BUT, even when using a port size equal to the memory width, one must use the ‘flush’ signal.

Write empty
• When performing write bursts starting at different addresses, one must wait for the write_empty signal before starting a new burst. Without waiting, the data is lost.
• NOT in Gidel’s manual!

Page 17:

Down rate DDR controllers

[Block diagram of the final Part B architecture (DVI IN, frequency controllers, T’ blocks, pipes, DDR with 8 double buffers behind Gidel’s memory controller, DVI OUT), repeated on this slide.]

Page 18:

Down rate controllers

Write controller:
• Writes to DDR only one frame out of every 4 frames
• Frame rate: 15 frames/sec, pixel rate: 6.2 MHz
  • Data loss is almost unnoticeable
  • Algorithm performance is not affected!
• Actual bandwidth: 25 MHz (DVI clock)

Read controller:
• Same as the fundamental DDR controller (burst of entire frame)
• Actual bandwidth: 6.2 MHz
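The rates on this slide can be checked with a little arithmetic, assuming that the 800x525 frame size used for the transposes is also the full frame size that the DVI clock counts. The constant and function names below are illustrative only.

```python
FRAME_WIDTH = 800    # pixels per line, from the transpose dimensions in the slides
FRAME_HEIGHT = 525   # lines per frame, from the transpose dimensions in the slides


def pixel_rate_hz(frames_per_second):
    """Pixels per second needed to carry full frames at the given frame rate."""
    return FRAME_WIDTH * FRAME_HEIGHT * frames_per_second


dvi_rate = pixel_rate_hz(60)       # input and output side
pipeline_rate = pixel_rate_hz(15)  # after keeping one frame out of four
print(dvi_rate / 1e6, "MHz at 60 fps")
print(pipeline_rate / 1e6, "MHz at 15 fps")
```

At 60 fps this gives exactly the 25.2 MHz DVI clock shown in the block diagrams. At 15 fps it gives 6.3 MHz, close to the 6.2 MHz on the slide, which appears to be a quarter of the rounded 25 MHz figure.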

Page 19:

Down rate controllers

[State diagrams: the “normal” READ controller and the WRITE controller. WRITE controller: writes 1 frame to DDR, then counts 3 more frames while cleaning the pipe.]
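In software terms, the write side of the down-rate controller behaves roughly like the generator below: it stores one frame, then counts and discards the next three. This is a behavioral sketch only, and `down_rate` and `keep_every` are hypothetical names.

```python
def down_rate(frames, keep_every=4):
    """Yield only the frames that a down-rate write controller would store."""
    for index, frame in enumerate(frames):
        if index % keep_every == 0:
            yield frame  # this frame is written to DDR
        # Otherwise the frame is only counted and drained from the pipe.


one_second_in = list(range(60))           # 60 incoming frames
one_second_kept = list(down_rate(one_second_in))
print(len(one_second_kept), one_second_kept[:4])
```

Sixty incoming frames per second leave fifteen stored frames, matching the 60 fps => 15 fps conversion in the slides.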

Page 20:

Up rate DDR controllers

[Block diagram of the final Part B architecture (DVI IN, frequency controllers, T’ blocks, pipes, DDR with 8 double buffers behind Gidel’s memory controller, DVI OUT), repeated on this slide.]

Page 21:

Up rate controllers

Write controller:
• Same as the fundamental DDR controller (burst of entire frame)
• Actual bandwidth: 6.2 MHz

Read controller:
• Reads the same frame from the DDR 4 times
  • To meet DVI data rate requirements
• Actual bandwidth: 25 MHz

Page 22:

Up rate controllers

[State diagrams: READ controller and WRITE controller. READ controller: the main “loop” reads the same frame 4 times, then syncs with WR and swaps addresses.]
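The read side of the up-rate controller can be pictured as the generator below, which replays each stored frame four times before moving on to the next one. As before, this is a behavioral sketch, and `up_rate` and `repeat` are hypothetical names.

```python
def up_rate(frames, repeat=4):
    """Yield every stored frame `repeat` times, as an up-rate read controller would."""
    for frame in frames:
        for _ in range(repeat):
            yield frame  # the same frame is read from DDR again
        # After the last repeat the controller syncs with WR and swaps addresses.


stored = ["A", "B", "C"]
displayed = list(up_rate(stored))
print(displayed)
```

Fifteen stored frames per second therefore become sixty displayed frames per second, which restores the DVI output rate.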

Page 23:

Transpose DDR controller

[Block diagram of the final Part B architecture (DVI IN, frequency controllers, T’ blocks, pipes, DDR with 8 double buffers behind Gidel’s memory controller, DVI OUT), repeated on this slide.]

Page 24:

Transpose

A reminder of how it works:

[Diagram. Inside the Stratix II: M-RAM WRITE and M-RAM READ. On the DDRII: T’ WRITE and T’ READ. Annotations: “Penalty every row skip”, “Sequential read from DDR”, “Penalty all the time!”]

Page 25:

Transpose challenges

• Two different transposes:
  • The first transpose: 800x525
  • Transpose back: 525x800
  • Debugging difficulty…
• Synchronization to the beginning of the frame is required
• Transpose counters: “heavy” sequential combinational logic causes timing problems
• Transpose on read or on write?

Page 26:

Transpose - memory configuration settings

M-RAM
• Max number of rows (minimum penalty)
• Number must divide 800 or 525 (no remainder)
• Number must agree with Gidel controller
• We chose 50 and 35 lines respectively

DDR
• Load balancing
• Gidel requirements
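The divisibility requirement above can be illustrated with a strip-based transpose: a block of rows is buffered, then written out column by column, and the frame must split into whole strips. This is only a software sketch, since the slides do not spell out how the M-RAM buffer maps onto DDR addresses. The helper names are hypothetical.

```python
def strips_needed(frame_lines, lines_per_strip):
    """Number of strips per frame; the strip size must divide the frame exactly."""
    if frame_lines % lines_per_strip:
        raise ValueError("strip size must divide the frame dimension")
    return frame_lines // lines_per_strip


def transpose_in_strips(frame, rows_per_strip):
    """Transpose a frame (list of rows) one buffered strip of rows at a time."""
    height, width = len(frame), len(frame[0])
    strips_needed(height, rows_per_strip)  # rejects sizes that leave a remainder
    out = [[None] * height for _ in range(width)]
    for top in range(0, height, rows_per_strip):
        strip = frame[top:top + rows_per_strip]  # stands in for the internal buffer
        for x in range(width):
            for dy, row in enumerate(strip):
                out[x][top + dy] = row[x]
    return out


small = [[r * 6 + c for c in range(6)] for r in range(4)]  # 4 rows of 6 pixels
print(transpose_in_strips(small, 2))
print(strips_needed(800, 50), strips_needed(525, 35))
```

With the values chosen in the slides, 800 splits into 16 strips of 50 and 525 into 15 strips of 35, so neither transpose leaves a partial strip.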

Page 27:

Transpose’s synchronization blocks: M-RAM

• Write and read address counters
• Beginning of frame detection unit
• Delaying the data
• 3 M-RAMs for RGB

Page 28:

Transpose’s synchronization blocks: DDR

Synchronization on the WR controller:
• New “Data in” port
• Designated states to deal with the first pixel of the frame after reset
• “Cleans” the DCFIFO until detecting the first pixel of a new frame
• The WR controller sends a reset signal to the RD controller

Page 29:

Transpose counters

DDR and M-RAM counters:
• The “heaviest” combinational logic of the entire design

Nested conditions:
If (a) then
  If (b) then
    If (c) then
• Long combinational logic (CL) paths result in timing problems!

Flat conditions:
If (a) and (not b) and (not c) then
If (a) and (b) and (not c) then
If (a) and (b) and (c) then
• No code reuse and more HW (but we have enough!)
• Guarantees shorter, parallel CL
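The slide contrasts a nested chain of conditions with fully decoded ones. The sketch below shows, in software only, that the two styles can select the same branch for every input combination. Python cannot model logic depth or timing, and the branch labels and the handling of the cases the slide does not list (returned here as "idle") are assumptions.

```python
from itertools import product


def nested(a, b, c):
    """Conditions evaluated as a chain, each level depending on the previous one."""
    if a:
        if b:
            if c:
                return "a and b and c"
            return "a and b and not c"
        if not c:
            return "a and not b and not c"
    return "idle"


def flat(a, b, c):
    """Each case fully decoded, so the comparisons are independent of each other."""
    if a and not b and not c:
        return "a and not b and not c"
    if a and b and not c:
        return "a and b and not c"
    if a and b and c:
        return "a and b and c"
    return "idle"


ALL_INPUTS = list(product([False, True], repeat=3))
mismatches = [bits for bits in ALL_INPUTS if nested(*bits) != flat(*bits)]
print("mismatching inputs:", mismatches)
```

In hardware the flat form repeats the condition terms, which costs extra logic but lets the cases be decoded in parallel.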

Page 30:

Debugging difficulties

• Can’t easily “divide and conquer”: the result is available only after 2 transposes
• We used SignalTap and built verification units

[Diagram: First T’ and Second T’. Each consists of an M-RAM stage and a DDR stage, each with address counters and a sync unit, connected through a dual-clock FIFO.]

Page 31:

Debugging difficulties

• Can’t simulate the DDR’s behavior in ModelSim
  • We don’t have a reliable model of the external memory’s behavior
• Gidel’s controller is NOT “transparent” to the users. We know nothing about:
  • Gidel’s internal implementation
  • Gidel’s policy for handling DDR requests
• We can read from the DDR through PCI, but it changes the data path…

Page 32:

Transpose on read

Read and write protocols are different

WRITE:
• Wait 16 clks after start
• Wait ~100 clks after flush
• Wait for signal write_empty

READ:
• Wait for signal almost_empty_RD

Looks like the READ loop is shorter!
• We successfully implemented transpose on read.
• However, the improvement is not good enough to avoid using down/up rate controllers.
• With the combined up rate and transpose, the read loop is more “busy”, so it is better to perform T’ on write!

Page 33:

Can we avoid the loss of data?

2 iterations:
• Only 2 transposes are needed!

2 FPGAs
• DDR configuration (for each FPGA):
  • 1 transpose on bank A (19 MHz)
  • 1 transpose on bank B (19 MHz)
• For each bank: 180 x 0.75 / 3 = 45 > 25.2 !!!

Add more memory:
• 1 T’ on bank A, 1 on bank B, 1 on additional memory
• For each bank: 180 x 0.75 / 3 = 45 > 25.2 !!!
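The per-bank estimate on this slide can be reproduced directly. The slide does not say what the 0.75 factor and the division by 3 stand for, so the constants below carry neutral, hypothetical names and simply restate the slide's numbers.

```python
CONTROLLER_CLOCK_MHZ = 180.0  # memory controller clock shown in the block diagrams
SLIDE_FACTOR = 0.75           # quoted on the slide; its meaning is not stated there
SLIDE_DIVISOR = 3             # quoted on the slide; its meaning is not stated there
DVI_PIXEL_RATE_MHZ = 25.2     # DVI clock shown in the block diagrams

per_bank_mhz = CONTROLLER_CLOCK_MHZ * SLIDE_FACTOR / SLIDE_DIVISOR
headroom = per_bank_mhz / DVI_PIXEL_RATE_MHZ
print(per_bank_mhz, "MHz available per bank")
print(round(headroom, 2), "times the DVI pixel rate")
```

Under these numbers each bank has noticeably more than the 25.2 MHz needed to keep up with the DVI stream, which is the point of the comparison on the slide.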

Page 34:

Timing Problems

[Block diagram of the final Part B architecture (DVI IN, frequency controllers, T’ blocks, pipes, DDR with 8 double buffers behind Gidel’s memory controller, DVI OUT), repeated on this slide.]

Page 35:

Timing Problems

Problems
• Inconsistent compilation results
• Jittery image
• Lost data
• Timing problems

Solutions
• Registered I/Os
• PLL fixed placing

Page 36:

Additional Issues

Multiport
• Data loss at end of burst
• Long penalties
• I/O strength
• ProcII vs. ProcIII (no DVI)

Sync
• Waiting for signal from second group

[Figure: two number grids]

 1  2  3  4  5
 6  7  8  9 10
11 12 13 14 15
16 17 18 19 20

 2  7 12 17
 3  8 13 18
 4  9 14 19
 5 10 15 20
 6 11 16  1
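The slide gives no caption for the two grids. One reading, which is an assumption and not stated in the slides, is that the second grid shows what a transpose produces when it starts one pixel late. The sketch below reproduces the second grid exactly from the first under that assumption. The helper names are hypothetical.

```python
def frame_from_stream(stream, width):
    """Cut a flat pixel stream into rows of `width` pixels."""
    return [stream[i:i + width] for i in range(0, len(stream), width)]


def transpose(rows):
    """Plain matrix transpose of a list of rows."""
    return [list(column) for column in zip(*rows)]


pixels = list(range(1, 21))                             # the 4x5 grid, row by row
aligned = frame_from_stream(pixels, 5)
one_late = frame_from_stream(pixels[1:] + pixels[:1], 5)  # frame start missed by one pixel

for row in transpose(aligned):
    print(row)
print()
for row in transpose(one_late):
    print(row)
```

Read this way, the figure shows why each transpose block has to start exactly at the first pixel of a frame: a single missed pixel shifts the whole transposed image.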

Page 37:

Additional Issues

SignalTap

Page 38:

Summary

Internal memory blocks:
• Addressing controller
• Transpose
• Line reverse

External memory:
• Double buffer on DDR
• Up/down rate controller

DVI synchronization

Page 39:

Questions?

Page 40:

We invite you to join us in the lab for a short demonstration