Post on 21-Dec-2015
VLSI Architecture for Affine Motion Estimation
BTP PARTNERS
Sumit Johar (991232)
Abhishek Girotra (991201)
ADVISORS
Dr. I. Chakrabarti
Dr. D. Ghosh
OBJECTIVE
Objective – Architecture that offers high compression, high processing speed and high fidelity.
Motivation – Architecture for affine motion model, which has high computational complexity, for real time processing.
The project had three stages of operation (spread over PHASE I and PHASE II):
STUDY OF BMA AND AFFINE TRANSFORMS
ARCHITECTURE DESIGN
TESTING AND VALIDATION
MOTION COMPENSATION
BLOCK MATCHING ALGORITHMS
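MATCHING CRITERION - MEAN OF ABSOLUTE DIFFERENCE (MAD)
$$\mathrm{MAD}(m,n) = \frac{1}{N \times N} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left| U_t(x+i,\, y+j) - U_{t-1}(x+i+m,\, y+j+n) \right|$$
MOTION VECTORS (u, v)
$$(u, v) = \arg\min_{(m,n) \in R} \mathrm{MAD}(m,n)$$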
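The MAD criterion and the minimisation over the search region R can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the function names `mad` and `full_search`, the block size N = 16, the search range p = 7 and the [row, column] array layout are all assumptions made for the example.

```python
import numpy as np

def mad(cur, ref, x, y, m, n, N=16):
    """MAD(m, n) for the N x N block whose top-left pixel is (x, y).

    Assumption: frames are 2-D arrays indexed [row, column] = [y, x];
    cur plays the role of U_t and ref the role of U_{t-1}.
    """
    cur_blk = cur[y:y + N, x:x + N].astype(np.int64)
    ref_blk = ref[y + n:y + n + N, x + m:x + m + N].astype(np.int64)
    return float(np.abs(cur_blk - ref_blk).mean())

def full_search(cur, ref, x, y, p=7, N=16):
    """Return ((u, v), MAD) minimising MAD over R = [-p, p] x [-p, p]."""
    height, width = ref.shape
    best_mv, best_mad = (0, 0), float("inf")
    for n in range(-p, p + 1):
        for m in range(-p, p + 1):
            # Skip candidates that fall outside the reference frame.
            if x + m < 0 or y + n < 0 or x + m + N > width or y + n + N > height:
                continue
            d = mad(cur, ref, x, y, m, n, N)
            if d < best_mad:
                best_mv, best_mad = (m, n), d
    return best_mv, best_mad
```

An exhaustive search like this evaluates every candidate in R, which is the cost that the faster block matching algorithms listed in this presentation try to avoid.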
BMA
Mathematically, the motion model is given as: P(C) = P(R) + MV(C)
where
P(C) = Pixel in Current Frame Block.
P(R) = Pixel in Reference Frame Block.
MV(C) = Required Motion Vector.
BMAs are highly popular because of their ease of implementation.
Many proposed algorithms:
One Dimensional Full Search (1DFS).
Three Step Hierarchical Search (3SHS).
Two Dimensional Logarithmic Search (2DLS).
One Dimensional Hierarchical Search (1DHS).
THREE STEP HIERARCHICAL SEARCH
1DHS ALGORITHM (STAGE-I)
1DHS ALGORITHM (STAGE-I)
1DHS ALGORITHM (STAGE-II)
WHY AFFINE ?
For BMA: P(C) = P(R) + MV(C)
where
P(C) = Pixel in Current Frame Block.
P(R) = Pixel in Reference Frame Block.
MV(C) = Required Motion Vector.
Only TRANSLATIONAL MOTION can be estimated well.
Poor estimation of complex motions like rotation, scaling, skewing.
For AFFINE: P(C) = A.P(R) + MV(C)
where A = Deformation Matrix.
Matrix A is used to realize rotation of the macro blocks by angle θ and scaling by factor k.
Although there are infinitely many ways in which an object can “scale” or “rotate”, we choose only a finite set of B matrices, typically 8 choices.
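As a worked illustration of the affine model, the block below writes out one conventional form of a rotation-and-scaling matrix and applies it for a 90 degree rotation. The matrix shown on the original slide did not survive extraction, so this particular form (and its sign convention) is an assumption, not necessarily the authors' definition.

$$A = k \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}, \qquad P(C) = A \cdot P(R) + MV(C)$$

$$\theta = 90^\circ,\ k = 1: \quad A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \qquad P(R) = \begin{pmatrix} x \\ y \end{pmatrix} \ \Rightarrow\ P(C) = \begin{pmatrix} -y \\ x \end{pmatrix} + MV(C)$$

With A equal to the identity (θ = 0, k = 1) the model reduces to the purely translational BMA model P(C) = P(R) + MV(C).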
AFFINE MOTION MODEL
[Figure: the eight block orientations. CURRENT BLOCK; 90 degree rotated; 180 degree rotated; 270 degree rotated; Horizontal Flip; Vertical Flip; X=Y Flip; X=-Y Flip]
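The eight orientations named in the figure can be generated from a square block with array rotations and flips. The sketch below is only illustrative: the function name `orientations` and the dictionary keys are made up for the example, and which array operation corresponds to "horizontal", "vertical", "X=Y" or "X=-Y" depends on an axis convention that the slides do not state.

```python
import numpy as np

def orientations(block):
    """Return the eight rotated/flipped versions of a square block.

    Assumptions: np.rot90 rotates counter-clockwise, and the diagonal
    flips are taken as transpose and anti-transpose respectively.
    """
    b = np.asarray(block)
    return {
        "current": b,
        "rot90": np.rot90(b, 1),
        "rot180": np.rot90(b, 2),
        "rot270": np.rot90(b, 3),
        "horizontal_flip": np.fliplr(b),   # mirror left-right
        "vertical_flip": np.flipud(b),     # mirror top-bottom
        "diag_flip": b.T,                  # reflect about the main diagonal
        "antidiag_flip": np.rot90(b, 2).T, # reflect about the anti-diagonal
    }
```

None of these eight cases needs a multiplication: each one is a re-ordering of the same pixels, which is consistent with handling them through addressing and interconnection rather than arithmetic.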
AFFINE MOTION MODEL
[Figure: CURRENT BLOCK; Scaled Down Block by factor 2 (Sub-Sampling); Scaled Up Block by factor 2 (Interpolated)]
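The two scaling cases in the figure can be sketched as follows. The slides only say "sub-sampling" and "interpolated", so the specific choices here (keep every second pixel, and pixel replication as the interpolator) are assumptions for illustration, as are the function names.

```python
import numpy as np

def scale_down_by_2(region):
    """Keep every second pixel in both directions (sub-sampling)."""
    return np.asarray(region)[::2, ::2]

def scale_up_by_2(region):
    """Double both dimensions by pixel replication (an assumed interpolator)."""
    r = np.asarray(region)
    return np.repeat(np.repeat(r, 2, axis=0), 2, axis=1)
```

Together with the unscaled case this gives three scale options; combined with the eight orientations that makes 24 combinations, which may be the origin of the "24 deformation cases" mentioned later in the presentation.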
MOTION ESTIMATION BLOCKS
[Block diagram. Components: CURRENT BLOCK MEMORY; SEARCH AREA MEMORY; INTERCONNECTION NETWORK; PE 0, PE 1, ..., PE 15; PARALLEL ADDER; MINIMUM COMPARATOR; PREVIOUS MAD REGISTER; ADDRESS GENERATOR; MOTION VECTOR REGISTER; GLOBAL TIMING AND CONTROL SIGNALS. Signal labels: MAD; FROM EXTERNAL MEMORY; TO EXTERNAL MEMORY]
ARCHITECTURE FOR 3SHS (Jong et al.)
EXPLOITS THE ADVANTAGES OF INTELLIGENT DATA MANAGEMENT AND MEMORY RECONFIGURATION.
9 PROCESSING ELEMENTS (PE).
ON-CHIP BUFFER CONFIGURATION: REDUCES EXTERNAL MEMORY ACCESSES AND OVERCOMES THE LATENCY CAUSED BY MEMORY REFILLING FOR THE SUCCEEDING STEP.
RESIDUAL MEMORY INTERLEAVING: PROVIDES A SOLUTION FOR PARALLEL DATA ACCESS.
DISTRIBUTION OF PE FUNCTIONS: EXPLOITS THE SYMMETRIC NATURE OF THE 3SHS PROBLEM.
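For context, the three step search that this architecture targets evaluates a 3 x 3 grid of candidates around the current best position in each step, which is why nine candidates (and nine PEs) arise naturally. The sketch below is a software illustration only, not the architecture of Jong et al.; the names `block_mad` and `three_step_search`, the step sizes (4, 2, 1), the block size N = 16 and the [row, column] array layout are assumptions.

```python
import numpy as np

def block_mad(cur, ref, x, y, m, n, N):
    """MAD for displacement (m, n); infinite if the candidate leaves the frame."""
    height, width = ref.shape
    if x + m < 0 or y + n < 0 or x + m + N > width or y + n + N > height:
        return float("inf")
    a = cur[y:y + N, x:x + N].astype(np.int64)
    b = ref[y + n:y + n + N, x + m:x + m + N].astype(np.int64)
    return float(np.abs(a - b).mean())

def three_step_search(cur, ref, x, y, N=16, steps=(4, 2, 1)):
    """Return ((u, v), MAD, number of candidates evaluated)."""
    cm, cn = 0, 0                                # current search centre
    best = block_mad(cur, ref, x, y, 0, 0, N)    # centre is evaluated once
    points = 1
    for s in steps:
        best_m, best_n = cm, cn
        for dn in (-s, 0, s):
            for dm in (-s, 0, s):
                if dm == 0 and dn == 0:
                    continue                     # centre already known
                points += 1
                d = block_mad(cur, ref, x, y, cm + dm, cn + dn, N)
                if d < best:
                    best, best_m, best_n = d, cm + dm, cn + dn
        cm, cn = best_m, best_n                  # re-centre for the next step
    return (cm, cn), best, points
```

With these step sizes the search covers displacements up to 7 pixels in each direction and evaluates 1 + 3 x 8 = 25 candidates per block when none is skipped.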
ARCHITECTURE FOR 1DHS (Swamy et al.)
EMPLOY 16 PROCESSING ELEMENTS IN PARALLEL.
ON-CHIP BUFFER CONFIGURATION.
EXPLOITS PIPELINING FEATURE FOR MEMORY REFILLING.
HIGH SPEED CONFIGURATION
PROPOSED ARCHITECTURE
[Block diagram. Components: CB_RAM 1, CB_RAM 2, CB_RAM 3, CB_RAM 4; SA_RAM 1, SA_RAM 2, SA_RAM 3; INTERCONNECTION NETWORK 1, 2, 3; ARRAY OF PROCESSOR MODULE 1, 2, 3; PARALLEL ADDER TYPE 1, TYPE 2, TYPE 3; MINIMUM MAD COMPARATOR; MINIMUM COMPARATOR; PREVIOUS MAD REGISTER; ADDRESS GENERATOR; TRANSFORM; DEFORMATION MATRIX AND BACKWARD MV; MOTION VECTOR AFFINE MATRIX REGISTER; GLOBAL TIMING AND CONTROL SIGNALS. Signal labels: FROM EXTERNAL MEMORY; TO EXTERNAL MEMORY]
BLOCK DIVISION TO EXPLOIT PARALLELISM
[Figure: arrangements of BLOCK 0 to BLOCK 15, with panels labelled "0 degree Rotation / Y-flip", "180 degree Rotation / X-flip", "90 degree Rotation / X=-Y flip" and "270 degree Rotation / Y-flip"]
MEMORY INTERLEAVING
[Figure labels: SA_RAM 1 and SA_RAM 3; BLOCK 0, BLOCK 1, ..., BLOCK 31; SA_RAM 2; 46 words x 46 words]
MEMORY PARTITIONING
[Figure labels: CB_RAM 1, CB_RAM 2, CB_RAM 3, CB_RAM 4]
PIPELINING
PIPELINING AT MEMORY STAGE
MEMORY REFILLING WITH NEW DATA REDUCES LATENCY
[Figure labels: CURRENT FRAME BLOCKS; SEARCH REGION FOR BLOCK 1; SEARCH REGION FOR BLOCK 2]
FEATURES
MEMORY PARTITIONING ENABLES SUPPLY OF DATA TO ALL PEs IN 16 CLOCKS
ADDRESS GENERATOR -- FSM GENERATES ROW AND COL ADDRESS
PARALLEL DATA ARE FED TO PROCESSING MODULES
PARTIAL MAD FOR EACH COLUMN OF CANDIDATE BLOCK IN 1 CLOCK BY ADDERS
HIGH SPEED PARALLEL ADDER USED TO SUM UP PARTIAL MAD
INTELLIGENT INTERCONNECTION NETWORKS PROVIDE A SOLUTION FOR 24 DEFORMATION CASES
IMPLEMENTING A NEW BLOCK MOTION ALGORITHM REQUIRES ONLY CHANGES IN THE ADDRESS GENERATOR -- FLEXIBLE ARCHITECTURE
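The split between per-column partial MADs and a final parallel adder can be mimicked in software as below. This is a behavioural sketch only: the pairwise adder tree is an assumed structure for the "parallel adder", and the function names are invented for the example.

```python
import numpy as np

def partial_sums_by_column(cur_blk, cand_blk):
    """One partial sum of absolute differences per block column."""
    diff = np.abs(np.asarray(cur_blk, dtype=np.int64) - np.asarray(cand_blk, dtype=np.int64))
    return diff.sum(axis=0)

def adder_tree(values):
    """Sum values pairwise, level by level, as a tree of adders would."""
    vals = [int(v) for v in values]
    while len(vals) > 1:
        if len(vals) % 2:
            vals.append(0)  # pad odd levels with a zero input
        vals = [vals[i] + vals[i + 1] for i in range(0, len(vals), 2)]
    return vals[0]

def block_mad_from_partials(cur_blk, cand_blk):
    """Combine the column partials and normalise by the pixel count."""
    partials = partial_sums_by_column(cur_blk, cand_blk)
    return adder_tree(partials) / np.asarray(cur_blk).size
```

For a 16 x 16 block the tree has four levels (16, 8, 4, 2 inputs), so the 16 column partials are reduced in a fixed, small number of addition stages.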
DESIGNING AND DEVELOPMENT
THE PROPOSED ARCHITECTURE WENT THROUGH THE FOLLOWING DESIGN PHASES
PACKAGE USED: MODELSIM VERSION 5.5, MODEL TECHNOLOGY
BEHAVIOURAL LEVEL DESCRIPTION AND DESIGN OF ARCHITECTURE, VALIDATION USING VHDL
LOGIC LEVEL DESIGN, VALIDATION USING VHDL
RESULTS: ARCHITECTURE FUNCTIONALITY VALIDATED
SIMULATION RESULTS
SIMULATION OF THE BMA SCHEMES USING MATLAB
FOUR MOTION SEQUENCES: FOOTBALL, MOBILE, CLAIRE, FLOWER GARDEN
NUMBER OF FRAMES USED FOR SIMULATIONS = 50
FIGURES OF MERIT USED: MAD, PSNR, ANSP
AVERAGE PSNR OVER THE COMPLETE GROUP OF PICTURES WAS CALCULATED
RESULTS MATCHED WITH IMPLEMENTATION RESULTS OF ARCHITECTURE
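PSNR for a reconstructed frame is conventionally computed from the mean squared error against the original. The sketch below uses the usual definition for 8-bit video (peak value 255); the slides do not state the exact formula used in the MATLAB simulations, so treat the peak value and the function name `psnr` as assumptions.

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two frames of equal size."""
    o = np.asarray(original, dtype=np.float64)
    r = np.asarray(reconstructed, dtype=np.float64)
    mse = np.mean((o - r) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return float(10.0 * np.log10(peak ** 2 / mse))
```

An average over a group of pictures would then be the mean of `psnr` over the frame pairs in that group.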
IMPLEMENTATION RESULTS
ALGORITHM      PSNR (dB)   ANSP   CLK. CYCLES
1DHS           24.29       14     224
AFFINE 1DHS    24.86       336    224
3SHS           24.78       22.5   768
AFFINE 3SHS    25.53       540    360
Table 1: Flower Garden Sequence
IMPLEMENTATION RESULTS
ALGORITHM      PSNR (dB)   ANSP   CLK. CYCLES
1DHS           22.88       16     256
AFFINE 1DHS    24.87       384    256
3SHS           22.89       24     768
AFFINE 3SHS    25.07       576    384
Table 2: Football Sequence
IMPLEMENTATION RESULTS
ALGORITHM      PSNR (dB)   ANSP   CLK. CYCLES
1DHS           20.46       15     240
AFFINE 1DHS    20.73       360    240
3SHS           20.44       23     768
AFFINE 3SHS    20.79       552    368
Table 3: Mobile Sequence
IMPLEMENTATION RESULTS
ALGORITHM      PSNR (dB)   ANSP   CLK. CYCLES
1DHS           38.66       16.5   264
AFFINE 1DHS    38.73       396    264
3SHS           38.63       25     768
AFFINE 3SHS    38.73       600    400
Table 4: Claire Sequence
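A quick arithmetic check on the four tables shows two regularities in the reported figures. The snippet below only re-uses numbers from the tables; the interpretation in the comments (ANSP read as average number of search points, and the factors 24 and 16 tied to the 24 deformation cases and the 16-clock data supply) is an assumption, since the slides do not state it.

```python
# (ANSP, clock cycles) per algorithm, copied from Tables 1 to 4.
results = {
    "Flower Garden": {"1DHS": (14, 224), "AFFINE 1DHS": (336, 224),
                      "3SHS": (22.5, 768), "AFFINE 3SHS": (540, 360)},
    "Football":      {"1DHS": (16, 256), "AFFINE 1DHS": (384, 256),
                      "3SHS": (24, 768), "AFFINE 3SHS": (576, 384)},
    "Mobile":        {"1DHS": (15, 240), "AFFINE 1DHS": (360, 240),
                      "3SHS": (23, 768), "AFFINE 3SHS": (552, 368)},
    "Claire":        {"1DHS": (16.5, 264), "AFFINE 1DHS": (396, 264),
                      "3SHS": (25, 768), "AFFINE 3SHS": (600, 400)},
}

# Ratio of affine ANSP to plain ANSP (possibly the 24 deformation cases).
ansp_ratios = [rows["AFFINE " + base][0] / rows[base][0]
               for rows in results.values() for base in ("1DHS", "3SHS")]

# Affine clock cycles divided by the plain algorithm's ANSP
# (possibly 16 clocks per search point).
clocks_per_point = [rows["AFFINE " + base][1] / rows[base][0]
                    for rows in results.values() for base in ("1DHS", "3SHS")]

print(ansp_ratios)
print(clocks_per_point)
```

Every ANSP ratio comes out as 24 and every clocks-per-point value as 16. In other words, the affine variants examine 24 times as many candidates while their reported clock counts stay at 16 times the plain algorithm's ANSP, which matches the plain 1DHS clock count and is roughly half the fixed 768 cycles listed for plain 3SHS.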
[Four figure slides: Original and Reconstructed frames (3SHS, Affine)]
[Four figure slides: Original and Reconstructed frames (1DHS, Affine)]
CONCLUSION
The proposed architecture effectively uses parallelism, pipelining and data reuse to achieve a very high throughput rate.
Hence, our proposed affine motion coder, which uses the block-based affine motion estimation algorithm, is suitable for high-performance real-time applications.
REFERENCES
[1] B. Furht, J. Greenberg, R. Westwater, Motion Estimation Algorithms for Video Compression, Kluwer Academic, 1997.
[2] T. Koga, K. Iinuma, A. Hirano, Y. Iijima and T. Ishiguro, “Motion-compensated inter-frame coding for video conferencing,” Proc. National Telecommunication Conf., New Orleans, LA, pp. G.5.3.1-G.5.3.5, 29 Nov.-3 Dec. 1981.
[3] J. R. Jain and A. K. Jain, “Displacement measurement and its application in inter-frame image coding,” IEEE Trans. on Communications, vol. COM-29, no. 12, pp. 1799-1808, Dec. 1981.
[4] L. M. Po and W. C. Ma, “A novel four-step search algorithm for fast block motion estimation,” IEEE Trans. Circuits and Systems for Video Technology, vol. 6, no. 3, pp. 313-317, Jun. 1996.
[5] D. Ghosh and A.P. Shivaprasad, “One dimensional hierarchical search for block matching algorithm in motion estimation”, Third International Electronic Engineering Congress (INTERCON’96), Trujillo, Peru, 11-17, Aug. 1996.
[6] H.M. Jong, L.G. Chen, T.D. Chiueh, “Parallel architecture for 3 SHS block matching algorithm,” IEEE Trans. Circuits Syst. Video Technol., vol 4, pp. 407-415, Aug 1994.
[7] P.N. Swamy, I. Chakrabarti and D. Ghosh, “An Architecture for Motion Estimation using 1DHS block matching algorithm”, IEE Electronic Letter, vol. 149, no. 5, pp. 229-239, Sep. 2002.
[8] H. Brusewitz, “Motion compensation with triangles”, Third International Workshop on 64 kbit/s Coding of Moving Video, Sep. 1990.
[9] H. Li and R. Forchheimer, “A new motion compensated technique for video compression”, Proc. IEEE International Conference on Acoustics, Speech, Signal Processing (ICASSP’93), vol. 5, pp. 441-446, Apr. 1993.
[10] X. Marichal and B. Macq, “Active mesh reconstruction of block based motion information”, Proc. IEEE International Conference on Acoustics, Speech, Signal Processing (ICASSP’99), Seattle, USA, pp. 2605-2208, May 12-15, 1999.
[11] Y. Wang and O. Lee, “Active mesh- A feature seeking and tracking image sequence representation scheme”, Proc. IEEE Trans. Image Processing, vol. 3, pp. 610-624, Sep. 1994.
[12] C. L. Huang and C.Y. Hsu, “A new motion compensation method for image sequence coding using hierarchical grid interpolation”, IEEE Trans. Circuits and Systems on Video Tech., vol. 4, pp. 42-52, Feb. 1994.
[13] U. V. Raghavendranadh Reddy, I. Chakrabarti and D. Ghosh, “Block based affine motion estimation”, Proc. International Conference on Imaging Science, Systems and Technology (CISST’02), Las Vegas, Nevada, USA, vol. 2, pp. 819-824, June 24-27, 2002
[14] H. M. Jong, L. G. Chen and T. D. Chiueh, “Parallel architecture for 3SHS block matching algorithm”, IEEE Trans. Circuits and Systems on Video Tech., vol. 4, no. 4, pp. 407-415, Aug. 1994.
[15] W. Badawy and M. Bayoumi, “A Multiplication-Free Parallel Architecture for Affine Transformation,” IEEE International Conference on Application-specific Systems, Architecture and Processors, Boston, MA, July 10-12, 2000 pp.25-34.
[16] ModelSim Reference Manual Version 5.5, Model Technology Inc., 1997.
[17] Matlab Version 6.0.0.88, MathWorks Inc., 2000.
THANK YOU