Highly Parallel Framework for HEVC Motion Estimation on Many-core Platform
1
HIGHLY PARALLEL FRAMEWORK FOR HEVC MOTION ESTIMATION ON MANY-CORE PLATFORM
Data Compression Conference 2013
Chenggang Yan, Yongdong Zhang, Feng Dai and Liang Li
2
Outline
Introduction
Related Work
Proposed Method
Experimental Results
Conclusion
3
Introduction(1/2)
HEVC coding tree unit (CTU)
4
Introduction(2/2)
Local parallel method (LPM)
The maximum parallelism of LPM is less than or equal to 8.
Independent PUs (IPUs)
Directed acyclic graph (DAG)
5
Related Work(1/2)
Local parallel method (LPM) [16]
Motion estimation region (MER)
[16] Minhua Zhou, “AHG10: Configurable and CU-group level parallel merge/skip,” JCTVC-H0082, Feb. 2012
6
Related Work(2/2)
Local parallel method (LPM)
M = 16 or 8
7
Proposed Method A. Data Dependency Analysis
B. DAG for CTUs
C. Highly Parallel Framework
8
Proposed Method.A(1/3)
Independent PUs (IPUs)
The IPU’s left boundary and the MER’s left boundary do not overlap.
The IPU’s upper boundary and the MER’s upper boundary do not overlap.
9
Proposed Method.A(2/3)
10
Proposed Method.A(3/3)
Neighboring CTUs: left, upper, upper-left, upper-right
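The four neighbor relations listed on this slide can be written down as coordinate offsets. The following is a minimal sketch, not the authors' code: it assumes a CTU is addressed as (i, j) = (row, column) with row 0 at the top of the frame, and the names `NEIGHBOR_OFFSETS` and `ctu_dependencies` are hypothetical.

```python
# Offsets (d_row, d_col) of the four neighbors a CTU depends on.
# Assumption: (i, j) means (row, column), with row 0 at the top of the frame.
NEIGHBOR_OFFSETS = {
    "left": (0, -1),
    "upper": (-1, 0),
    "upper-left": (-1, -1),
    "upper-right": (-1, 1),
}


def ctu_dependencies(i, j, rows, cols):
    """Return the in-frame neighbors that CTU (i, j) depends on."""
    deps = {}
    for name, (di, dj) in NEIGHBOR_OFFSETS.items():
        ni, nj = i + di, j + dj
        # Neighbors outside the frame do not exist, so they impose no dependency.
        if 0 <= ni < rows and 0 <= nj < cols:
            deps[name] = (ni, nj)
    return deps


if __name__ == "__main__":
    print(ctu_dependencies(0, 0, 3, 4))  # top-left CTU
    print(ctu_dependencies(1, 1, 3, 4))  # interior CTU
```

Under these assumptions the top-left CTU has no dependencies, which is why it can be processed first.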
11
Proposed Method A. Data Dependency Analysis
B. DAG for CTUs
C. Highly Parallel Framework
12
Proposed Method.B(1/4)
Generate a DAG to capture the dependency relationships of CTUs.
13
Proposed Method.B(2/4)
A DAG consists of a set of vertices V and a set of edges E.
Each data dependency between two CTUs corresponds to an edge.
When a CTU has been processed, its vertex is removed from the DAG.
14
Proposed Method.B(3/4)
Condition matrix (CM)
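One way to read the condition matrix is as the in-degree of each CTU vertex in the DAG: the number of neighboring CTUs it still waits for. The sketch below builds such a matrix for a small frame. It assumes (row, column) indexing and the left, upper, upper-left, and upper-right dependencies from the data dependency analysis; `build_condition_matrix` is a hypothetical name.

```python
def build_condition_matrix(rows, cols):
    """Return CM, where CM[i][j] counts the in-frame CTUs that CTU (i, j) depends on."""
    # Assumed dependency offsets: left, upper, upper-left, upper-right.
    offsets = [(0, -1), (-1, 0), (-1, -1), (-1, 1)]
    cm = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            cm[i][j] = sum(
                1
                for di, dj in offsets
                if 0 <= i + di < rows and 0 <= j + dj < cols
            )
    return cm


if __name__ == "__main__":
    for row in build_condition_matrix(3, 4):
        print(row)
```

For a frame of 3 x 4 CTUs this gives a first row of `[0, 1, 1, 1]` and remaining rows of `[2, 4, 4, 3]`: only the top-left CTU starts with a count of zero.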
15
Proposed Method.B(4/4)
16
Proposed Method A. Data Dependency Analysis
B. DAG for CTUs
C. Highly Parallel Framework
17
Proposed Method.C(1/5)
18
Proposed Method.C(2/5)
Step 1: Initialize DQ and CM. DQ is a waiting queue. CM records, for each CTU, the number of related CTUs it still depends on.
Step 2: When some values in the CM become zero, get the corresponding coordinates and push them into DQ.
19
Proposed Method.C(3/5)
Step 3: Get coordinates from DQ and process the corresponding CTUs in parallel on the many-core platform.
Step 4: Update CM. When the CTU with coordinate (i, j) has been processed, the values at coordinates (i+1, j), (i+1, j-1), (i, j+1) and (i+1, j+1) in CM are each decremented by one.
Step 5: Repeat Steps 2 to 4 until the whole frame has been processed.
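Steps 1 to 5 can be sketched as a small scheduler. This is an illustration under stated assumptions, not the authors' implementation: a Python thread pool stands in for the many-core platform, each batch taken from DQ is finished before CM is updated (a simplification), and `schedule_frame` and `process_ctu` are hypothetical names.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

# Assumption: (i, j) = (row, column).
DEPENDS_ON = [(0, -1), (-1, 0), (-1, -1), (-1, 1)]  # left, upper, upper-left, upper-right
DEPENDENTS = [(1, 0), (1, -1), (0, 1), (1, 1)]      # entries decremented in Step 4


def schedule_frame(rows, cols, process_ctu, max_workers=8):
    """Process all CTUs of one frame in DAG order; return the list of batches."""
    # Step 1: initialize CM (dependency counts) and DQ (waiting queue).
    cm = [
        [
            sum(0 <= i + di < rows and 0 <= j + dj < cols for di, dj in DEPENDS_ON)
            for j in range(cols)
        ]
        for i in range(rows)
    ]
    dq = deque()
    # Step 2 (initial): push every coordinate whose CM value is already zero.
    dq.extend((i, j) for i in range(rows) for j in range(cols) if cm[i][j] == 0)

    batches = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while dq:  # Step 5: repeat until the frame is finished.
            # Step 3: take the ready coordinates and process them in parallel.
            batch = list(dq)
            dq.clear()
            list(pool.map(lambda c: process_ctu(*c), batch))
            batches.append(batch)
            # Step 4: update CM for the CTUs that depended on the processed ones.
            for i, j in batch:
                for di, dj in DEPENDENTS:
                    ni, nj = i + di, j + dj
                    if 0 <= ni < rows and 0 <= nj < cols:
                        cm[ni][nj] -= 1
                        if cm[ni][nj] == 0:
                            dq.append((ni, nj))  # Step 2: value reached zero.
    return batches


if __name__ == "__main__":
    for n, batch in enumerate(schedule_frame(2, 3, lambda i, j: None)):
        print(n, sorted(batch))
```

A real encoder would replace the placeholder `process_ctu` with the motion estimation of one CTU, and could update CM as soon as each CTU finishes instead of waiting for the whole batch.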
20
Proposed Method.C(4/5)
Maximum parallelism of CTU
Maximum parallelism of highly parallel framework
Average parallelism of highly parallel framework
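The formulas from this slide are not reproduced here. As a rough, CTU-level illustration only, the sketch below counts how many CTUs become ready in each step of the DAG order and derives a maximum and an average from those counts. It ignores any parallelism among IPUs inside a CTU, assumes the left, upper, upper-left, and upper-right dependencies with (row, column) indexing, and uses the hypothetical name `ctu_wave_sizes`.

```python
from collections import Counter


def ctu_wave_sizes(rows, cols):
    """Return the number of CTUs that can run in each successive step."""
    offsets = [(0, -1), (-1, 0), (-1, -1), (-1, 1)]
    level = [[0] * cols for _ in range(rows)]
    # All dependencies precede a CTU in raster order, so one pass is enough.
    for i in range(rows):
        for j in range(cols):
            deps = [
                level[i + di][j + dj]
                for di, dj in offsets
                if 0 <= i + di < rows and 0 <= j + dj < cols
            ]
            # A CTU runs one step after the latest CTU it depends on.
            level[i][j] = 1 + max(deps) if deps else 0
    counts = Counter(step for row in level for step in row)
    return [counts[t] for t in range(max(counts) + 1)]


if __name__ == "__main__":
    # Assumed example: a 1920x1080 frame with 64x64 CTUs gives 17 rows x 30 columns.
    sizes = ctu_wave_sizes(17, 30)
    print("steps:", len(sizes))
    print("maximum CTU-level parallelism:", max(sizes))
    print("average CTU-level parallelism:", sum(sizes) / len(sizes))
```

The frame size and CTU size in the usage example are illustrative choices, not values taken from the slides.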
21
Proposed Method.C(5/5)
22
Experimental Results(1/5)
23
Experimental Results(2/5)
24
Experimental Results(3/5)
25
Experimental Results(4/5)
26
Experimental Results(5/5)
27
Conclusion(1/1)
The highly parallel framework provides sufficient parallelism for many-core platforms.
It uses a DAG-based order to parallelize CTUs.