improve image processing using gpu computing on mali · 1 © 2014 arcsoft, inc. all rights...

28
© 2014 ArcSoft, Inc. All rights reserved. 1 Improve Image Processing Using GPU Computing on Mali™ By ArcSoft Senior R&D manager Liu Jie

Upload: trancong

Post on 28-Jul-2019

222 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Improve Image Processing Using GPU Computing on Mali · 1 © 2014 ArcSoft, Inc. All rights reserved. Improve Image Processing Using GPU Computing on Mali™ By ArcSoft Senior R&D

© 2014 ArcSoft, Inc. All rights reserved. 1

Improve Image Processing Using

GPU Computing on Mali™

By ArcSoft Senior R&D manager Liu Jie

Page 2: Improve Image Processing Using GPU Computing on Mali · 1 © 2014 ArcSoft, Inc. All rights reserved. Improve Image Processing Using GPU Computing on Mali™ By ArcSoft Senior R&D

© 2014 ArcSoft, Inc. All rights reserved. 2

GPU Computing Based on Mali™

■ GPU Computing Introduction

■ Image Filter Optimization by OpenGL ES

■ JPEG Decoder Optimization by OpenCL

■ Conclusion

Page 3: Improve Image Processing Using GPU Computing on Mali · 1 © 2014 ArcSoft, Inc. All rights reserved. Improve Image Processing Using GPU Computing on Mali™ By ArcSoft Senior R&D

© 2014 ArcSoft, Inc. All rights reserved. 3

Part 1. GPU Computing Introduction

Background:

• Increasing resolution of devices’ display size, images and video

contents calls for higher image processing capability. Also, CPU

is no longer capable for some complex algorithm.

• Mobile devices’ and embedded systems’ GPUs are becoming

unprecedented powerful. And GPU’s own architecture

characteristics make it especially powerful for some algorithm.

• Better load balancing across system resources and higher

energy-efficiency.

Page 4: Improve Image Processing Using GPU Computing on Mali · 1 © 2014 ArcSoft, Inc. All rights reserved. Improve Image Processing Using GPU Computing on Mali™ By ArcSoft Senior R&D

© 2014 ArcSoft, Inc. All rights reserved. 4

Part 1. GPU Computing Introduction

Challenges:

• Need to Well-balance

CPU and GPU’s work

load. CPU is strong at

task parallelism and

GPU is strong at data

parallelism.

• High efficient data

pipelining between

CPU and GPU is also

one of the key points

for designing good

architecture.

Page 5: Improve Image Processing Using GPU Computing on Mali · 1 © 2014 ArcSoft, Inc. All rights reserved. Improve Image Processing Using GPU Computing on Mali™ By ArcSoft Senior R&D

© 2014 ArcSoft, Inc. All rights reserved. 5

Part 1. GPU Computing Introduction

Advantages of Using Mali™:

• 128 bit vectorization, adapt to video and image processing algorithms.

• Unified memory architecture: shared memory system between in CPU and

GPU, no data copy needed.

• Support full profile OpenCL: same feature set as all OpenCL in desktops etc

are supported in Mali™, good for portability and migration of existing s/w.

• Architecture designed from the ground up for GPU Compute: nearly all

mathematical functions are accelerated in hardware.

• Proven in silicon, shipping in devices since Nexus 10 tablet in 2012, mature

and reliable architecture and drivers.

Page 6: Improve Image Processing Using GPU Computing on Mali · 1 © 2014 ArcSoft, Inc. All rights reserved. Improve Image Processing Using GPU Computing on Mali™ By ArcSoft Senior R&D

© 2014 ArcSoft, Inc. All rights reserved. 6

Part 1. GPU Computing Introduction

Use Cases:

• Mobile: for digital camera side, HDR, night shot, anti-shaking,

post-processing and so on, higher performance and lower

power consumption.

• IPTV Set-top Box/Smart TV/HD Video Player: gesture

recognition based on camera, video pre/post-processing and

so on. Replacing specialized Post-processing IC with GPU

computing based software algorithm in high resolution video

playback (1080p, 4k) is cost saving and at mean time, more

swift and flexible.

Page 7: Improve Image Processing Using GPU Computing on Mali · 1 © 2014 ArcSoft, Inc. All rights reserved. Improve Image Processing Using GPU Computing on Mali™ By ArcSoft Senior R&D

© 2014 ArcSoft, Inc. All rights reserved. 7

Part 2. Image Filter Optimization by GL/SL

OpenGL Introduction:

Regarding performance, power consumption and

user experience consideration, lots of image

processing effects are not suitable to be

processed using CPU. Hence using GPU to bring

out highly efficient parallel render effects

becomes important.

The Importance of

OpenGL:

Use Cases: • Using convolution to achieve sharpening effect

• Rendering YUV420sp data to achieve high

efficiency

• Image filter effects by OpenGL

Page 8: Improve Image Processing Using GPU Computing on Mali · 1 © 2014 ArcSoft, Inc. All rights reserved. Improve Image Processing Using GPU Computing on Mali™ By ArcSoft Senior R&D

© 2014 ArcSoft, Inc. All rights reserved. 8

Part 2. Image Filter Optimization by GL/SL

Using

Convolution

to achieve

sharpening

effect:

Many filters used in digital image

processing are developed based on

Convolution. E.g., sharpening effect, and

its process is as follows:

1. Aligning center element of Convolution

kernel to current pixel to be processed, so

other elements in Convolution kernel are also

related to pixels respectively.

2. Multiplying each element in Convolution

kernel by the color vector of corresponding

pixel.

3. Finally, make all products to be added with

weight. So, the sharpened color vector of the

current pixel is obtained.

Page 9: Improve Image Processing Using GPU Computing on Mali · 1 © 2014 ArcSoft, Inc. All rights reserved. Improve Image Processing Using GPU Computing on Mali™ By ArcSoft Senior R&D

© 2014 ArcSoft, Inc. All rights reserved. 9

Part 2. Image Filter Optimization by GL/SL

Using

Convolution

to achieve

sharpening

effect:

1. Reducing CPU and memory occupancy for loading

and binding textures.

2. By using GPU instead of CPU, we can effectively

improve rendering performance, provide better user

experience.

Advantages:

Page 10: Improve Image Processing Using GPU Computing on Mali · 1 © 2014 ArcSoft, Inc. All rights reserved. Improve Image Processing Using GPU Computing on Mali™ By ArcSoft Senior R&D

© 2014 ArcSoft, Inc. All rights reserved. 10

Part 2. Image Filter Optimization by GL/SL

Image

Filter

Effects by

GL/SL:

Fully understand original CPU effect algorithm,

reconstruct and interpret into using shading

language.

1. Using vectorization, instead of calculating R, G, B

components seperately.

2. Using multiply calculation instead of division as

much as possible.

3. Reducing massive calculations inside shader, try

doing them outside for performance consideration.

4. Avoid define unused variables and const values.

5. Avoid using if/else as much as possible.

Procedure

Use case

Page 11: Improve Image Processing Using GPU Computing on Mali · 1 © 2014 ArcSoft, Inc. All rights reserved. Improve Image Processing Using GPU Computing on Mali™ By ArcSoft Senior R&D

© 2014 ArcSoft, Inc. All rights reserved. 11

Part 2. Image Filter Optimization by GL/SL

Classical-Nostalgia:

Page 12: Improve Image Processing Using GPU Computing on Mali · 1 © 2014 ArcSoft, Inc. All rights reserved. Improve Image Processing Using GPU Computing on Mali™ By ArcSoft Senior R&D

© 2014 ArcSoft, Inc. All rights reserved. 12

Part 2. Image Filter Optimization by GL/SL

Classical-Nostalgia:

Page 13: Improve Image Processing Using GPU Computing on Mali · 1 © 2014 ArcSoft, Inc. All rights reserved. Improve Image Processing Using GPU Computing on Mali™ By ArcSoft Senior R&D

© 2014 ArcSoft, Inc. All rights reserved. 13

Part 2. Image Filter Optimization by GL/SL

Image

Filter

Effects by

GL/SL

1. After doing optimization by GL/SL, rendering time

cost is about 20ms, for 1920x1080 camera preview

image adding classical-Nostalgia effect.

2. Using pure CPU to do same effect, rendering time

cost is about 80ms.

Performance Comparison:

Page 14: Improve Image Processing Using GPU Computing on Mali · 1 © 2014 ArcSoft, Inc. All rights reserved. Improve Image Processing Using GPU Computing on Mali™ By ArcSoft Senior R&D

© 2014 ArcSoft, Inc. All rights reserved. 14

Part 2. Image Filter Optimization by GL/SL

Rendering

YUV420sp

data to

achieve high

efficiency:

Receiving camera preview data and using

CPU to do face beautify and then utilizing

OpenGLES to do rendering.

1. Using Y- luminance data, and UV-

chrominance data to generate texture

respectively, and then, binding them.

2. Sample Y texture, UV texture

respectively in fragment shader.

3. Transform sampled YUV data to RGB

data by corresponding conversion

formula.

Procedure

Use case

Page 15: Improve Image Processing Using GPU Computing on Mali · 1 © 2014 ArcSoft, Inc. All rights reserved. Improve Image Processing Using GPU Computing on Mali™ By ArcSoft Senior R&D

© 2014 ArcSoft, Inc. All rights reserved. 15

Part 2. Image Filter Optimization by GL/SL

Rendering

YUV420sp

data to

achieve high

efficiency:

Performance Comparison:

Using CPU to do rendering, time cost on each frame

(1920x1080): 30ms;

Using OpenGLES, time cost: 10ms;

For our demo procedure (face beautify), FPS has an uplift

from 16 to 26!

Page 16: Improve Image Processing Using GPU Computing on Mali · 1 © 2014 ArcSoft, Inc. All rights reserved. Improve Image Processing Using GPU Computing on Mali™ By ArcSoft Senior R&D

© 2014 ArcSoft, Inc. All rights reserved. 16

Part 3. JPEG Decoder Optimization by OpenCL

Main Jobs: Utilizing Open CL reconstructing inverse quantization and

IDCT modules to Optimize decoding performance.

Background: Platform: Samsung Exynos 5 Dual Arndale Board, Ubuntu

12.04

GPU: ARM Mali™-T604

Software: libArcJpeg, Open CL 1.1(driver only support to

this version at that time)

Page 17: Improve Image Processing Using GPU Computing on Mali · 1 © 2014 ArcSoft, Inc. All rights reserved. Improve Image Processing Using GPU Computing on Mali™ By ArcSoft Senior R&D

© 2014 ArcSoft, Inc. All rights reserved. 17

Part 3. JPEG Decoder Optimization by OpenCL

Optimization

Keys!:

1. Well designed threads model:

• Original algorithm is to do inverse quantization and

IDCT right after each MCU being decoded.

• OpenCL based algorithm is to do Inverse quantization

and IDCT for all after all MCUs in one frame being

decoded. The optimized architecture utilize OpenCL. Let

each kernel process a single MCB in MCU to avoid data

reliance between kernels. Also, because GPU and CPU

memory space of Mali™ can do map and unmap

through OpenCL interfaces, this provide us great

convenience in using buffer.

Page 18: Improve Image Processing Using GPU Computing on Mali · 1 © 2014 ArcSoft, Inc. All rights reserved. Improve Image Processing Using GPU Computing on Mali™ By ArcSoft Senior R&D

© 2014 ArcSoft, Inc. All rights reserved. 18

Part 3. JPEG Decoder Optimization by OpenCL

Optimization

Keys!:

1. Well designed threads model(YUV422): Optimized Kernel Model:

MCU0

Y0 Y1 U0 V0

kernel0

kernel1

kernel2

kernel3

MCU0

Kernel 0 ~ 3

All MCUs

All Kernels

Page 19: Improve Image Processing Using GPU Computing on Mali · 1 © 2014 ArcSoft, Inc. All rights reserved. Improve Image Processing Using GPU Computing on Mali™ By ArcSoft Senior R&D

© 2014 ArcSoft, Inc. All rights reserved. 19

Part 3. JPEG Decoder Optimization by OpenCL

Optimization

Keys!:

2. Vectorization:

Vector8 Vector8 Vector8

A0 B0 C0

A1 B1 C1

A2 B2 C2

A3 B3 C3

A4 B4 C4

A5 B5 C5

A6 B6 C6

A7 B7 C7

A0

A1

A2

A3

A4

A5

A6

A7

B0

B1

B2

B3

B4

B5

B6

B7

C0

C1

C2

C3

C4

C5

C6

C7

Page 20: Improve Image Processing Using GPU Computing on Mali · 1 © 2014 ArcSoft, Inc. All rights reserved. Improve Image Processing Using GPU Computing on Mali™ By ArcSoft Senior R&D

© 2014 ArcSoft, Inc. All rights reserved. 20

Part 3. JPEG Decoder Optimization by OpenCL

Optimization

Keys!:

3. Optimizing Data Input: Some adjusting in Zig-Zag index mapping table in utilizing

vertorizaiton.

Entropy Decoded Data

“Zig-zag” Index Maper

Coefficient Matrix for IDCT

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63

Original Optimized

Page 21: Improve Image Processing Using GPU Computing on Mali · 1 © 2014 ArcSoft, Inc. All rights reserved. Improve Image Processing Using GPU Computing on Mali™ By ArcSoft Senior R&D

© 2014 ArcSoft, Inc. All rights reserved. 21

Part 3. JPEG Decoder Optimization by OpenCL

Optimization

Keys!: 4. Reducing Accessing Memory: Better balanced loads in mapping tasks and direct

calculation can fully utilize the potential of GPU’s

strong calculation capability.

Page 22: Improve Image Processing Using GPU Computing on Mali · 1 © 2014 ArcSoft, Inc. All rights reserved. Improve Image Processing Using GPU Computing on Mali™ By ArcSoft Senior R&D

© 2014 ArcSoft, Inc. All rights reserved. 22

Part 3. JPEG Decoder Optimization by OpenCL

Before Optimization: A “Range_Limit” map was constructed, to be referenced while

data was range limited.

0 1 2 3 … 255 255 255 255 … 255 0 0 0 … … 0

0x0 0x100 0x280

Data before range limited

Data after range limited

0xffff

Page 23: Improve Image Processing Using GPU Computing on Mali · 1 © 2014 ArcSoft, Inc. All rights reserved. Improve Image Processing Using GPU Computing on Mali™ By ArcSoft Senior R&D

© 2014 ArcSoft, Inc. All rights reserved. 23

Part 3. JPEG Decoder Optimization by OpenCL

After Optimization: Get rid of “Range Limit” map. Take Bit 7 to Bit 9 out of that data,

to decide its limited value.

In 8x8 block, it reduced 64 times memory read.

X X X X X X X X X X X X X X X X

Key(Bit 7 ~ Bit 9)

Bit 15 Bit 0

Key < 0x2 : Take the low 8bits value 0x2 ≤ Key < 0x5 : Set to 0xff Key ≥ 0x5 : Set to 0

Page 24: Improve Image Processing Using GPU Computing on Mali · 1 © 2014 ArcSoft, Inc. All rights reserved. Improve Image Processing Using GPU Computing on Mali™ By ArcSoft Senior R&D

© 2014 ArcSoft, Inc. All rights reserved. 24

Part 3. JPEG Decoder Optimization by OpenCL

Optimization

Keys!:

5. Reducing Using Switch:

Avoiding frequently using “if” and “switch” in our kernels,

and in some simple case replacing them with build-in

functions such as “select”, which can be benefit in

optimizing performance.

Page 25: Improve Image Processing Using GPU Computing on Mali · 1 © 2014 ArcSoft, Inc. All rights reserved. Improve Image Processing Using GPU Computing on Mali™ By ArcSoft Senior R&D

© 2014 ArcSoft, Inc. All rights reserved. 25

Part 3. JPEG Decoder Optimization by OpenCL

Performance

Comparison: Compared with ArcSoft Neon based Jpeg decoder,

performance in decoding 4000x3000 resolution images

uplifted as 25%.

Compared with OpenCL based open-source project Jpeg-

OpenCL, the efficiency of IDCT uplifted as 15 times.

Page 26: Improve Image Processing Using GPU Computing on Mali · 1 © 2014 ArcSoft, Inc. All rights reserved. Improve Image Processing Using GPU Computing on Mali™ By ArcSoft Senior R&D

© 2014 ArcSoft, Inc. All rights reserved. 26

Part 3. JPEG Decoder Optimization by OpenCL

Thoughts or

Challenges: Calling interfaces cost time: Most interfaces related to memory such as clCreateBuffer( ),

clEnqueueMapBuffer( ) cost too much time, which greatly

counteract the benefits brought by using OpenCL.

Reasonably arranging buffer allocation: Better creating warm up kernel to allocate all necessary

buffers ahead each of them is really used.

OpenCL version: Only support to version 1.1, which brought us troubles in

both debugging and further optimizing performance.

Page 27: Improve Image Processing Using GPU Computing on Mali · 1 © 2014 ArcSoft, Inc. All rights reserved. Improve Image Processing Using GPU Computing on Mali™ By ArcSoft Senior R&D

© 2014 ArcSoft, Inc. All rights reserved. 27

Part 4. Conclusion

GPU

Computing

Benefits:

• GPU Computing is a must, an unstoppable trend, and

already have brought benefits in bringing better software.

• Higher performance in imaging, computer vision,

computational photography and multimedia codecs.

ArcSoft

Efforts:

ArcSoft

Plan to do:

• H.265 decoder, Face beautifier, NightHawk (Enhancement

shot under low light condition), Gesture.

• ArcSoft has lots of CPU based algorithm and already

deliver them to our customers and made big sucuccess,

we need port more CPU algorithms to GPU.

Page 28: Improve Image Processing Using GPU Computing on Mali · 1 © 2014 ArcSoft, Inc. All rights reserved. Improve Image Processing Using GPU Computing on Mali™ By ArcSoft Senior R&D

© 2014 ArcSoft, Inc. All rights reserved. 28