ultra hd video scaling: low-power hw ff vs. cnn-based super-resolution

VICTOR H. S. HA, PH.D. VPG MEDIA AND DISPLAY IP, INTEL CORP.

Upload: intel-software

Post on 23-Jan-2017


TRANSCRIPT

Page 1: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

VICTOR H. S. HA, PH.D.

VPG MEDIA AND DISPLAY IP, INTEL CORP.

Page 2: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

Title: Ultra High Definition (UHD) Video Scaling: Low-Power (LP) Hardware (HW) Fixed-Function (FF) vs. Convolutional Neural Network (CNN)-based Super-Resolution (SR)

1. Gen9 Intel® Processor Graphics
• SFC Media HW FF
• Advanced Video Scaler in SFC

2. Super-Resolution Scaling
• Convolutional Neural Network
• Super-Resolution Scaling using CNN

3. Compare

Page 3: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

Gen9 Intel® processor graphics

3



Page 6: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

UHD End-to-End Support in Gen9 Intel® Processor Graphics

UHD Decode, Encode, Display

UHD Content

UHD Display

UHD Capture

UHD Video Scaling Support
• Upscale from HD to UHD
• Downscale from UHD to HD

Display Port* (DP), Embedded DisplayPort* (eDP), Miracast* and other names and brands may be claimed as the property of others

* GPU Accelerated; Media Codec support may not be available on all operating systems and applications.


Page 8: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

8

Why Is UHD Scaling Different?

SD to HD Scaling

• Pixel Resolution from 720x480 to 1920x1080

• Aspect Ratio from 4:3 to 16:9

• SD Video in Low Quality, often requiring De-interlace, De-noise, De-blocking, Sharpening, etc.

• 345,600 pixels to 2,073,600 pixels

FHD to 4K UHD Scaling

• Pixel Resolution from 1920x1080 to 3840x2160

• Aspect Ratio stays at 16:9

• FHD Video already in High-Quality with Crisp Details

• 2,073,600 pixels to 8,294,400 pixels
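The pixel counts quoted on this slide follow directly from width times height. The short sketch below reproduces them and the resulting growth factors; the function and variable names are illustrative only and do not come from the talk.

```python
def pixel_count(width, height):
    """Number of pixels in one frame of the given resolution."""
    return width * height

sd = pixel_count(720, 480)       # SD
fhd = pixel_count(1920, 1080)    # FHD
uhd = pixel_count(3840, 2160)    # 4K UHD

# Growth in pixel count for each scaling step
sd_to_fhd_ratio = fhd / sd
fhd_to_uhd_ratio = uhd / fhd

print(sd, fhd, uhd)
print(sd_to_fhd_ratio, fhd_to_uhd_ratio)
```

SD to HD grows the frame by a factor of 6 with a change in aspect ratio, while FHD to UHD is a clean factor of 4 (2x per axis).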

Page 9: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

[Figure: Gen9 processor graphics block diagram labeled Unslice, Geometry, Subslice, and Slice Common; the FF Media blocks reside in the Unslice]

• 6th Generation Intel Core Processor Graphics on 14nm Process

• Support of Latest APIs
  o DirectX* 12/11.3
  o OpenCL 2.0
  o OpenGL* 4.4

• Scalable uArch Partitioning similar to 5th Generation Intel® Core™ Architecture
  o Unslice, Slice, Subslice, etc.

• Improved Design for Better Energy Efficiency

• Flexible and Finer-grain Power Management

* Other names and brands may be claimed as the property of others

Page 10: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

10

Multi-Format Codec (MFX)

• HEVC Decode

• HEVC Encode

• HEVC 10bit Decode (GPU Accelerated)

• JPEG / MJPEG Decode

• JPEG / MJPEG Encode

• MPEG2 Decode and Encode

• AVC Decode and Encode

• VP8 Decode and Encode


Page 11: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

11

Video Quality Engine (VQE)

• Video Processing and Enhancement

• 16bit per channel processing pipe

• RAW image processing pipe

• De-noise

• De-interlace

• Contrast/Saturation Enhancement

• Skin-tone Detection and Enhancement

• Color Space Conversion (BT2020)

• Color Correction


Page 12: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

12

Scaler and Format Conversion (SFC)

• Dedicated Media FF HW

• Advanced Video Scaler (AVS)

• Sharpness Enhancement

• Color Space Conversion

• Chroma Sampling

• Rotation and other Format Conversions

Media Sampler

• Video Motion Estimation (VME)

• Advanced Video Scaler (AVS)

• Sharpness Enhancement


Page 13: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

SFC (Scaler and Format Converter)

Low-Power UHD Video Playback

• New SFC HW pipe is added to deliver Ultra Low Power media playback experience

• SFC is connected inline (without memory read/write) to MFX (video decode) and VQE (video processing)

Page 14: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

14

SFC AVS Example #1: Video Decode → Scaling → Display (or Encode)

GEN8 without SFC: MFX Video Decode → Media Sampler AVS → VQE Video Enhancement → MFX Video Encode

GEN9 with SFC: MFX Video Decode → SFC AVS as VD-SFC (Video Decode SFC) → VQE Video Enhancement → MFX Video Encode

[Diagram: three memory read/write transfers are marked between stages]

Page 15: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

15

SFC AVS Example #2: Video Quality Enhancement → Scaling → Display (or Encode)

GEN8 without SFC: MFX Video Decode → VQE Video Enhancement → Media Sampler AVS → MFX Video Encode

GEN9 with SFC: MFX Video Decode → VQE Video Enhancement → SFC AVS as VE-SFC (Video Enhance SFC) → MFX Video Encode

[Diagram: three memory read/write transfers are marked between stages]

Page 16: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution


SFC pipeline delivers many benefits:

• Inline Connection: Reduced bandwidth and power consumption

• SFC handles scaling, detail enhancement, color space conversion, and other format conversion on the fly

• 12bit Data Path ready for Ultra-HD (UHD), High Dynamic Range (HDR), Wide Color Gamut (WCG)

• Frees up EU resources (slice/subslice) from media use cases, so they can be power-gated when not used

• SFC can process UHD Video (3840x2160 @ 60fps) operating at power-efficient low-frequency mode

Page 17: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

17

AVS (Advanced Video Scaler) in SFC

AVS is a Low-Power Fixed-Function Hardware in SFC
• Real-time video scaling in a 12bits per channel data path
• Consists of a pair of spatial filters, Sharp Filter and Smooth Filter

Adaptive Mode
• The results of the two filters are alpha-blended to generate the output pixel value
• The alpha blending factor, α, is computed for each pixel from neighboring pixels

[Diagram: Input Pixel feeds the Sharp Filter, the Smooth Filter, and the Blending Factor Computation; the two filter outputs are blended (+) using the Blending Factor to produce the Output Pixel]
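The adaptive mode described on this slide can be sketched as a per-pixel alpha blend of the two filter outputs. The slide does not say how the blending factor is derived from the neighboring pixels, so `toy_alpha` below is a purely hypothetical stand-in (local contrast against an assumed threshold), and the convention that alpha weights the sharp filter is also an assumption. This is not the AVS hardware algorithm.

```python
def adaptive_blend(sharp, smooth, alpha):
    """Blend the sharp and smooth filter outputs for one pixel.

    alpha is clamped to [0, 1]; alpha = 1 selects the sharp result,
    alpha = 0 selects the smooth result (assumed convention).
    """
    alpha = min(1.0, max(0.0, alpha))
    return alpha * sharp + (1.0 - alpha) * smooth


def toy_alpha(neighbors, threshold=64.0):
    """Hypothetical blending factor computed from neighboring pixel values.

    High local contrast (e.g. a hard computer-graphics edge, where the
    sharp filter tends to ring) pushes alpha toward 0, i.e. toward the
    smooth filter. Both the rule and the threshold are illustrative.
    """
    contrast = max(neighbors) - min(neighbors)
    return max(0.0, 1.0 - contrast / threshold)


# Low-contrast neighborhood: mostly the sharp filter output
print(adaptive_blend(sharp=100.0, smooth=80.0, alpha=toy_alpha([10, 10, 12])))
# Hard edge: falls back to the smooth filter output
print(adaptive_blend(sharp=100.0, smooth=80.0, alpha=toy_alpha([0, 255, 0])))
```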

Page 18: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

18

AVS Smooth Filter

Reference Ground Truth (1440x960) Smooth Filter (720x480 to 1440x960)

** Blurrier than Reference Ground Truth **

Page 19: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

19

AVS Sharp Filter

Reference Ground Truth (1440x960) Sharp Filter (720x480 to 1440x960)

** Similar to Reference Ground Truth **

Page 20: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

20

AVS Sharper Filter

Reference Ground Truth (1440x960) Sharper Filter (720x480 to 1440x960)

** Sharper than Reference Ground Truth **

visual artifact

Page 21: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

21

Sharp vs. Smooth Filter

Smooth Filter Sharper Filter

** Ringing Artifacts **

Page 22: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

22

Adaptive Mode in AVS

Sharp Filter
• Sharp and Crisp Output on Natural Scenes
• Ringing on Computer Graphics

Smooth Filter
• Blurrier Output on Natural Scenes
• Ringing-free Output on Computer Graphics

Adaptive Mode
• Best of Both Filters possible based on Per-Pixel Adjustment
• Sharp Output on Natural Scenes
• Ringing-free Output on Computer Graphics


Page 24: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

24

Adaptive Mode 1

Adaptive Mode On: ** Sharper than Smooth Filter without Ringing **

Sharper Filter: ** Ringing Artifacts **

Page 25: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

25

Adaptive Mode 2

Adaptive Mode On Smooth Filter

** Sharper than Smooth Filter **


Page 27: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

Media Scaler Interface

Intel® Media Server Studio SDK (MSDK): https://software.intel.com/en-us/media-sdk

Interface                    Video Scaler
Microsoft Windows* DXVA      SFC AVS (default)
LibVA (Android/Linux)        SFC AVS (default)
macOS*                       SFC and AVS

• Application SW specifies input/output formats:
  o conf.vpp.In.Width, Height, CropX, CropY, CropW, CropH
  o conf.vpp.Out.Width, Height, CropX, CropY, CropW, CropH

• MSDK then configures the video processing pipeline accordingly

* Other names and brands may be claimed as the property of others

Page 28: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

neuron to convolutional neural networks for Super-resolution scaling

28


Page 30: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

30

From Neuron to CNN

[Roadmap diagram. Labels: Neuron, CNN, Scaling, Super-Resolution, Sparse Coding, Sparse Coding Super-Resolution, Deep Network, CNN-based SR]

Page 31: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

neuron to convolutional neural networks

31

Page 32: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

32

Neuron

A neuron

• Is a nerve cell in brains, spinal cords, etc.

• Processes and transmits data through electrical/chemical signals

• Can give rise to multiple dendrites, but not more than one axon

• Signals travel from the axon of one neuron to a dendrite of another (with many exceptions to these rules) via a synapse

• Connects to each other to form neural networks

• A human brain contains about 100 billion neurons

• Each has 5K~100K synaptic connections to other neurons

[Figure: neuron anatomy with labels: input signal, dendrites, cell body, nucleus, axon, axon terminals, output signal]

Page 33: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

33

Artificial Neuron

• A Neuron has a single Axon and multiple Dendrites

o Dendrites receive incoming electrical signals

o Electrical signal is sent out from an Axon to Dendrites

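$$f = b + \sum_{i=0}^{n} w_i x_i \quad\text{and}\quad out = \begin{cases} 0 & \text{if } f < 0 \\ 1 & \text{if } f \ge 0 \end{cases}$$

[Figure: artificial neuron that sums inputs x0 ... xn, weighted by w0 ... wn, with bias b to produce f and out, shown alongside the biological neuron (input signal, dendrites, cell body, nucleus, axon, axon terminals, output signal)]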

Page 34: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

34

Artificial Neuron – what does it do?

NAND Gate:

x0   x1   x0 AND x1   x0 NAND x1
0    0    0           1
0    1    0           1
1    0    0           1
1    1    1           0

Artificial Neuron:

x0   x1   f    out
0    0    3    1
0    1    1    1
1    0    1    1
1    1    -1   0

NAND gate is universal for computation: any logic can be built up out of NAND gates.

An artificial neuron (perceptron with 2 inputs) can implement a NAND gate:
• input = (x0, x1)
• weights = (w0, w1) = (-2, -2)
• bias b = 3
• out = 0 if f < 0, 1 if f ≥ 0

$$f = b + \sum_{i=0}^{n} w_i x_i \quad\text{and}\quad out = \begin{cases} 0 & \text{if } f < 0 \\ 1 & \text{if } f \ge 0 \end{cases}$$
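The NAND perceptron on this slide is small enough to check by direct evaluation. The sketch below computes f = b + Σ w_i x_i and thresholds it at zero, using the slide's weights (-2, -2) and bias 3; the function names `neuron` and `nand` are illustrative.

```python
def neuron(inputs, weights, bias):
    """Artificial neuron: weighted sum plus bias, thresholded at zero.

    Returns (f, out) where out = 0 if f < 0 and 1 if f >= 0.
    """
    f = bias + sum(w * x for w, x in zip(weights, inputs))
    out = 1 if f >= 0 else 0
    return f, out


def nand(x0, x1):
    """NAND gate built from one neuron with weights (-2, -2) and bias 3."""
    return neuron((x0, x1), (-2, -2), 3)[1]


# Reproduce the truth table: x0, x1, f, out
for x0 in (0, 1):
    for x1 in (0, 1):
        f, out = neuron((x0, x1), (-2, -2), 3)
        print(x0, x1, f, out)
```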

Page 35: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

[Figure: several artificial neurons (each with inputs x0, x1, weights w0, w1, bias b, and output f/out) connected in two layers, Layer 1 and Layer 2, with network inputs in0, in1 and neuron outputs out0, out1, out2]

Neural Network

Connect multiple artificial neurons
• Simple compute devices become interconnected
• Connections between neurons determine the function of the overall network
• Massively parallel structure allows fast results with slow neurons
• Multi-layer networks are more powerful
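As a small illustration of connecting neurons into a network, the sketch below wires four NAND neurons (weights (-2, -2), bias 3, as in the earlier NAND example) into an XOR function, which a single threshold neuron cannot compute. The wiring is a textbook construction chosen for illustration; it is not the specific network drawn on the slide.

```python
def neuron(inputs, weights, bias):
    """Threshold neuron: out = 1 if bias + sum(w * x) >= 0, else 0."""
    f = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if f >= 0 else 0


def nand(a, b):
    """NAND neuron with weights (-2, -2) and bias 3."""
    return neuron((a, b), (-2, -2), 3)


def xor_network(in0, in1):
    """XOR built only from NAND neurons arranged in successive layers."""
    n1 = nand(in0, in1)   # first layer
    n2 = nand(in0, n1)    # second layer (two neurons in parallel)
    n3 = nand(in1, n1)
    return nand(n2, n3)   # output layer


for in0 in (0, 1):
    for in1 in (0, 1):
        print(in0, in1, xor_network(in0, in1))
```

The connections alone determine the function: the same four identical neurons wired differently would compute something else.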

Page 36: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

36

Convolutional Neural Networks (CNN)

What is it?
• Multiple layers of artificial neural networks
• Some layers performing Convolution Operations that extract features (e.g., edges) from input images
• 2D Convolution Operation is

$$f(x, y) = \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} w(i, j)\, x(x - i, y - j)$$

Usages:
• Image Classification
• Object Detection
• Face Recognition
• Denoise
• Deblurring
• Super-Resolution Scaling
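For a kernel with finite support, the infinite sums in the 2D convolution collapse to a loop over the kernel entries. The sketch below is a minimal pure-Python version under two assumptions that are not in the slides: images and kernels are lists of rows (so the first index is the row), and only output positions where the kernel fully overlaps the image are produced.

```python
def conv2d(image, kernel):
    """Valid-mode 2D convolution: out[y][x] = sum_ij kernel[i][j] * image[y-i][x-j].

    i, j run over the kernel's rows and columns; the output has
    (rows - krows + 1) x (cols - kcols + 1) entries.
    """
    krows, kcols = len(kernel), len(kernel[0])
    rows, cols = len(image), len(image[0])
    out = []
    for y in range(krows - 1, rows):
        out_row = []
        for x in range(kcols - 1, cols):
            acc = 0.0
            for i in range(krows):
                for j in range(kcols):
                    acc += kernel[i][j] * image[y - i][x - j]
            out_row.append(acc)
        out.append(out_row)
    return out


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

# A 1x2 difference kernel responds to horizontal changes (a simple edge feature)
print(conv2d(image, [[1, -1]]))
```

In a CNN the kernel values are not designed by hand as here; they are the weights learned during training.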

Page 37: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

37

Convolution using a Neuron
• Each neuron processes a small part (receptive field) of the input image using shared weights in convolutional layers

What’s it good for? Why use it?
• Instead of designing and optimizing each convolution kernel manually, train the network to solve difficult problems simply by feeding input and output pairs (i.e., the feature extraction process is learned by the network)

[Figure: a 3x3 Convolution Kernel (w0 ... w8) is applied to successive 3x3 Image Patches (x0 ... x8) of the Input Image; each output is computed by an artificial neuron as f = b + Σ w_i x_i, which corresponds to the 2D convolution f(x, y) = Σ_i Σ_j w(i, j) x(x − i, y − j)]

Page 38: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

CNN-based Super-Resolution

38

Page 39: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

39

Super-Resolution

Super-Resolution

• The term has been used by many to mean many different things over the years

• We will define what we mean by it in this talk, and then move on

Super-Resolution as Upscaling

• Input = Low-resolution Image (e.g., 1920x1080 RGB picture)

• Output = High-resolution Image (e.g., 3840x2160 RGB picture)

• Super-Resolution Requirements:

o Use a single input image to generate a single output image, i.e., Single-frame (Spatial) SR

o Output image quality is better than traditional scalers based on interpolation (bilinear, bicubic, etc.)

o No visual artifacts are introduced by SR upscaling

Page 40: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

Publications on CNN-based SR

40

SCN from University of Illinois – Urbana Champaign
1. Image Super-Resolution via Sparse Representation, Huang et al., TIP 2010
2. Coupled Dictionary Training for Image Super-Resolution, Huang et al., TIP 2012
3. Deep Networks for Image Super-Resolution with Sparse Prior, Huang et al., ICCV 2015
4. Self-Tuned Deep Super Resolution, Huang et al., CVPR 2015
5. Robust Single Image Super-Resolution via Deep Networks with Sparse Prior, Huang et al., TIP 2016

SRCNN from The Chinese University of Hong Kong
1. Learning a deep convolutional network for image super-resolution, Tang et al., ECCV 2014
2. Image Super-Resolution using Deep Convolutional Networks, Tang et al., TPAMI 2016

DRCN from Seoul National University
1. Deeply-Recursive Convolutional Network for Image Super-Resolution, Kim et al., CVPR 2016
2. Accurate Image Super-Resolution using Very Deep Convolutional Networks, Kim et al., CVPR 2016

Technische Universität München
1. Image Super-Resolution with Fast Approximate Convolutional Sparse Coding, Smagt et al., ICONIP 2014

Huaqiao University
1. Deep Network Cascade for Image Super-Resolution, Chen et al., ECCV 2014

Page 41: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution


compared to all SFSR (CNN-based or not) solutions

Page 42: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

From Sparse Coding to CNN-based SR

42


Page 43: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

Sparse Coding

43

• Reconstruct input signal x using a linear combination of basis vectors of a Dictionary D with sparse coefficients α
  o x = D ⋅ α

• where x is an n x 1 input vector,
  D is an n x m matrix, an overcomplete (m > n) Dictionary with m basis vectors,
  α is an m x 1 sparse code vector

• Sparse = Most of the sparse code coefficients in α are zero, i.e., α is a sparse representation of x

• Optimal sparse code is obtained as

$$\alpha = \arg\min_{z} E(x, z), \qquad E(x, z) = \frac{1}{2}\lVert x - Dz \rVert_2^2 + \lVert z \rVert_1$$

Encoder
• Dictionary D
• ISTA/CoD (iterative)
• LISTA/LCoD (approximate)

[Diagram: Input Vector x → Encoder → Sparse Code α]
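The iterative encoder named on this slide, ISTA, alternates a gradient step on the reconstruction error with a soft-threshold that drives small coefficients to exactly zero. The following is a minimal pure-Python sketch, not the implementation used in the talk. Its assumptions: `lam` is an explicit weight on the L1 term (lam = 1.0 matches the energy as written on the slide), the step size uses the squared Frobenius norm of D as a safe upper bound on the Lipschitz constant, and the tiny 2x3 dictionary is made up for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; exact zero inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def ista(D, x, lam=1.0, iterations=500):
    """Minimize 0.5 * ||x - D z||_2^2 + lam * ||z||_1 over z with ISTA.

    D is a list of n rows with m columns, x has length n, result has length m.
    """
    n, m = len(D), len(D[0])
    # Squared Frobenius norm >= largest eigenvalue of D^T D, so 1/L is a safe step
    L = sum(D[r][c] ** 2 for r in range(n) for c in range(m))
    z = [0.0] * m
    for _ in range(iterations):
        residual = [x[r] - sum(D[r][c] * z[c] for c in range(m)) for r in range(n)]
        stepped = [z[c] + sum(D[r][c] * residual[r] for r in range(n)) / L
                   for c in range(m)]
        z = [soft_threshold(v, lam / L) for v in stepped]
    return z


# Overcomplete toy dictionary (n = 2, m = 3); x is exactly twice the third atom
D = [[1.0, 0.0, 0.6],
     [0.0, 1.0, 0.8]]
x = [1.2, 1.6]
code = ista(D, x)
print(code)  # only the third coefficient stays non-zero
```

LISTA replaces the fixed number of hand-derived iterations with a few learned layers of the same structure, which is what makes the encoder trainable inside a network.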

Page 44: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

Sparse Coding Super-Resolution

44

Super-Resolution Reconstruction

• y = Dy ⋅ αy → αy = αx → Dx ⋅ αx = x

• 3x3 LR Image Patch y → Sparse Code Encoder → LR Sparse Representation αy (linear combination of LR dictionary elements)
• HR Sparse Representation αx → linear combination of HR dictionary elements → 9x9 HR Image Patch x
• Overcomplete LR Dictionary Dy (m = 1024)
• Overcomplete HR Dictionary Dx (m = 1024)
• Joint Dictionary Training: Iterative Optimization using 100,000 random image patch pairs
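Once a sparse code has been found for the LR patch, the reconstruction step on this slide is a single matrix-vector product with the HR dictionary, because the jointly trained dictionaries are meant to share the same code. The sketch below shows only that step; the tiny dictionary `Dx_toy` and the code values are made-up numbers, far smaller than the m = 1024 dictionaries and 9x9 patches in the slide.

```python
def reconstruct_hr_patch(hr_dictionary, sparse_code):
    """HR patch = Dx . code, with Dx given as a list of rows (one per HR pixel)."""
    return [sum(d * a for d, a in zip(row, sparse_code)) for row in hr_dictionary]


# Toy HR dictionary: 4 HR pixels (a flattened 2x2 patch), 3 atoms
Dx_toy = [[1.0, 0.0, 0.5],
          [0.0, 1.0, 0.5],
          [1.0, 1.0, 0.0],
          [0.0, 0.0, 2.0]]

# Sparse code assumed to come from the LR-side encoder; only one atom is active
shared_code = [0.0, 0.0, 2.0]

hr_patch = reconstruct_hr_patch(Dx_toy, shared_code)
print(hr_patch)
```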

Page 45: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

45

SCN (Sparse Coding based Network)

Sparse Coding Super-Resolution → Deep Network Super-Resolution

1. Layer #1 (Convolutional Layer H): image patch/feature y is extracted from the LR image Iy with m_y filters
2. Layer #2 and #3 (Sparse Code Encoder as k-iterations of LISTA network): Sparse code is computed from y
3. Layer #4 (Reconstruction): Sparse code is multiplied with HR Dictionary Dx to reconstruct HR image patch x
4. Layer #5 (Convolutional Layer G): All HR patches x are combined to HR Image Ix

[Figure legend: Iy = LR Image; y = LR Image Patch; Sparse Code (output of the Sparse Code Encoder); x = HR Image Patch; Ix = HR Image]

Fig. 2 from “Robust Single Image Super-Resolution via Deep Networks with Sparse Prior”, IEEE Transactions on Image Processing, Vol. 25, Issue 7, pp. 3194-3207, 2016

Page 46: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

46

SCN: 5-Layer Deep Network for Super-Resolution

Deep Network Architecture
• 2 Convolutional Layers (H and G) and 3 Layers for Sparse Coding Encoder
• All parameters trained via back-propagation using MSE cost function
• Network learns more complex function beyond the sparse coding model
• Performs better than sparse coding results even with dictionary size reduced from 1024 to 128

Advantages of SCN
• LISTA sub-network to enforce sparse representation, i.e., better interpretation of filter responses and parameter initialization based on domain knowledge in sparse coding
• Better SR results, faster training speed and smaller model size

Subjective Quality Assessment
• Best Visual Quality against other SFSR solutions (sharper boundaries, richer textures, no ringing)
• Scale ratio is fixed for the network → Use a cascade of multiple SCNs + bicubic downscaler
• Cascade of multiple networks is better than a single network trained with a large scale factor

Page 47: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

Quality Study via Simulation

47

Page 48: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

48

Compare
• PSNR, MSE
• Visual Inspection
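MSE and PSNR are the standard objective measures for comparing an upscaled image with its ground truth. The sketch below uses the usual definitions, PSNR = 10 log10(peak² / MSE); the peak value of 255 assumes 8-bit samples, and representing images as lists of rows is an illustrative choice. The slides do not state the exact measurement setup used in the study.

```python
import math


def mse(reference, test):
    """Mean squared error between two equally sized images (lists of rows)."""
    total = 0.0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            total += (r - t) ** 2
            count += 1
    return total / count


def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    error = mse(reference, test)
    if error == 0.0:
        return math.inf
    return 10.0 * math.log10(peak * peak / error)


ground_truth = [[0, 255], [128, 64]]
upscaled = [[0, 254], [130, 64]]
print(mse(ground_truth, upscaled), psnr(ground_truth, upscaled))
```

A higher PSNR means a smaller pixel-wise error, but it does not capture ringing or halo artifacts well, which is why visual inspection is listed alongside it.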

Page 49: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

Capturing LR and HR Test Images

49

1. Camera Capture
• LR: Camera Capture in FHD Mode at 1936x1288, then cropped to 720x480
• HR: Camera Capture in UHD Mode at 3888x2592, then cropped to 1440x960

2. Optical Scanner
• LR: Scan a letter-size printed document in 300dpi Mode at 2478x3228, then cropped to 720x480
• HR: Scan the same printed document in 600dpi Mode at 4956x6456, then cropped to 1440x960

3. Screen Capture (www.intel.com)
• LR: Screen Capture of Intel Website at 100% Zoom, then cropped to 720x480
• HR: Screen Capture of the same Intel Website at 200% Zoom, then cropped to 1440x960

[Figure: Test Image #1, Test Image #2, Test Image #3]

Page 50: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

SR Test Scenarios

50

Scaling Solutions

• SFC AVS: Gen9 Intel® Processor Graphics Media HW FF SFC AVS in SW Simulation
• SCN: Sparse-Coding Network (SCN) is CNN-based SR from Huang et al. MATLAB codes and network parameters are available at http://www.ifp.illinois.edu/~dingliu2/iccv15/

2x Upscaling for 1920x1080 to 3840x2160
• SFC AVS: 2x
• SCN: 2x

4x Upscaling for 1920x1080 to 7680x4320
• SFC AVS: 4x
• SCN: 2x (SCN) → 2x (SCN)

1.3x Upscaling for 1920x1080 to 2560x1440
• SFC AVS: 1.3x
• SCN: 2x (SCN) → 0.65x (MATLAB Bicubic)
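Because the SCN scale ratio is fixed at 2x, each test scenario above is covered by cascading 2x stages and, where needed, finishing with a bicubic downscale. The sketch below computes that plan for a target width; `plan_cascade` is an illustrative helper, not part of the SCN code. Note that the exact ratio for 1920 to 2560 is about 1.33, giving a residual of about 0.67, which the slide rounds to 1.3x and 0.65x.

```python
def plan_cascade(src_width, dst_width, stage_factor=2):
    """Return (number of fixed-ratio SR stages, residual resize factor).

    Stages are added until the cascade reaches or overshoots the target;
    the residual (<= 1.0) is the final downscale applied afterwards.
    """
    ratio = dst_width / src_width
    stages = 0
    scale = 1
    while scale < ratio:
        scale *= stage_factor
        stages += 1
    return stages, ratio / scale


for target in (3840, 7680, 2560):
    print(target, plan_cascade(1920, target))
```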

Page 51: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

51

Page 52: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

52

SFC AVS

SCN

visual artifact

SCN result is sharper than AVS

SCN adds some visual artifacts

+1 to AVS or on Par

Page 53: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

53

SCN

SFC AVS

SCN has the halo problem that is more pronounced in 4x upscaling

+1 to AVS

halo added

Page 54: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

54

SCN

SFC AVS

ringing

severe color bleeding

SCN result is sharper, but with more visible ringing and color bleeding artifacts

+1 to AVS

Page 55: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

SR Test Results

55

Upscaling Ratio   Test 1          Test 2    Test 3
1.3x              SFC AVS         SFC AVS   SCN
2x                SFC AVS         SFC AVS   SCN
4x                SFC AVS / SCN   SFC AVS   SFC AVS

Overall
• SFC AVS and SCN performed well against the ground truth and quite closely to each other in 3 test examples
• SFC AVS seems to have a slight advantage over SCN on these 3 test examples

But, Why...?
• SCN has not been trained on a wide range of non-natural scenes / computer graphics contents

• Test input images are high-quality LR images, but SCN is trained on very blurry LR input images (Gaussian Blurring + Downsample + Bicubic Upsample)

• Better understanding of CNN architecture, training database, and training strategies is required

Page 56: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

Summary

• Super-Resolution scaling solutions have been developed using the CNN framework and present great potential for high-quality Super-Resolution scaling

• SFC AVS produces very high quality output that is comparable to current state-of-the-art CNN-based SR solutions
• CNN-based SR scaling can be further improved with more intelligent training and architecture in the future

• Gen9 Intel® Processor Graphics adds a new HW FF called SFC
• SFC AVS provides a high-quality video scaling solution at low power
• Adaptive mode in AVS combines the benefits of smooth and sharp filters on a per-pixel basis for superior output quality

• Use Gen9 Intel HW FF Scaler for Low-Power High-Performance High-Quality UHD 4K60 Scaling
• Use Gen9 Intel® Processor Graphics for CNN-based SR running on OpenCL for enhanced UHD picture quality

Page 60: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

Q & A

60

Page 61: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

61

Acknowledgement

Many thanks go to the following individuals from Intel

• Yi-jen Chiu

• Keith Rowe

• Niranjan S Mulay

• Ping Liu

• Furong Zhang

• Wen-fu Kao

• Vidhya Krishnan

• Sungye Kim

• Charles Lingle, Jon Kennedy and other tech reviewers

• Michaelle Gonzalez, Naomi Pitfield, and the SIGGRAPH Team

Page 62: Ultra HD Video Scaling: Low-Power HW FF vs. CNN-based Super-Resolution

Legal Notices and Disclaimers

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com.

Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit http://www.intel.com/performance.

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance.

Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.

This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

Statements in this document that refer to Intel’s plans and expectations for the quarter, the year, and the future, are forward-looking statements that involve a number of risks and uncertainties. A detailed discussion of the factors that could affect Intel’s results and plans is included in Intel’s SEC filings, including the annual report on Form 10-K.

All products, computer systems, dates and figures specified are preliminary based on current expectations, and are subject to change without notice. The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate. © 2016 Intel Corporation. Intel, the Intel logo, OpenCL and others are trademarks of Intel Corporation in the U.S. and/or other countries.

*Other names and brands may be claimed as the property of others.
