nvidia video technologies · video transcoding video editing. 6 nvidia video technologies e e video...

38
Abhijit Patait, 3/20/2019 NVIDIA VIDEO TECHNOLOGIES

Upload: trinhnguyet

Post on 20-Aug-2019

235 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

Abhijit Patait, 3/20/2019

NVIDIA VIDEO TECHNOLOGIES

Page 2: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

2

AGENDA

NVIDIA Video Technologies Overview

Turing Video Enhancements

Video Codec SDK Updates

Benchmarks

Roadmap

Page 3: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

3

NVIDIA VIDEO TECHNOLOGIES

Page 4: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

4

CPU

NVDEC NVENC

CUDA Cores

Buffer

Decode HW* Encode HW*

Formats:

• H.264

• H.265

• Lossless

Bit depth:

• 8 bit

• 10 bit

Color**

• YUV 4:4:4

• YUV 4:2:0

Resolution

• Up to 8K***

Formats:

• MPEG-2

• VC1

• VP8

• VP9

• H.264

• H.265

• Lossless

Bit depth:

• 8/10/12 bit

Color**

• YUV 4:2:0

• YUV 4:4:4

Resolution

• Up to 8K***

NVIDIA GPU VIDEO CAPABILITIES

* See support diagram for previous NVIDIA HW generations

** 4:4:4 is supported only on HEVC for Turing; 4:2:2 is not natively supported on HW

*** Support is codec dependent

Page 5: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

5

VIDEO CODEC SDKA comprehensive set of APIs for GPU-accelerated video encode and decode

NVENCODE API for video encode acceleration

NVDECODE API for video & JPEG decode acceleration (formerly called NVCUVID API)

Independent of CUDA/3D cores on GPU for pre-/post-processing

Video archiving

Intelligent video analytics

Remote desktop streaming

Gamestream

Video transcoding

Video editing

Page 6: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

6

NVIDIA VIDEO TECHNOLOGIESSO

FTW

ARE

HARD

WARE

Video Encode and Decode for Windows and LinuxCUDA, DirectX, OpenGL interoperability

VIDEO CODEC, OPTICAL FLOW SDK

Video decode

NVDEC

NVIDIA DRIVER

NVENC Video encode

CUDA TOOLKIT

Easy access to GPU video acceleration

APIs, libraries, tools, samples

DeepStreamSDK

cuDNN, TensorRT, cuBLAS, cuSPARSE

CUDA High-performance computing on GPU

DALI

Page 7: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

7

VIDEO CODEC SDK UPDATE

Page 8: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

8

VIDEO CODEC SDK UPDATE

2016

SDK 7.xPascal

10-bit encodeFFmpeg

ME-only for VRQuality++

2017

SDK 8.010-bit transcode10/12-bit decode

OpenGLDec. optimizations

WP, AQ, Enc. Quality

Q1 2018

SDK 8.1B-as-ref

QP/emphasis map4K60 HEVC encodeReusable classes & new sample apps

Q3 2018

SDK 8.2Decode + inference

optimizations

SDK 9.0Turing

Multi-NVDECHEVC 4:4:4 decodeEncode quality++HEVC B frames

2019

Page 9: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

9

VIDEO CODEC SDK 9.0

Feature Who it benefits

Higher video encode quality

HEVC B-frames

Higher encode quality

Cloud gaming

Game broadcasting (e.g. Twitch)

Video transcoding (e.g. Youtube, Facebook)

OTT/M&E

HEVC 4:4:4 decode End-to-end high-quality remote desktop

Mutiple NVDECs Higher decode + inference throughput

Direct output to vidmem Higher perf with post-processing

Power 9 + Tesla V100 SXM2 Video SDK for IBM platforms

Soul

Page 10: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

10

TURING UPDATES - NVDEC

Page 11: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

11

MULTIPLE NVDECS IN TURING

GPU Number of NVDECs per GPU

Volta, Pascal & earlier 1

Turing – GeForce (RTX) 1

Turing – Quadro & Tesla (TU106) 3

Turing – Quadro & Tesla (TU104) 2

Turing – others 1

➢ Quadro & Tesla feature

➢ Auto-load-balanced by driver

Page 12: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

12

PASCAL & EARLIERSingle NVDEC

NVDEC… 1001010111010 …

… 0101100010011 …

… 1001010111010 …

… 0101100010011 …

Scale

Scale

Scale

Scale

High-resDecode

1080p, 720p

Infer

Infer

Infer

Infer

Low-res infere.g. 300 × 200

Bottleneck

Page 13: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

13

TURINGMultiple NVDECs

… 1001010111010 …

… 0101100010011 …

… 1001010111010 …

… 0101100010011 …

Scale

Scale

Scale

ScaleHigh-resDecode

1080p, 720p

Infer

Infer

Infer

Infer

Low-res infere.g. 300 × 200

NVDEC 0

NVDEC N

.

.

.

… 1001010111010 …

… 0101100010011 …

… 1001010111010 …

… 0101100010011 …

.

.

.

.

.

.

.

.

.

.

.

.

Page 14: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

14

END-TO-END 4:4:4 IN TURING

➢ Preserves chroma: text and thin lines

➢ Valuable in desktop streaming

4:2:0 4:4:4

Page 15: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

15

END-TO-END 4:4:4 IN TURINGHEVC 4:4:4 HW encode & 4:4:4 HW decode

Desktop

Capture

HW

EncodeStream

CPU

decodeRenderNetwork

Pascal & earlierTuring

HW

decode

Page 16: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

16

TURING NVENC ENHANCEMENTS

Page 17: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

17

NVENC - ENCODING QUALITYFocus for Turing NVENC

Enhancement How to use

Rate distortion optimization – RDO Turing only – always ON

Multiple reference frames Preset-dependent

HEVC B-frames NVENCODE API

Others

➢ Higher throughput at same quality as Pascal

➢ Turing GPUs have single NVENC engine with higher quality

Page 18: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

18

TURING NVENC QUALITY

➢ Focus on quality – RDO, multi-ref, HEVC B-frames, …

➢ Quality vs performance trade-off

➢ Quality is content dependent

➢ 600+ videos of 10-20 secs each: Natural, animation, gaming, video conference, movies

➢ 720p, 1080p, 4K, 8K

➢ Quality: PSNR, SSIM, VMAF, subjective

➢ Perf: fps, number of 1080p streams per GPU

Page 19: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

19

H.264 ENCODE BENCHMARKNon latency critical – Turing vs Pascal vs x264

“iso” quality = x264 medium

0.98

1.08

1.161.17

0.93

1.00

1.05

0.80

0.85

0.90

0.95

1.00

1.05

1.10

1.15

1.20

0 5 10 15 20

bit

rate

rat

io @

iso

qu

alit

y

#1080p30 streams

H.264 - non latency critical

T4 medium T4 fast P4 slow P4 medium x264 slow x264 medium x264 fast

10.60

19.41

17.7318.73

2.95

5.726.28

0

5

10

15

20

25

T4 medium T4 fast P4 slow P4 medium x264 slow x264 medium x264 fast

#108

0p30

str

eam

s

H.264 - non latency critical

Hig

her

bit

rate

savin

gs

Higher perf

Page 20: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

20

H.264 ENCODE BENCHMARK

NVENC slow -preset slow -bufsize BITRATE*2 -maxrate BITRATE*1.5 -profile:v high -bf 3 -

b_ref_mode 2 -temporal-aq 1 -rc-lookahead 20 -vsync 0

x264 slow -preset slow -tune psnr -vsync 0 -threads 4 -vsync 0

NVENC medium -preset medium -rc vbr -profile:v high -bf 3 -b_ref_mode 2 -temporal-aq 1

-rc-lookahead 20 -vsync 0

x264 medium -preset medium -tune psnr -threads 4 -vsync 0

NVENC fast -preset fast -rc vbr -profile:v high -bf 3 -b_ref_mode 2 -temporal-aq 1

-rc-lookahead 20 -vsync 0

x264 fast -preset fast -tune psnr -vsync 0 -threads 4 -vsync 0

Non latency critical – FFmpeg commands

Page 21: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

21

HEVC ENCODE BENCHMARKNon latency critical – Turing vs Pascal vs x265

1.10

1.211.35

0.921.00

1.10

0.0

0.2

0.4

0.6

0.8

1.0

1.2

1.4

1.6

0 5 10 15

bit

rate

rat

io @

iso

qu

alit

y

#1080p30 streams

HEVC – non latency critcal

T4 medium T4 fast P4 medium x265 slow x265 medium x265 fast

11.80

4.29

10.71

2.98

1.91

0.85

0

2

4

6

8

10

12

14

T4 fast T4 medium P4 medium x265 fast x265 medium x265 slow

#108

0p30

str

eam

s

HEVC – non latency critical

“iso” quality = x265 medium

Hig

her

bit

rate

savin

gs

Higher perf

Page 22: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

22

HEVC ENCODE BENCHMARKNon latency critical – FFmpeg commands

NVENC slow -preset slow -rc vbr_hq -b:v BITRATE -profile:v 4 -bf 2 -rc-lookahead 20 -g

250 -vsync 0

x265 slow -preset slow -b:v BITRATE -bf 2 -tune psnr -threads 4 -vsync 0

NVENC medium -preset medium -rc vbr_hq -b:v BITRATE -profile:v 4 -bf 2 -rc-lookahead 20

-g 250 -vsync 0

x265 medium -preset medium -b:v BITRATE -bf 2 -tune psnr -threads 4 -vsync 0

NVENC fast -preset fast -rc vbr_hq -b:v BITRATE -profile:v 4 -bf 2 -temporal-aq 1 -rc-

lookahead 20 -g 250 -vsync 0

x265 fast -preset fast -b:v BITRATE -bf 2 -tune psnr -threads 4 -vsync 0

Page 23: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

23

SOFTWARE UPDATES

Page 24: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

24

RECONFIGURE DECODER

✓ Input resolution

✓ Scaling resolution

✓ Cropping rectangle

Codecs

Bit-depth and chroma format

Deinterlace mode

Input resolution beyond max width or max height

Video Codec SDK 8.2

No init time, reuse context, lowers memory fragmentation

Page 25: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

25

DIRECT OUTPUT TO VIDMEM

SDK 8.2 & earlier

Video Codec SDK 9.0

CUDA

pre-processNVENC

CUDA

Post-process

CPU

process

PCIe

SDK 9.0Host/system

memory

Video memoryVideo memory

Page 26: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

26

OTHER UPDATES

➢ Video Codec SDK now supported on Power 9 + Tesla V100 SXM2

➢ High-level NVDEC error status

Page 27: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

27

OPTICAL FLOWNew HW Functionality

➢ 4 × 4 optical flow vector, up to 4K × 4K

➢ Close to true motion

➢ Robust to intensity changes

➢ 10x faster than CPU; same quality

➢ New Optical Flow SDK

➢ Action recognition, object tracking, video

inter/extrapolation, frame-rate upconversion

➢ Legacy ME-only mode support

More information: http://developer.nvidia.com/opticalflow-sdk

Page 28: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

28

TIPS FOR NVENC OPTIMIZATION

Page 29: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

30

OPTIMIZATION STRATEGIES

➢ Minimize PCIe transfers

➢ Eliminate, if possible

➢ Use CUDA for video pre-/post-processing

➢ Multiple threads/processes to balance enc/dec utilization

➢ Monitor using nvidia-smi: nvidia-smi dmon -s uc -i <GPU_index>

➢ Analyze using GPUView on Windows

➢ Minimize disk I/O

➢ Optimize encoder settings for quality/perf balance

General Guidelines

Page 30: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

31

FFMPEG VIDEO TRANSCODING

➢ Look at FFmpeg users’ guide in NVIDIA Video Codec SDK package

➢ Use –hwaccel keyword to keep entire transcode pipeline on GPU

➢ Run multiple 1:N transcode sessions to achieve M:N transcode at high perf

Tips

Page 31: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

32

LOW LATENCY STREAMING (1/3)

➢ Low latency ≠ Low encoding time

➢ Latency determined by

➢ B-frames

➢ Look-ahead

➢ VBV buffer size & avlbl bandwidth

Optimization tips

Page 32: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

33

LOW LATENCY STREAMING (2/3)

➢ For 1-2 frame latency (e.g. cloud gaming), use

➢ RC_CBR_LOWDELAY_HQ & Low VBV buffer size

➢ Minimizes frame-to-frame variations

➢ Any preset (Default, HQ, HP preferred)

➢ LL presets have resolution-dependent behavior

➢ No look-ahead

➢ No B-frames

Optimization tips

Page 33: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

34

LOW LATENCY STREAMING (3/3)

➢ Similar to HQ (non latency critical) encoding

➢ For higher (8-10 frames) latency (e.g. OTT, broadcast), use

➢ Any RC mode

➢ Any preset (default, HQ, HP preferred)

➢ VBV buffer size as per channel bandwidth constraints

➢ Look-ahead depth < tolerable latency

➢ B-frames as needed

Optimization tips

Page 34: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

35

VIDEO DL TRAININGTypical Workflow

Loader

Augm

ent

Color space

Resize

Crop

Reorder

Training

Decoded frames

NVDEC

With DALI

Instantiate operator:

self.input = ops.VideoReader(device="gpu", filenames=data,

sequence_length=len)

Use it in the DALI graph:

frames = self.input(name="Reader")

output_frames = self.Crop(frames)

return output_frames

➢ FFmpeg

➢ NVDECODE API

➢ CUDA pst-processing

Page 35: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

36

ROADMAP

Page 36: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

37

ROADMAP

➢ Q3 2018

➢ Error handling – Retrieve last error

➢ Perf/quality tuning

➢ Support for CUStream

Video Codec SDK 9.1

Page 37: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability

38

RESOURCES

Video Codec SDK: https://developer.nvidia.com/nvidia-video-codec-sdk

FFmpeg GIT: https://git.ffmpeg.org/ffmpeg.git

FFmpeg builds with hardware acceleration: http://ffmpeg.zeranoe.com/builds/

Video SDK support: [email protected]

Video SDK forums: https://devtalk.nvidia.com/default/board/175/video-technologies/

Connect with Experts (CE9103): Wednesday, March 20, 2019, 3:00 pm

Page 38: NVIDIA VIDEO TECHNOLOGIES · Video transcoding Video editing. 6 NVIDIA VIDEO TECHNOLOGIES E E Video Encode and Decode for Windows and Linux CUDA, DirectX, OpenGL interoperability