nvidia video technologiesdeveloper.download.nvidia.com/video/gputechconf/gtc/2020/... · 2020. 5....
TRANSCRIPT
Abhijit Patait, GTC 2020
NVIDIA VIDEO TECHNOLOGIES
2
Video/Image Processing on NVIDIA GPUs
NVENC, NVDEC, NvOFA, NvJPEG
Software Updates and New Features
Updates, benchmarks and end-to-end use-cases
Roadmap
Upcoming features
AGENDA
3
Video/Image Processing on NVIDIA GPUs
4
NVIDIA VIDEO/IMAGE HARDWARE
NVDEC
NVENC Optical Flow
JPEG decode
CUDA Cores
PCIe
CPU System Memory
Vid
eo M
em
ory
GPU
CUDA Cores
NVDEC – Video Decoder (H.264, H.265, VP9,
MPEG-2…)
NVENC – Video Encoder (H.264, H.265)
Optical Flow (Turing+) – Track pixels
JPEG Decoder (Ampere+ architecture)
5
SOFTWARE
NVENC NVDECOptical
FlowNvJPEG
NVIDIA Driver
Video Codec SDKOptical
Flow SDK
CUDA
Toolkit
Hardware
Software
All binaries in NVIDIA driver
SDKs
APIs
Reusable samples
Documentation
Linux & Windows
CUDA, DirectX, OpenGL, Vulkan
APIs
Binary backward compatibility
6
WHAT’S NEW IN GA100 GPUs?
NVDECs
Pascal: 1
Turing: 1-2
GA100: 5
… 1001010111010 …
… 0101100010011 …
… 1001010111010 …
… 0101100010011 …
Scale
Scale
Scale
ScaleHigh-resDecode
1080p, 720p
Infer
Infer
Infer
Infer
Low-res infere.g. 300 × 200
NVDEC 0
NVDEC N
.
.
.
… 1001010111010 …
… 0101100010011 …
… 1001010111010 …
… 0101100010011 …
.
.
.
.
.
.
.
.
.
.
.
.
Not all features will be in all GPUs. Please check NVIDIA developer zone web site for detailed information
7
WHAT’S NEW IN GA100 GPUs?
NVDECs
Pascal: 1
Turing: 1-2
GA100: 5
Dedicated 5-core JPEG decoder
… 0101100010011 …
… 1001010111010 …
… 0101100010011 …
Scale
Scale
Scale
Scale
Full-res decode
Infer
Infer
Infer
Infer
Low-res infere.g. 300 × 200
JPEG
decoder 1
JPEG
decoder 5
.
.
.… 0101100010011 …
.
.
.
.
.
.
… 1111100010011 …
Not all features will be in all GPUs. Please check NVIDIA developer zone web site for detailed information
8
WHAT’S NEW IN GA100 GPUs?
NVDECs
Pascal: 1
Turing: 1-2
GA100: 5
Dedicated 5-core JPEG decoder
Improved optical flow engine,
independent of NVENC
NVENC
Optical Flow
Turing Ampere
• Per-pixel flow vector
• Improved quality & perf
• 8192 × 8192
• Region of interest
Not all features will be in all GPUs. Please check NVIDIA developer zone web site for detailed information
9
SOFTWARE UPDATES
Video Codec SDK 10.0
Optical Flow SDK 2.0
NvJPEG decode (part of CUDA 11.0)
Upcoming SDK Releases
10
Video Codec SDK Update
11
VIDEO CODEC SDK
Q1 2018 Q1 2019
2017 Q3 2018 Q3 2019
SDK 8.14K60 HEVC encode
SDK 9.0Turing
HEVC B-frameMulti-NVDEC
Q2 2020
SDK 10.0GA100
NVENC presets 2.0
SDK 8.010-bit transcode
SDK 8.2Decode + inference
optimizations
SDK 9.1True CBR
CUDA/NVDEC parallelism
12
VIDEO SDK 10.0
GA100 Support (Ampere architecture)
Multi-NVDEC performance improvements
Encoder (NVENC) presets 2.0
June 2020
13
VIDEO SDK 10.0
GA100 Support (Ampere architecture)
Multi-NVDEC performance improvements
Encoder (NVENC) presets 2.0
14
83
168
144
129
34
71 68
49
1722 23 22
1621 22 23
12 15 14 16
0
20
40
60
80
100
120
140
160
180
H264 HEVC HEVC 10 VP9
1080p30 s
tream
s
GA100 (A100-42GB)
TU104 (Tesla T4)
GP104 (Tesla P4)
GV100 (Quadro GV100)
GM206 (Tesla M4)
TURING ARCHITECTUREUp to 2 NVDECs per GPU for Decode + Inference
15
83
168
144
129
34
71 68
49
1722 23 22
1621 22 23
12 15 14 16
0
20
40
60
80
100
120
140
160
180
H264 HEVC HEVC 10 VP9
1080p30 s
tream
s
GA100 (A100-42GB)
TU104 (Tesla T4)
GP104 (Tesla P4)
GV100 (Quadro GV100)
GM206 (Tesla M4)
GA100 WITH AMPERE ARCHITECTURE5 NVDECs per GPU for Decode + Inference
16
VIDEO SDK 10.0
GA100 (Ampere microarchitecture) Support
Multi-NVDEC
Encoder (NVENC) presets 2.0
17
NVENC CONFIGURATION
Choose from
6 Presets: HQ, DEFAULT, HP, LLHQ, LL_DEFAULT, LLHP
6 Rate control modes: Constant QP, VBR, CBR, CBR_LOWDELAY_HQ, CBR_HQ, VBR_HQ
Advanced features: B-frames, B-as-reference, Look-ahead, AQ, weighted prediction, VBV…
Use-case specific configuration
Streaming: LL* + CBR_LOWDELAY_HQ + 1-frame VBV
Transcoding: HQ + VBR/CBR + B-frames + B-as-reference + AQ + high VBV
Resolution-dependent quality/performance
Extreme ends of quality/performance difficult to achieve
Video Codec SDK 9.1 and earlier
18
INTRODUCING NVENC PRESETS 2.0Video Codec SDK 10.0
Choose
Use-case/Tuning Info: High-quality, Low Latency, Ultra Low Latency, Lossless
Preset: P1 (highest performance) to P7 (highest quality)
Rate Control Mode: Constant QP, CBR, VBR
Advanced Features: Built-in; tune only if required
19
LEGACY VS NEW NVENC PRESETS
Legacy Presets Presets 2.0
One-shot configuration No Yes
Advanced features config Needed Rarely needed
Use-case based No Yes
Performance scales with resolution Sometimes Always
Multi-pass rate control Indirect Direct
Fine granularity of perf vs quality Low High
Rate control modes Complex Easy to understand
Extreme ends of quality/perf trade-off No Yes
Comparison
20
NVENC PRESETS: NEW VS LEGACYHEVC, High quality (latency tolerant), VBR
New NVENC Presets Legacy NVENC Presets
649
424380
297
230
132 119
651
230
132
0.00%
8.80%14.29%
23.57% 25.86% 27.45% 27.71%
0.00%
25.86%
27.45%
-100%
-80%
-60%
-40%
-20%
0%
20%
40%
0
200
400
600
800
1000
1200
P1 P2 P3 P4 P5 P6 P7 HP Default HQ
Quality
(Bit
rate
savin
gs)
Perf
orm
ance (
fps)
at
1080p
21
NVENC PRESETS: NEW VS LEGACYH.264, High quality (latency tolerant), VBR
New NVENC Presets Legacy NVENC Presets
634606
390
304
167 162 140
611
406
304
0.00% 0.16%
9.34%
20.60% 21.85%23.78% 24.08%
0.16%
16.36%
20.60%
-60%
-50%
-40%
-30%
-20%
-10%
0%
10%
20%
30%
0
200
400
600
800
1000
1200
P1 P2 P3 P4 P5 P6 P7 HP Default HQ
Bit
rate
savin
gs
Perf
orm
ance (
fps)
at
1080p
22
NVENC PRESETS: NEW VS LEGACYHEVC, Ultra Low Latency (latency sensitive), CBR
Legacy NVENC PresetsNew NVENC Presets
327
248219
188165 165 164
340
289
188
0.00%
1.93%
4.22%
5.66%
5.96%
5.96%
6.07%
-1.46%
1.00%
5.66%
-12%
-10%
-8%
-6%
-4%
-2%
0%
2%
4%
6%
8%
0
100
200
300
400
500
600
700
800
P1 P2 P3 P4 P5 P6 P7 LL-HP LL-Default LL-HQ
Bit
rate
savin
gs
Perf
orm
ance (
fps)
at
1080p
23
MIGRATION TO NEW NVENC PRESETS
Strongly recommended to move to new presets
Mapping table in Video Codec SDK 10.0
Backward compatibility – Binary ; Source
Old presets will be removed from API in future SDK
SDK release – June 2020
Video Codec SDK 10.0
24
Encode (NVENC) Benchmarks
25
ENCODE BENCHMARKShttps://developer.nvidia.com/nvidia-video-codec-sdk
26
19
10
19 18
6 52
-7.80%
1.62%
-11.15%-9.87%
-4.48%
0%
8.59%
-30%
-25%
-20%
-15%
-10%
-5%
0%
5%
10%
15%
0
10
20
30
40
50
60
70
T4 fast T4 medium P4 medium P4 slow x264 fast x264 medium x264 slow
Bit
rate
Savin
g (
hig
her
is b
ett
er)
#1080p30 s
tream
s
ENCODE BENCHMARK – H.264Latency Tolerant H.264 Encoding vs x264
Turing NVENC Pascal NVENC CPU
Reference quality
27
ENCODE BENCHMARK - HEVCLatency tolerant HEVC Encoding vs x265
13
4
1210
2 1 0.85
-21%
-10%
-35%-35%
-8%
0%
9%
-100%
-80%
-60%
-40%
-20%
0%
20%
0
10
20
30
40
50
60
T4 fast T4 medium P4 medium P4 slow x265 fast x265 medium x265 slow
Bit
rate
Savin
g (
hig
her
is b
ett
er)
#1080p30 s
tream
s
Reference quality
Turing NVENC Pascal NVENC CPU
28
Optical Flow SDK Update
29
OPTICAL FLOW SDK
Q1 2019
Q3 2019
SDK 1.0Turing
4x4 vectors, 4K
Q2 2020
SDK 2.0GA100
1x1, 2x2, 4x4, 8KRegion of Interest
Object tracker
SDK 1.1OpenCVAccuracy
30
OPTICAL FLOW SDK 2.0
Ampere architecture GPUs
Improved hardware, independent of NVENC
What’s New?
NVENC
Optical Flow
Turing Ampere
31
OPTICAL FLOW SDK 2.0
Ampere architecture GPUs
Improved hardware, independent of NVENC
Better performance and higher accuracy than Turing
Up to 300 fps at 4K*
What’s New?
72%
73%
74%
75%
76%
77%
78%
79%
80%
81%
82%
83%
0 200 400 600 800 1000 1200 1400
Quality
* (%
accura
te ±
3 p
ixels
)
Performance (fps for 1080p)
Ampere/Turing
Slow
Medium
Fast
*Performance is dependent on clocks, available memory bandwidth and well-designed application pipelining
32
OPTICAL FLOW SDK 2.0
Ampere architecture GPUs
Improved hardware, independent of NVENC
Better performance and higher accuracy than Turing
Up to 300 fps at 4K*
Granularity: 1×1, 2×2, 4×4 pixels
What’s New?
4 × 4
2 × 2
1 × 1
*Performance is dependent on clocks, available memory bandwidth and well-designed application pipelining
33
OPTICAL FLOW SDK 2.0
Ampere architecture GPUs
Improved hardware, independent of NVENC
Better performance and higher accuracy than Turing
Up to 300 fps at 4K*
Granularity: 1×1, 2×2, 4×4 pixels
Resolution up to 8192 × 8192
What’s New?
Turing OF
4096
4096
Ampere OF
8192
8192
*Performance is dependent on clocks, available memory bandwidth and well-designed application pipelining
34
OPTICAL FLOW SDK 2.0
Ampere architecture GPUs
Improved hardware, independent of NVENC
Better performance and higher accuracy than Turing
Up to 300 fps at 4K*
Granularity: 1×1, 2×2, 4×4 pixels
Resolution up to 8192 × 8192
Flow vectors for region of Interest (ROI)
What’s New?
*Performance is dependent on clocks, available memory bandwidth and well-designed application pipelining
35
OPTICAL FLOW SDK 2.0
Ampere architecture GPUs
Improved hardware, independent of NVENC
Better performance and higher accuracy than Turing
Up to 300 fps at 4K*
Granularity: 1×1, 2×2, 4×4 pixels
Resolution up to 8192 × 8192
Flow vectors for region of Interest (ROI)
¼ pixel accuracy
Improved confidence (cost)
Optical-Flow based object tracker
What’s New?
*Performance is dependent on clocks, available memory bandwidth and well-designed application pipelining
36
OPTICAL FLOW SDK 2.0Accuracy vs Performance
72%
73%
74%
75%
76%
77%
78%
79%
80%
81%
82%
83%
0 200 400 600 800 1000 1200 1400
Accura
cy*
(% a
ccura
te ±
3 p
ixels
)
Performance (fps for 1080p)
Ampere/Turing
Slow
Medium
Fast
*KITTI 2015 FL-ALL NOC
37
OPTICAL FLOW REGION OF INTEREST
Static background, moving foreground objects; e.g. video surveillance
Identify object of interest
Define ROI with extended bounds ±N pixels
Advantages: Improved performance, less noise in flow vectors
38
Main Functionality
nvOpticalFlowCommon.h
CUDA and DirectX Buffer Management
nvOpticalFlowCuda.h
nvOpticalFlowD3D11.h
Reusable Classes
NvOF.h
NvOFCuda.h
NvOFD3D11.h
NV_OF_STATUS(NVOFAPI* PFNNVOFINIT)
(NvOFHandle hOf, const NV_OF_INIT_PARAMS
*initParams);
NV_OF_STATUS(NVOFAPI* PFNNVOFEXECUTE)
(NvOFHandle hOf, const
NV_OF_EXECUTE_INPUT_PARAMS
*executeInParams,
NV_OF_EXECUTE_OUTPUT_PARAMS
*executeOutParams);
NV_OF_STATUS(NVOFAPI* PFNNVOFDESTROY)
(NvOFHandle hOf);
39
USE VIA OPENCV
Mat frameL = imread(pathL, IMREAD_GRAYSCALE);
Mat frameR = imread(pathR, IMREAD_GRAYSCALE);
GpuMat d_flowL(frameL), d_flowR(frameR), d_flow;
Mat flowx, flowy, flowxy;
int gpuId = 0;
int width = frameL.size().width, height = frameL.size().height;
Ptr<cuda::FarnebackOpticalFlow> OpticalFlow =
cuda::FarnebackOpticalFlow::create();
OpticalFlow->calc(d_flowL, d_flowR, d_flow);
d_flow.download(flowxy);
40
USE VIA OPENCV
Mat frameL = imread(pathL, IMREAD_GRAYSCALE);
Mat frameR = imread(pathR, IMREAD_GRAYSCALE);
GpuMat d_flowL(frameL), d_flowR(frameR), d_flow;
Mat flowx, flowy, flowxy;
int gpuId = 0;
int width = frameL.size().width, height = frameL.size().height;
Ptr<cuda::NvidiaOpticalFlow> OpticalFlow =
cuda::NvidiaOpticalFlow::create(perfPreset, width, height, gpuId);
OpticalFlow->calc(frameL, frameR, d_flow);
d_flow.download(flowxy);
41
OPTICAL FLOW APPLICATIONS
Object tracking
Video frame interpolation
Video frame extrapolation
Action recognition
42
OPTICAL FLOW APPLICATIONS
Object tracking
Video frame interpolation
Video frame extrapolation
Action recognition
- API and source code in Optical Flow SDK 2.0
43
OBJECT TRACKINGUsing Feature Tracker or Correlation Filter
.
.
.
NVDEC 1
NVDEC 2
NVDEC N
.
.
.
Detector M
Camera 1
Camera M
Stream 1
Stream M
Detector 1
Object
Tracker 2
Object
Tracker M
.
.
.
Object
Tracker 1
Feature Tracker
.
.
.
.
.
.
1
M
M
1
.
.
.
Runs
every
Kth
fram
e
.
.
.
Object classes, IDs and positions
Runs
every
fra
me
44
OBJECT TRACKING WITH ZERO GPU USAGEOptical Flow is “Free”!!
.
.
.
NVDEC 1
NVDEC 2
NVDEC N
.
.
.
Detector M
Camera 1
Camera M
Stream 1
Stream M
Detector 1
Object
Tracker 2
Object
Tracker M
.
.
.
Object
Tracker 1
NVIDIA Optical
Flow Hardware
.
.
.
.
.
.
1
M
M
1
.
.
.
Runs
every
Kth
fram
e
.
.
.
Object classes, IDs and positions
Runs
every
fra
me
45
OBJECT TRACKER ALGORITHMNvOFTracker in Optical Flow SDK 2.0
Decode
(NVDEC)
Detect
(yolov3)
Frames
Tracker
(CPU/
CUDA)
Regions of Interest
NvOF
Hardware
FlowVectors
Object IDs
Object Positions
Object Classes
Decode video bitstream to frames, P-2, P-1, P (current)
Detect objects of interest (person, bicycles, cars, …)
Define regions of interest to track
Runs every Kth frame
Optical Flow (P, P-1)
Dense flow field ➔ Representative flow
Warp object position
Tracker minimizes cost
Cost = f(Centroid distance, IOU, OF cost, …)
46
NvOFTracker.h
C-API
• NvOFTCreate
• NvOFTProcess
• NvOFTDestroy
COFTracker.h
C++ Reusable Class
• COFTracker::COFTracker
• COFTracker::InitTracker
• COFTracker:: TrackObjects
NV OBJECT TRACKER APISource code in Optical Flow SDK 2.0
47
Sample Code
int main(int argc, char** argv) {
// Instantiate a video decoder
cv::VideoCapture videoIn(clFields.inputFile, cv::CAP_FFMPEG);
// Initialize and instantiate object detector
CDetector objectDetector(detectorEngineFile, gpuId);
// Instantiate NvOFT Object tracker
COFTracker objectTracker(w, dh, NvOFT_SURFACE_MEM_TYPE_SYSTEM,
NvOFT_SURFACE_FORMAT_Y, gpuId);
while (frame_available) {
// Read the next video frame
cv::Mat frame; videoIn >> frame;
if (nFrame % N == 0)
// Run detector every Nth frame and get bounding boxes
boxes = objectDetector.Run(frame.data, frameProperties);
else
// Track detected objects in every frame
trackedObjects = objectTracker.TrackObjects(frame.data,
frameSize, frame.step[0], inputObjectsVector, FALSE);
DrawTrackedObjects(…)
}
videoIn.release();
}
48
Contents
OBJECT TRACKER IN OPTICAL FLOW SDK
Source code for end-to-end pipeline with
Video decoder
Object detector (non-NVIDIA, yoloV3)
Object tracker based on NVIDIA optical flow
API for easy integration
Sample application and documentation
Integrable with NVIDIA DeepStream SDK
49
OBJECT TRACKER PROFILING
50
Roadmap
51
Roadmap
Video Codec SDK 11.0 (Q3 2020)
Deprecation of presets
Samples moved to GitHub, online documentation
DirectX 12 and Vulkan support via sample apps
Optical Flow SDK 3.0 (Q3 2020)
Ampere architecture GPUs
Object detector & tracker
Frame rate interpolation/extrapolation
Optical vector pre/post-processing APIs
52
RESOURCES
Video Codec SDK: https://developer.nvidia.com/nvidia-video-codec-sdk
Optical Flow SDK: https://developer.nvidia.com/nvidia-video-codec-sdk
Video Technologies Developer Forum: https://devtalk.nvidia.com/default/board/175/video-technologies/
Support: [email protected]
CWE21120: How to Use Video Codec and Optical Flow SDK on NVIDIA GPUs Effectively