
Joohoon Lee

Chethan Ningaraju

53021 – DEPLOYMENT OF SEMANTIC SEGMENTATION NETWORK USING TENSORRT


OUTLINE

Semantic segmentation for automotive use case

Cityscapes dataset

Pre-trained sample network – FCN variant

Inference performance on DRIVE PX 2 using Caffe and cuDNN

Introduction to TensorRT

FP32 Deployment using TensorRT

INT8 Deployment using TensorRT

Basic background information and hands-on sessions


“Semantic segmentation is the task of clustering parts of images together which belong to the same object class” (Martin Thoma, A Survey of Semantic Segmentation)
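In a fully convolutional network like the FCN variant used in this lab, that clustering is usually read off per pixel: the network emits one score map per class, and each pixel is assigned the class with the highest score. A minimal C++ sketch of that step, assuming the scores are stored planar (CHW: all of class 0, then all of class 1, and so on); argmaxLabels is a hypothetical helper, not part of the lab code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts per-class score maps (CHW layout: numClasses planes of height*width floats)
// into a label map holding, for every pixel, the index of the highest-scoring class.
std::vector<std::uint8_t> argmaxLabels(const std::vector<float>& scores,
                                       std::size_t numClasses,
                                       std::size_t height, std::size_t width) {
    const std::size_t plane = height * width;
    assert(numClasses > 0 && numClasses <= 256);  // labels must fit in 8 bits
    assert(scores.size() == numClasses * plane);
    std::vector<std::uint8_t> labels(plane, 0);
    for (std::size_t px = 0; px < plane; ++px) {
        float best = scores[px];  // score of class 0 at this pixel
        for (std::size_t c = 1; c < numClasses; ++c) {
            const float s = scores[c * plane + px];
            if (s > best) {
                best = s;
                labels[px] = static_cast<std::uint8_t>(c);
            }
        }
    }
    return labels;
}
```

With the 19 Cityscapes classes this yields one class index per pixel, which is the kind of label map a prediction image can store.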


SEGMENTATION FOR AUTOMOTIVE USE CASE

OpenRoadNet from NVIDIA



CITYSCAPES DATASET

https://www.cityscapes-dataset.com/

M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, "The Cityscapes Dataset for Semantic Urban Scene Understanding," in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016


CITYSCAPES DATASET: 19 CLASSES

road, building, wall, fence, pole, traffic light, traffic sign, vegetation, terrain, sky, person, rider, car, truck, bus, train, motorcycle, bicycle, sidewalk


EVALUATION METRIC

Per-pixel metric, evaluated for the 19 classes and for the 7 categories:

IoU = TP / (TP + FP + FN)

TP = True Positive

FP = False Positive

FN = False Negative

Average IoU class: mean IoU over the 19 classes (road, building, wall, fence, pole, traffic light, traffic sign, vegetation, terrain, sky, person, rider, car, truck, bus, train, motorcycle, bicycle, sidewalk)

Average IoU category: mean IoU over the 7 categories (flat, nature, object, sky, construction, human, vehicle)
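Both averages can be computed from a single confusion matrix of pixel counts. The sketch below is a minimal C++ version under stated assumptions: conf[g][p] counts pixels whose ground-truth class is g and predicted class is p, ignored (void) pixels have already been excluded, and classes that never occur are left out of the mean. Category IoU is obtained by merging classes into their categories before computing IoU, so confusing car with truck no longer counts as an error. The helper names are hypothetical; the lab's own numbers come from its evaluation script.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Matrix = std::vector<std::vector<std::uint64_t>>;  // conf[gt][pred] pixel counts

// Per-class IoU = TP / (TP + FP + FN); classes absent from both GT and prediction get -1.
std::vector<double> perClassIoU(const Matrix& conf) {
    const std::size_t n = conf.size();
    std::vector<double> iou(n, -1.0);
    for (std::size_t k = 0; k < n; ++k) {
        assert(conf[k].size() == n);
        std::uint64_t tp = conf[k][k], fp = 0, fn = 0;
        for (std::size_t j = 0; j < n; ++j) {
            if (j == k) continue;
            fn += conf[k][j];  // pixels of class k predicted as another class
            fp += conf[j][k];  // pixels of another class predicted as class k
        }
        const std::uint64_t denom = tp + fp + fn;
        if (denom > 0) iou[k] = static_cast<double>(tp) / static_cast<double>(denom);
    }
    return iou;
}

// Mean over the classes that actually occur (entries marked -1 are skipped).
double meanIoU(const std::vector<double>& iou) {
    double sum = 0.0;
    std::size_t count = 0;
    for (double v : iou) {
        if (v >= 0.0) { sum += v; ++count; }
    }
    return count ? sum / static_cast<double>(count) : 0.0;
}

// Folds a class-level confusion matrix into a category-level one.
Matrix toCategories(const Matrix& conf, const std::vector<std::size_t>& categoryOf,
                    std::size_t numCategories) {
    Matrix cat(numCategories, std::vector<std::uint64_t>(numCategories, 0));
    for (std::size_t g = 0; g < conf.size(); ++g)
        for (std::size_t p = 0; p < conf.size(); ++p)
            cat[categoryOf[g]][categoryOf[p]] += conf[g][p];
    return cat;
}

// Cityscapes category of each training class, in the order road, sidewalk, building,
// wall, fence, pole, traffic light, traffic sign, vegetation, terrain, sky, person,
// rider, car, truck, bus, train, motorcycle, bicycle.
// Categories: 0 flat, 1 construction, 2 object, 3 nature, 4 sky, 5 human, 6 vehicle.
const std::vector<std::size_t> kCityscapesCategoryOf = {
    0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 4, 5, 5, 6, 6, 6, 6, 6, 6};
```

Average IoU class is then meanIoU(perClassIoU(conf)), and Average IoU category is meanIoU(perClassIoU(toCategories(conf, kCityscapesCategoryOf, 7))).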


PRE-TRAINED SAMPLE NETWORK: FCN VARIANT

VGG16-based FCN with modifications

Trained on the Cityscapes training set

60,000 iterations, initialized from pre-trained VGG weights

Average IoU class = 48.4

Average IoU category = 76.9


INFERENCE PERFORMANCE USING CAFFE

Performance measured using Caffe on the DRIVE PX 2 dGPU

                 CAFFE
Runtime (ms)     242.2
Images/sec       4.1

[Chart: Images/sec, Caffe]

Batch Size = 1, Input/Output Resolution = 512 x 1024
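With batch size 1, the Images/sec row is the reciprocal of the per-image runtime: 1000 / 242.2 ms ≈ 4.1 images/sec. A small helper making that conversion explicit; the name imagesPerSecond is illustrative, not from the lab code, and it assumes batches run back to back with no overlap between them.

```cpp
#include <cassert>

// Throughput implied by a measured latency: (1000 ms / runtime per batch) * images per batch.
double imagesPerSecond(double runtimeMsPerBatch, int batchSize = 1) {
    assert(runtimeMsPerBatch > 0.0 && batchSize > 0);
    return 1000.0 / runtimeMsPerBatch * batchSize;
}
```

For example, imagesPerSecond(242.2) evaluates to about 4.13, matching the rounded 4.1 in the table.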


TensorRT

High-performance neural network inference engine for production deployment

Generate optimized, deployment-ready models for data center, embedded and automotive platforms

Deliver the high-performance, low-latency inference demanded by real-time services

Deploy faster, more responsive and memory-efficient deep learning applications with INT8 and FP16 optimized precision support

developer.nvidia.com/tensorrt

TensorRT for Data Center: Image Classification, Object Detection, Image Segmentation

TensorRT for Automotive (NVIDIA DRIVE PX 2): Pedestrian Detection, Lane Tracking, Traffic Sign Recognition


TensorRT Step 1: Optimize trained model

[Diagram: Neural network from the training framework → TensorRT Optimizer (batch size, precision) → optimized PLAN, serialized to disk; validation using TensorRT]

developer.nvidia.com/tensorrt


TensorRT Step 2: Deploy optimized plans with runtime

[Diagram: Serialized PLAN → TensorRT runtime engine]

developer.nvidia.com/tensorrt


FP32 DEPLOYMENT USING TensorRT


OUTLINE – FP32

Use Caffe parser to load a pre-trained model

Create TensorRT engine for FP32

Serialize engine to plan file

Measure performance of inferencing using TensorRT with FP32

Run inference on a test image and visually inspect the output

What you will implement today


GETTING STARTED

Basic information: lab files are located under /home/nvidia/GTC2017-53021

$ cd /home/nvidia/GTC2017-53021

Recommended text editor:

$ gedit <filename> &

Slide legend: "You need to do something", "For your reference", "Expected output on console"


DIRECTORY

Files for the lab:

data: pre-trained Caffe model

sampleCityscapes: Step 1, optimize trained model

sampleCityscapesInference: Step 2, deploy optimized plans with runtime


LET’S WORK ON THE CODE TOGETHER


USE CAFFE PARSER TO LOAD A MODEL

TODO #1: Create a Caffe parser object by calling the createCaffeParser() function (~line 231)

sampleCityscapes.cpp - TODO #1

IBuilder* builder = createInferBuilder(gLogger);

INetworkDefinition* network = builder->createNetwork();

ICaffeParser* parser = /* TODO */

NvCaffeParser.h

{

ICaffeParser* createCaffeParser();

}


CREATE TensorRT ENGINE FOR FP32

TODO #2: Create an optimized TensorRT engine by calling buildCudaEngine() on the builder object (~line 271)

sampleCityscapes.cpp - TODO #2

ICudaEngine* engine = /* TODO */

NvInfer.h

class IBuilder {
    virtual nvinfer1::ICudaEngine* buildCudaEngine(nvinfer1::INetworkDefinition& network) = 0;
};


SERIALIZE ENGINE TO PLAN FILE

TODO #3: Serialize the engine to a plan by calling serialize() on the engine, then save it to a file (~line 279)

sampleCityscapes.cpp - TODO #3

tensorRTModelStream = /* TODO */

NvInfer.h

class ICudaEngine {
    virtual IHostMemory* serialize() const = 0;
};
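Serializing only produces bytes in memory; saving the plan means writing them out in binary mode, and Step 2 reads the same bytes back before deserializing them. A minimal standard-library sketch, assuming the IHostMemory returned by serialize() exposes its buffer through data() and size() (as TensorRT's IHostMemory does); savePlan and loadPlan are hypothetical helpers, not functions from the lab or from TensorRT.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Writes a serialized engine (e.g. the bytes behind IHostMemory::data()/size()) to disk.
bool savePlan(const std::string& path, const void* data, std::size_t size) {
    std::ofstream out(path, std::ios::binary);
    if (!out) return false;
    out.write(static_cast<const char*>(data), static_cast<std::streamsize>(size));
    return static_cast<bool>(out);
}

// Reads a plan file back into memory; returns an empty buffer if the file cannot be opened.
std::vector<char> loadPlan(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return {};
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}
```

In the lab's context the call would look something like savePlan(path, tensorRTModelStream->data(), tensorRTModelStream->size()), with path chosen by the caller.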


BUILD & TEST YOUR CODE

In the GTC2017-53021/sampleCityscapes/ directory, build the sample

Run the sample_cityscapes program and check the output file

The output is the optimized, serialized engine file

$ cd /home/nvidia/GTC2017-53021/sampleCityscapes

$ make

$ cd /home/nvidia/GTC2017-53021/bin

$ ./sample_cityscapes

$ ls -alsh ../output


MEASURE INFERENCE PERFORMANCE IN FP32

TODO #4: The timing routine has already been written but is commented out. Please uncomment it (~line 331).

sampleCityscapes.cpp - TODO #4

//std::cout << "Avg execution time over " << TIMING_ITERATIONS << " iterations is " << total/TIMING_ITERATIONS << " ms." << std::endl;
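The commented-out line prints total / TIMING_ITERATIONS, the mean latency over several runs. A self-contained sketch of such a routine using std::chrono, with a stand-in callable in place of the TensorRT execute call; averageMs and its warm-up run are assumptions of this sketch, not necessarily what the lab's routine does. If the callable launches GPU work asynchronously, it has to synchronize (for example on its CUDA stream) before returning, otherwise only launch overhead is measured.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs `infer` `iterations` times and returns the average wall-clock time per call in ms.
double averageMs(const std::function<void()>& infer, int iterations) {
    assert(iterations > 0);
    infer();  // warm-up run, excluded from the average
    double totalMs = 0.0;
    for (int i = 0; i < iterations; ++i) {
        const auto start = std::chrono::steady_clock::now();
        infer();
        const auto end = std::chrono::steady_clock::now();
        totalMs += std::chrono::duration<double, std::milli>(end - start).count();
    }
    return totalMs / iterations;
}
```

A warm-up call is a common choice because the first execution often includes one-time setup costs that would otherwise inflate the average.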


BUILD & TEST YOUR CODE

In the GTC2017-53021/sampleCityscapes/ directory, build the sample

Run the sample_cityscapes program to measure average execution time

Measure average execution time

$ cd /home/nvidia/GTC2017-53021/sampleCityscapes

$ make

$ cd /home/nvidia/GTC2017-53021/bin

$ ./sample_cityscapes

Avg execution time over 10 iterations is 170.756 ms.


MEASURE PERFORMANCE USING PROFILER

TODO #5: Set a profiler on the execution context to get per-layer performance (~lines 365-366)

TODO #6: Call printLayerTimes() at the end (~lines 378-379)

sampleCityscapes.cpp - TODO #5 and TODO #6

IExecutionContext *context = engine->createExecutionContext();
context->setProfiler(&gProfiler);   // TODO #5: attach the profiler to the context
gProfiler.printLayerTimes();        // TODO #6: at the end, print the collected layer times
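Once a profiler is attached, TensorRT calls back into it with one (layer name, milliseconds) pair per layer for each synchronous execution, and printLayerTimes() reports what was collected. The class below is a self-contained stand-in showing that bookkeeping: it keeps layers in execution order, averages over runs, and returns the sum over all layers, comparable to the "Time over all layers" line. In the lab, gProfiler would instead derive from nvinfer1::IProfiler and override reportLayerTime(const char*, float); LayerTimer and printAverages are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

class LayerTimer {
public:
    // Same shape as IProfiler::reportLayerTime: called once per layer per inference.
    void reportLayerTime(const char* layerName, float ms) {
        auto it = std::find_if(records_.begin(), records_.end(),
                               [&](const Record& r) { return r.first == layerName; });
        if (it == records_.end()) records_.emplace_back(layerName, ms);  // keeps layer order
        else it->second += ms;
    }

    std::size_t layerCount() const { return records_.size(); }

    // Prints each layer's average time over `runs` inferences; returns their sum.
    double printAverages(int runs) const {
        assert(runs > 0);
        double total = 0.0;
        for (const Record& r : records_) {
            const double avg = r.second / runs;
            std::printf("%-40s %8.3fms\n", r.first.c_str(), avg);
            total += avg;
        }
        std::printf("Time over all layers: %.3f\n", total);
        return total;
    }

private:
    using Record = std::pair<std::string, double>;  // layer name, accumulated ms
    std::vector<Record> records_;
};
```

Fused layers such as "conv1_1 + relu1_1" arrive under a single name, which is why the profiler output lists them as one entry.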


BUILD & TEST YOUR CODE

Build and run sample_cityscapes again

Measure per-layer execution time using the profiler

conv1_1 + relu1_1 2.447ms

conv1_2 + relu1_2 11.816ms

pool1 2.625ms

conv2_1 + relu2_1 6.054ms

conv2_2 + relu2_2 11.784ms

...

...

upscore_pool4 0.082ms

score_pool3 + fuse_pool3 0.180ms

upscore8 2.485ms

Time over all layers: 170.756


PREPARE A TEST IMAGE

In the GTC2017-53021/scripts/ directory, run the script

scripts/batch_preprocessor.py

$ cd /home/nvidia/GTC2017-53021/scripts/

$ python batch_preprocessor.py test

Location of dataset = /home/nvidia/GTC2017-53021-Data/Cityscapes/leftImg8bit/train/*/*.png

Processing batches for test

Total number of images = 2975

NUM_PER_BATCH = 1

NUM_BATCHES = 1

Adding image: aachen_000000_000019_leftImg8bit.png in batch_test0


GENERATE PREDICTION OUTPUT

In the GTC2017-53021/sampleCityscapesInference/ directory, build the sample

Run the sample_cityscapes_inference program

sampleCityscapesInference

$ cd /home/nvidia/GTC2017-53021/sampleCityscapesInference/

$ make

$ cd /home/nvidia/GTC2017-53021/bin

$ ./sample_cityscapes_inference test

Saving output prediction to ../output/aachen_000000_000019_leftImg8bit_pred.png


VISUALLY INSPECT THE PREDICTION

Open the prediction file using the ‘display’ command

display

$ display ../output/aachen_000000_000019_leftImg8bit_pred.png


VISUALLY INSPECT THE PREDICTION

In the GTC2017-53021/scripts/ directory, run the script

scripts/display_color.py

$ cd /home/nvidia/GTC2017-53021/scripts/

$ python display_color.py ../output/aachen_000000_000019_leftImg8bit_pred.png
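The internals of display_color.py are not shown on the slide. A common way to make a label-map prediction viewable is to paint each class with the standard Cityscapes color, and the C++ sketch below shows that mapping. It assumes the prediction PNG stores one training ID (0 to 18, in the order given in the comments) per pixel, which is an assumption about the file format. colorize is a hypothetical helper, and reading or writing PNG files is left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using RGB = std::array<std::uint8_t, 3>;

// Standard Cityscapes colors for the 19 training classes, indexed by training ID.
const std::array<RGB, 19> kCityscapesPalette = {{
    {128, 64, 128},   // 0 road
    {244, 35, 232},   // 1 sidewalk
    {70, 70, 70},     // 2 building
    {102, 102, 156},  // 3 wall
    {190, 153, 153},  // 4 fence
    {153, 153, 153},  // 5 pole
    {250, 170, 30},   // 6 traffic light
    {220, 220, 0},    // 7 traffic sign
    {107, 142, 35},   // 8 vegetation
    {152, 251, 152},  // 9 terrain
    {70, 130, 180},   // 10 sky
    {220, 20, 60},    // 11 person
    {255, 0, 0},      // 12 rider
    {0, 0, 142},      // 13 car
    {0, 0, 70},       // 14 truck
    {0, 60, 100},     // 15 bus
    {0, 80, 100},     // 16 train
    {0, 0, 230},      // 17 motorcycle
    {119, 11, 32},    // 18 bicycle
}};

// Maps a label map of training IDs to interleaved RGB; unknown IDs are painted black.
std::vector<std::uint8_t> colorize(const std::vector<std::uint8_t>& labels) {
    std::vector<std::uint8_t> rgb(labels.size() * 3, 0);
    for (std::size_t i = 0; i < labels.size(); ++i) {
        if (labels[i] >= kCityscapesPalette.size()) continue;
        const RGB& c = kCityscapesPalette[labels[i]];
        for (std::size_t ch = 0; ch < 3; ++ch) rgb[3 * i + ch] = c[ch];
    }
    return rgb;
}
```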


SUMMARY OF FP32 DEPLOYMENT

Performance comparison against Caffe on the DRIVE PX 2 dGPU

                 CAFFE    TENSORRT FP32
Runtime (ms)     242.2    170.7
Images/sec       4.1      5.9

[Chart: Images/sec, Caffe vs. TensorRT FP32]

Batch Size = 1, Input/Output Resolution = 512 x 1024


INT8 DEPLOYMENT USING TensorRT


TensorRT Step 1: Optimize trained model for INT8

[Diagram: Neural network from the training framework → TensorRT Optimizer (batch size, precision, calibration dataset) → optimized PLAN, serialized to disk; validation using TensorRT on the validation dataset]

developer.nvidia.com/tensorrt
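INT8 precision needs a scale per tensor that maps floating-point values onto 8-bit integers, and the calibration dataset is what TensorRT uses to pick it. NVIDIA has publicly described this as a symmetric mapping with saturation. The sketch below shows that mapping, assuming a saturation threshold has already been chosen for the tensor. The helper names are illustrative, and TensorRT performs this internally.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Symmetric linear INT8 mapping: [-threshold, threshold] onto [-127, 127], saturating outside.
std::int8_t quantizeInt8(float x, float threshold) {
    assert(threshold > 0.0f);
    const float scale = threshold / 127.0f;  // real value of one integer step
    const float q = std::round(x / scale);   // nearest integer level
    const float saturated = std::max(-127.0f, std::min(127.0f, q));
    return static_cast<std::int8_t>(saturated);
}

// Maps an INT8 level back to the real value it represents.
float dequantizeInt8(std::int8_t q, float threshold) {
    return static_cast<float>(q) * (threshold / 127.0f);
}
```

The threshold trades range for resolution. Setting it to the largest activation ever seen wastes levels on rare outliers, while a tighter threshold clips those outliers but represents typical values more finely.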


TensorRT Step 2: Deploy optimized plans with runtime

[Diagram: Serialized PLAN → TensorRT runtime engine]

developer.nvidia.com/tensorrt


OUTLINE – INT8

Prepare calibration dataset for INT8 inferencing

Create TensorRT engine for INT8 with entropy calibrator

Measure performance of inferencing using TensorRT with INT8

Validate the accuracy of the INT8 model using the Cityscapes validation dataset

What you will implement today


PREPARATION

Before moving to INT8, first clean up the output directory:

$ cd /home/nvidia/GTC2017-53021/output

$ rm *.png


PREPARE CALIBRATION DATASET

In the GTC2017-53021/scripts/ directory, run the script

scripts/batch_preprocessor.py

$ cd /home/nvidia/GTC2017-53021/scripts/

$ python batch_preprocessor.py calibration

Location of dataset = /home/nvidia/GTC2017-53021-Data/Cityscapes/leftImg8bit/train/*/*.png

Processing batches for calibration

Total number of images = 2975

NUM_PER_BATCH = 1

NUM_BATCHES = 50


ENTROPY CALIBRATOR

TODO #7: Uncomment the calibration code (~lines 257-261)

sampleCityscapes.cpp - TODO #7

// TODO #7: Uncomment the below 4 lines

BatchStream calibrationStream(CAL_BATCH_SIZE, NB_CAL_BATCHES, "../batches/batch_calibration");

Int8EntropyCalibrator calibrator(calibrationStream, FIRST_CAL_BATCH);

builder->setInt8Mode(true);

builder->setInt8Calibrator(&calibrator);
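NVIDIA has publicly described entropy calibration as follows. For each tensor, TensorRT picks the saturation threshold that minimizes the KL divergence between the activation histogram gathered over the calibration batches and its 8-bit quantized version. The sketch below reproduces that idea in simplified form and is not TensorRT's implementation. It takes a histogram of absolute activation values, uses 128 quantization levels, and treats a candidate whose quantized distribution drops mass as infinitely bad instead of smoothing it. selectClipBins and its helpers are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// KL(P || Q) for normalized distributions; infinite if Q has zero mass where P does not.
double klDivergence(const std::vector<double>& p, const std::vector<double>& q) {
    assert(p.size() == q.size());
    double kl = 0.0;
    for (std::size_t j = 0; j < p.size(); ++j) {
        if (p[j] == 0.0) continue;
        if (q[j] == 0.0) return std::numeric_limits<double>::infinity();
        kl += p[j] * std::log(p[j] / q[j]);
    }
    return kl;
}

void normalize(std::vector<double>& v) {
    double sum = 0.0;
    for (double x : v) sum += x;
    if (sum > 0.0)
        for (double& x : v) x /= sum;
}

// Returns how many leading histogram bins to keep; the clipping threshold is then
// roughly (returned bins) * (bin width).
std::size_t selectClipBins(const std::vector<double>& hist, std::size_t levels = 128) {
    assert(levels > 0 && hist.size() >= levels);
    std::size_t bestBins = hist.size();
    double bestKl = std::numeric_limits<double>::infinity();
    for (std::size_t i = levels; i <= hist.size(); ++i) {
        // Reference P: the first i bins, with all clipped outliers folded into the last one.
        std::vector<double> p(hist.begin(), hist.begin() + static_cast<std::ptrdiff_t>(i));
        for (std::size_t j = i; j < hist.size(); ++j) p[i - 1] += hist[j];

        // Candidate Q: merge the i bins into `levels` groups (the last group takes any
        // remainder), then spread each group's count evenly over its non-empty bins.
        std::vector<double> q(i, 0.0);
        const std::size_t width = i / levels;
        for (std::size_t g = 0; g < levels; ++g) {
            const std::size_t start = g * width;
            const std::size_t stop = (g + 1 == levels) ? i : start + width;
            double count = 0.0;
            std::size_t nonEmpty = 0;
            for (std::size_t j = start; j < stop; ++j) {
                count += hist[j];
                if (p[j] != 0.0) ++nonEmpty;
            }
            for (std::size_t j = start; j < stop; ++j)
                if (p[j] != 0.0) q[j] = count / static_cast<double>(nonEmpty);
        }

        normalize(p);
        normalize(q);
        const double kl = klDivergence(p, q);
        if (kl < bestKl) {  // strict comparison keeps the smallest threshold on ties
            bestKl = kl;
            bestBins = i;
        }
    }
    return bestBins;
}
```

In the lab none of this has to be written by hand. Int8EntropyCalibrator presumably hands the calibration batches from BatchStream to the builder, and TensorRT runs its own search internally.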


BUILD & TEST YOUR CODE

Build and run sample_cityscapes again

Measure per-layer execution time using the profiler

conv1_1 + relu1_1 input reformatter 0 0.168ms

conv1_1 + relu1_1 1.013ms

conv1_2 + relu1_2 4.241ms

pool1 0.700ms

conv2_1 + relu2_1 2.066ms

conv2_2 + relu2_2 3.851ms

...

...

upscore_pool4 0.047ms

score_pool3 + fuse_pool3 0.066ms

upscore8 2.197ms

Time over all layers: 50.237


PREPARE VALIDATION DATASET

In the GTC2017-53021/scripts/ directory, run the script

scripts/batch_preprocessor.py

$ cd /home/nvidia/GTC2017-53021/scripts/

$ python batch_preprocessor.py validation

Location of dataset = /home/nvidia/GTC2017-53021-Data/Cityscapes/leftImg8bit/val/*/*.png

Processing batches for validation

Total number of images = 500

NUM_PER_BATCH = 1

NUM_BATCHES = 500


GENERATE PREDICTION OUTPUT

In the GTC2017-53021/sampleCityscapesInference/ directory, build the sample

Run the sample_cityscapes_inference program for all 500 validation images

sampleCityscapesInference

$ cd /home/nvidia/GTC2017-53021/sampleCityscapesInference/

$ make

$ cd /home/nvidia/GTC2017-53021/bin

$ ./sample_cityscapes_inference validation

Saving output prediction to ../output/frankfurt_000000_000294_leftImg8bit_pred.png


VALIDATE ACCURACY

In the GTC2017-53021/scripts/ directory, run the script

scripts/eval_tensorrt_cityscapes.py

$ cd /home/nvidia/GTC2017-53021/scripts/

$ python eval_tensorrt_cityscapes.py

Evaluating 500 pairs of images...

Images processed: 500

classes IoU nIoU

Score Average : 0.481 0.236

--------------------------------

categories IoU nIoU

Score Average : 0.768 0.565


SUMMARY OF INT8 DEPLOYMENT

Performance and IoU comparison against Caffe on the DRIVE PX 2 dGPU

                 CAFFE    TENSORRT FP32    TENSORRT INT8
Runtime (ms)     242.2    170.7            50.2
Images/sec       4.1      5.9              19.9
Class IoU        48.4     48.4             48.1
Category IoU     76.9     76.9             76.8

[Chart: Images/sec, Caffe vs. TensorRT FP32 vs. TensorRT INT8]

Batch Size = 1, Input/Output Resolution = 512 x 1024
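The speedups behind this table are ratios of runtimes at the same batch size. TensorRT INT8 is about 4.8x faster than Caffe and about 3.4x faster than TensorRT FP32, at the cost of a 0.3-point drop in class IoU. The small C++ sketch below recomputes these figures from the table's values; the struct and function names are illustrative.

```cpp
#include <cassert>
#include <cstdio>

// Speedup of one configuration over another from their per-image runtimes (same batch size).
double speedup(double baselineMs, double runtimeMs) {
    assert(baselineMs > 0.0 && runtimeMs > 0.0);
    return baselineMs / runtimeMs;
}

struct Config {
    const char* name;
    double runtimeMs;
    double classIoU;
};

// Values copied from the summary table (batch size 1, 512 x 1024, DRIVE PX 2 dGPU).
const Config kConfigs[] = {
    {"Caffe", 242.2, 48.4},
    {"TensorRT FP32", 170.7, 48.4},
    {"TensorRT INT8", 50.2, 48.1},
};

// Prints each configuration's throughput, speedup over Caffe and change in class IoU.
void printSpeedups() {
    const Config& base = kConfigs[0];
    for (const Config& c : kConfigs) {
        std::printf("%-14s %5.1f img/s  %4.2fx vs Caffe  class IoU %+.1f\n", c.name,
                    1000.0 / c.runtimeMs, speedup(base.runtimeMs, c.runtimeMs),
                    c.classIoU - base.classIoU);
    }
}
```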
