
NVIDIA CLARA TRAIN SDK: TRANSFER LEARNING

DU-09302-002 _v2.0 | June 2019

Getting Started Guide


TABLE OF CONTENTS

Chapter 1. Overview
Chapter 2. Installation
    2.1. Running the container
    2.2. Downloading the models
Chapter 3. Medical Model Archive
    3.1. Commands
    3.2. Configuration
    3.3. Cloning MMAR
Chapter 4. Bring your own model to transfer learning
    4.1. Components for training workflow
        4.1.1. Data pipelines
        4.1.2. Model
        4.1.3. Loss
        4.1.4. Optimizer
        4.1.5. Metrics
    4.2. Structure of training graph
    4.3. Model API specification
    4.4. Model creation
    4.5. Examples
        4.5.1. Extend the model class
        4.5.2. Model creation
        4.5.3. Implement methods
        4.5.4. Optional methods
        4.5.5. Configuration
Chapter 5. Working with classification and segmentation models
    5.1. Working with classification models
        5.1.1. Prepare the data
            5.1.1.1. Data format
            5.1.1.2. Folder structure
            5.1.1.3. Datalist JSON file
        5.1.2. Training a classification model
        5.1.3. Multi-GPU training
        5.1.4. Tensorboard visualization
        5.1.5. Exporting the model to a TensorRT optimized model inference
        5.1.6. Classification model evaluation with ground truth
        5.1.7. Classification model inference
    5.2. Working with segmentation models
        5.2.1. Prepare the data
            5.2.1.1. Using the data converter
            5.2.1.2. Folder structure
            5.2.1.3. Datalist JSON file
        5.2.2. Training a segmentation model
        5.2.3. Multi-GPU training
        5.2.4. Tensorboard visualization
        5.2.5. Exporting the model to a TensorRT optimized model inference
        5.2.6. Segmentation model evaluation with ground truth
        5.2.7. Segmentation model inference
Chapter 6. Appendix
    6.1. Segmentation models
    6.2. Classification models
    6.3. Data transforms and augmentations
    6.4. Model training and validation configurations
        6.4.1. Train configuration
            6.4.1.1. Segmentation model example
            6.4.1.2. Classification model example
        6.4.2. Validation configuration
            6.4.2.1. Segmentation model example
            6.4.2.2. Converting from the old format
    6.5. Components
    6.6. Training with multiple GPUs
    6.7. Improving inference performance
        6.7.1. Training with dynamic shape
        6.7.2. Inference models trained with dynamic shape
        6.7.3. Objectivity of trained models


Chapter 1. OVERVIEW

NVIDIA’s Clara Train SDK: Transfer Learning toolkit is a Python-based SDK that allows developers looking into faster implementation of industry-specific Deep Learning solutions to leverage optimized, ready-to-use, pre-trained models built in-house by NVIDIA. These pre-trained models accelerate the developer’s deep learning training process and reduce the high costs associated with large-scale data collection, labeling, and training models from scratch.

This toolkit offers an end-to-end workflow for accelerating Deep Learning training and inference for Medical Imaging use cases. The models provided are fully trained for Medical Imaging specific reference use cases such as organ and tumor segmentation and classification.

The following pre-trained models are available to download for specific classification and segmentation use cases. Complete details and accuracy metrics are available in the Appendix section of this guide.

‣ Brain Tumor segmentation
‣ Liver and Tumor segmentation
‣ Hippocampus segmentation
‣ Lung Tumor segmentation
‣ Prostate segmentation
‣ Left Atrium segmentation
‣ Pancreas and Tumor segmentation
‣ Colon Tumor segmentation
‣ Hepatic Vessel segmentation
‣ Spleen segmentation
‣ Chest X-ray classification

Supervised training

Transfer learning uses an algorithm for supervised training to find the best model based on training and validation datasets.

The training dataset contains pairs of data items that are used for minimizing loss. The validation dataset contains pairs of data items for validation during the training.


A single pass through a full dataset is referred to as an epoch. Since a full dataset cannot typically be processed in a single iteration, it is divided into batches of data items. For each batch, an optimizer minimizes a loss function and adjusts the weights of the model accordingly. Training metrics are collected and logged during this process.

Once all iterations are completed for the epoch, validation is performed if necessary. Validation is performed by running the validation dataset through the current model. Several validation metrics are computed to measure the quality of the current model. One important metric is called the stopping metric, which is used to determine the quality of the model.

Validation is usually configured to run every N epochs, where N is configurable. The result of the validation determines the best model. The algorithm keeps the current best key metric, which is initialized to a large negative number. Each time validation is done, the computed key metric is compared with the current best. If it is better, then the current best is set to the new metric value, and the current model is written to disk in the model.ckpt file. The model.ckpt file represents the best model.

The more validations performed, the more likely you are to find the best model. Finding the best model by doing validation after each iteration can take a long time, because validation needs to go through the whole validation dataset. In practice, you should do validation every few epochs using the num_training_epoch_per_valid parameter.

When the training is complete, the model_final.ckpt file is written to disk and can be used for fine-tuning. This general algorithm is used for all modes of training: train, fine-tune, multi-gpu train, and multi-gpu fine-tune.
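The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration of the algorithm as the text describes it, not the SDK's implementation: the function name, the callback parameters, and the strings that stand in for checkpoint writes are all hypothetical, and the sketch assumes that a higher key metric is better.

```python
from typing import Callable, List, Tuple


def train_with_validation(
    epochs: int,
    epochs_per_valid: int,
    train_one_epoch: Callable[[int], None],
    validate: Callable[[int], float],
) -> Tuple[float, List[str]]:
    """Run the generic loop; return (best key metric, record of checkpoint writes)."""
    best_metric = float("-inf")  # stands in for "a large negative number"
    writes: List[str] = []
    for epoch in range(1, epochs + 1):
        train_one_epoch(epoch)  # one pass over all batches, minimizing the loss
        if epoch % epochs_per_valid == 0:
            metric = validate(epoch)  # runs the whole validation dataset
            if metric > best_metric:  # assumption: higher is better
                best_metric = metric
                writes.append(f"model.ckpt@epoch{epoch}")  # best model so far
    # The final snapshot is written regardless of how good it is.
    writes.append(f"model_final.ckpt@epoch{epochs}")
    return best_metric, writes


# Hypothetical validation scores, validated every 2 epochs.
scores = {2: 0.5, 4: 0.8, 6: 0.7}
best, writes = train_with_validation(6, 2, lambda e: None, lambda e: scores[e])
print(best, writes)
```

In this run the best checkpoint comes from epoch 4 while the final snapshot comes from epoch 6, which is why the two files usually differ.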

Using model.ckpt or model_final.ckpt

model.ckpt is the best model resulting from training. model_final.ckpt is created when the training finishes normally. It is a snapshot of the model at the last moment of training and is usually not the best model that can be obtained. Both model.ckpt and model_final.ckpt can be used for further training or fine-tuning. Here are two typical use cases:

1. Continued training: Use model_final.ckpt as the starting point for fine-tuning if you think the model has not converged because of an improper configuration, such as a number of epochs that was not set high enough.

2. Transfer learning: Use model.ckpt as the starting point for fine-tuning on a different dataset, which may be your own, to obtain the model that is best for your data. This is also called adaptation.


Chapter 2. INSTALLATION

Using the Transfer Learning Toolkit for Medical Imaging requires the following:

Hardware Requirements

Recommended

‣ 1 GPU or more
‣ 16 GB GPU memory
‣ 8 core CPU
‣ 32 GB system RAM
‣ 80 GB free disk space

Software Requirements

‣ Ubuntu 16.04 LTS
‣ NVIDIA GPU driver v410.xx or above
‣ nvidia-docker 2.0 installed, instructions: https://github.com/NVIDIA/nvidia-docker.
‣ CUDA runtime, Python packages, and TensorFlow. These are required, but there is no need to install them. They are included in the docker image.

Access registration

Get an NGC API Key

‣ NVIDIA GPU Cloud account and API key - https://ngc.nvidia.com/

1. Go to NGC and search for Clara Train container in the Catalog tab. This message is displayed: Sign in to access the PULL feature of this repository.
2. Enter your email address and click Next or click Create an Account.
3. Click Sign In.
4. Click the Clara Train SDK tile.


Download the docker container

‣ Execute docker login nvcr.io from the command line and enter your username and password.
    ‣ Username: $oauthtoken
    ‣ Password: API_KEY
‣ dockerImage=nvcr.io/nvidia/clara-train-sdk:v1.0-py3
‣ docker pull ${dockerImage}

2.1. Running the container

Once downloaded, run the docker using this command:

docker run -it --rm --ipc=host --net=host --runtime=nvidia --mount type=bind,source=/your/dataset/location,target=/workspace/data $dockerImage /bin/bash

If you are on a network that uses a proxy server to connect to the Internet, you can provide proxy server details when launching the container.

docker run --runtime=nvidia -it --rm -e HTTPS_PROXY=https_proxy_server_ip:https_proxy_server_port -e HTTP_PROXY=http_proxy_server_ip:http_proxy_server_port $dockerImage /bin/bash

The docker, by default, starts in the /opt/nvidia folder. To access local directories from within the docker, they have to be mounted in the docker. To mount a directory, use the -v <source_dir>:<mount_dir> option. For more information, see Bind Mounts. Here is an example:

docker run --runtime=nvidia -it --rm -v /home/<username>/tlt-experiments:/workspace/tlt-experiments $dockerImage /bin/bash

This mounts the /home/<username>/tlt-experiments directory on your disk to /workspace/tlt-experiments in the docker.
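When the same container is launched with different mounts or proxy settings, it can help to assemble the command programmatically. The following is a minimal sketch that only uses the docker flags shown above; the helper name docker_run_command and the user name alice are hypothetical.

```python
import shlex
from typing import List, Optional


def docker_run_command(
    image: str,
    host_dir: str,
    container_dir: str,
    https_proxy: Optional[str] = None,
) -> List[str]:
    """Build the argv for an interactive container with one bind mount."""
    cmd = ["docker", "run", "--runtime=nvidia", "-it", "--rm",
           "-v", f"{host_dir}:{container_dir}"]
    if https_proxy:
        # Optional proxy, passed as an environment variable as in the text above.
        cmd += ["-e", f"HTTPS_PROXY={https_proxy}"]
    cmd += [image, "/bin/bash"]
    return cmd


cmd = docker_run_command(
    "nvcr.io/nvidia/clara-train-sdk:v1.0-py3",
    "/home/alice/tlt-experiments",  # hypothetical host directory
    "/workspace/tlt-experiments",
)
# Print a shell-quoted command line that can be reviewed before running it.
print(shlex.join(cmd))
```

Building the command as a list and quoting it only for display avoids quoting mistakes when a path contains spaces.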

2.2. Downloading the models

‣ Use this command to pull the docker:

docker pull ${dockerImage}

‣ Use these commands to list and download models from the NGC model registry:

ngc registry model list nvidia/med/*

The -v argument is mandatory. Use --list_versions to find all the versions that are available.

API_KEY=yourAPIkey
MODEL_NAME=segmentation_ct_spleen
VERSION=1

ngc registry model download-version nvidia/med/segmentation_ct_spleen:1 -d /var/tmp

See Segmentation models and Classification models for more details.


Chapter 3. MEDICAL MODEL ARCHIVE

The MMAR (Medical Model ARchive) defines a standard structure for organizing all artifacts produced during the model development life cycle.

You can experiment with different configurations for the same MMAR. Create a new MMAR by cloning from an existing MMAR by using the cp -r OS command.

MMAR defines the standard structure for storing artifacts (files) needed and produced by the model development workflow (training, validation, inference, etc.). The MMAR includes all the information about the model, and the work space to perform all model development tasks.

ROOT
    config
        config_train.json
        config_validation.json
        environment.json
    commands
        set_env.sh
        train.sh
        train_finetune.sh
        train_2gpu.sh
        train_2gpu_finetune.sh
        infer.sh
        validate.sh
        export.sh
    resources
        log.config
        ...
    docs
        license.txt
        Readme.md
        ...
    models (all forms of model: checkpoint, frozen graphs, saved model, TRTIS manifest)
        model.ckpt.meta, model.ckpt.index, model.ckpt.data
        tensorboard event files
        model.fzn.pb, model.trt.pb, model.trtis.pbtxt
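A quick way to confirm that a directory follows this layout is to check for the expected files. The sketch below is a hypothetical helper, not part of the SDK; it only covers the config and commands folders, because the contents of resources, docs, and models are elided or produced later by training.

```python
import tempfile
from pathlib import Path
from typing import List

# Files named explicitly in the layout above.
EXPECTED = {
    "config": ["config_train.json", "config_validation.json", "environment.json"],
    "commands": ["set_env.sh", "train.sh", "train_finetune.sh", "train_2gpu.sh",
                 "train_2gpu_finetune.sh", "infer.sh", "validate.sh", "export.sh"],
}


def missing_mmar_entries(root: str) -> List[str]:
    """Return the expected files (relative to root) that are not present."""
    base = Path(root)
    missing = []
    for folder, files in EXPECTED.items():
        for name in files:
            rel = Path(folder) / name
            if not (base / rel).is_file():
                missing.append(rel.as_posix())
    return missing


# Demo on a throwaway directory that holds only one of the expected files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "config").mkdir()
    (Path(tmp) / "config" / "environment.json").write_text("{}")
    demo_missing = missing_mmar_entries(tmp)
print(len(demo_missing), "expected files are missing")
```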


3.1. Commands

The provided commands perform model development work based on the configurations in the config folder. The only command you may need to change is set_env.sh, where you can set the PYTHONPATH to the proper value.

You don’t need to change any other commands for default behavior, but you can and should study them to understand how they are defined.

train.sh

This command is used to do basic single-gpu training from scratch. When finished, you should see the following files in the “models” folder:

‣ model.ckpt - the best model obtained
‣ model_final.ckpt - the final model when the training is done. It is usually NOT the best model.
‣ Event files - these are tensorboard events that you can view with tensorboard.

Example

1 #!/usr/bin/env bash
2 my_dir="$(dirname "$0")"
3 . $my_dir/set_env.sh
4 echo "MMAR_ROOT set to $MMAR_ROOT"
5 # Data list containing all data
6 CONFIG_FILE=config/config_train.json
7 ENVIRONMENT_FILE=config/environment.json
8 python3 -u -m medical.tlt2.src.apps.train \
9 -m $MMAR_ROOT \
10 -c $CONFIG_FILE \
11 -e $ENVIRONMENT_FILE \
12 --set \
13 DATASET_JSON=$MMAR_ROOT/config/dataset_0.json \
14 epochs=1260 \
15 learning_rate=0.0001 \
16 num_training_epoch_per_valid=20 \
17 multi_gpu=false

Line numbers are not part of the command.

Explanation

Line 1: this is a bash script

Lines 2 to 3: resolve and set the absolute directory path for MMAR_ROOT

Line 6: set the training config file

Line 7: set the environment file that defines commonly used variables such as DATA_ROOT.

Lines 8 to 17: invokes the training program.

Lines 9 to 11: set the arguments required by the training program

Line 12: the --set directive allows certain training parameters to be overwritten

Line 13: set the DATASET_JSON to use the dataset_0.json in the “config” folder of the MMAR. This overwrites the DATASET_JSON defined in the environment.json file.

Lines 14 to 17: overwrite the training variables as defined in config_train.json
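To make the effect of the --set directive concrete, the sketch below merges key=value pairs over a dictionary of defaults. It only illustrates the overriding idea; it is not the SDK's parser, and the function name and the default values are hypothetical.

```python
from typing import Dict, List


def apply_overrides(config: Dict[str, str], pairs: List[str]) -> Dict[str, str]:
    """Return a copy of config with each key=value pair applied on top."""
    merged = dict(config)  # leave the caller's defaults untouched
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        merged[key] = value  # an empty value is allowed, e.g. NAME=""
    return merged


# Hypothetical defaults standing in for values read from the config files.
defaults = {"epochs": "500", "learning_rate": "0.001", "multi_gpu": "false"}
overridden = apply_overrides(defaults, ["epochs=1260", "learning_rate=0.0001"])
print(overridden)
```

Values are kept as strings here; a real program would also convert them to the types its configuration expects.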


train_finetune.sh

This command is used to continue training from a previous checkpoint (model.ckpt). Before running this command, you must have a previously generated checkpoint in the models folder. The output of this command is the same as train.sh.

Example

1 #!/usr/bin/env bash
2 my_dir="$(dirname "$0")"
3 . $my_dir/set_env.sh
4 echo "MMAR_ROOT set to $MMAR_ROOT"
5 # Data list containing all data
6 CONFIG_FILE=config/config_train.json
7 ENVIRONMENT_FILE=config/environment.json
8 python3 -u -m medical.tlt2.src.apps.train \
9 -m $MMAR_ROOT \
10 -c $CONFIG_FILE \
11 -e $ENVIRONMENT_FILE \
12 --set \
13 DATASET_JSON=$MMAR_ROOT/config/dataset_0.json \
14 PRETRAIN_WEIGHTS_FILE="" \
15 epochs=1000 \
16 learning_rate=0.0001 \
17 num_training_epoch_per_valid=20 \
18 MMAR_CKPT=$MMAR_ROOT/models/model.ckpt \
19 multi_gpu=false

Explanation

This command is very similar to train.sh. The only differences are:

Line 14: set PRETRAIN_WEIGHTS_FILE to an empty string. When fine-tuning a model, there is no need to download the pretrained weights from the web.

Line 18: set the pre-trained model’s checkpoint location

train_2gpu.sh

This command does horovod-based training with 2 GPUs from scratch. The output of this command is the same as train.sh. You can use this as an example for multi-gpu training. Please see horovod for tips. In general, the learning rate should be scaled up based on the number of GPUs.

Example

1 #!/usr/bin/env bash
2 my_dir="$(dirname "$0")"
3 . $my_dir/set_env.sh
4 echo "MMAR_ROOT set to $MMAR_ROOT"
5 # Data list containing all data
6 CONFIG_FILE=config/config_train.json
7 ENVIRONMENT_FILE=config/environment.json
8 mpirun -np 2 -H localhost:2 -bind-to none -map-by slot \
9 -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH -mca pml ob1 \
10 -mca btl ^openib --allow-run-as-root \
11 python3 -u -m medical.tlt2.src.apps.train \
12 -m $MMAR_ROOT \
13 -c $CONFIG_FILE \
14 -e $ENVIRONMENT_FILE \
15 --set \
16 DATASET_JSON=$MMAR_ROOT/config/dataset_0.json \
17 epochs=1250 \
18 learning_rate=0.0003 \
19 num_training_epoch_per_valid=10 \
20 multi_gpu=true

Explanation

This file is very similar to train.sh. The differences are:

Lines 8 to 20: Run 2-GPU training with the mpirun program, which is a 3rd-party program that manages cross-process communication.

Lines 8 to 10: set arguments of mpirun for running 2 processes

Line 20: multi_gpu must be set to true

Lines 11 to 20: the training program setup - same as in train.sh. Note that the learning rate is scaled up, as suggested by horovod.
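One common heuristic for this scaling is to multiply the single-GPU learning rate by the number of GPUs. The sketch below shows that linear rule only as a starting point; the helper name is hypothetical, and the guide does not state which rule produced the 0.0003 used in train_2gpu.sh, which is not a strict 2x multiple of the single-GPU value.

```python
def scaled_learning_rate(base_lr: float, num_gpus: int) -> float:
    """Linear scaling heuristic: one multiple of base_lr per GPU."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return base_lr * num_gpus


# Starting from the single-GPU value used in train.sh.
two_gpu_lr = scaled_learning_rate(0.0001, 2)
print(two_gpu_lr)
```

Treat the result as an initial guess and tune it against validation metrics.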

train_2gpu_finetune.sh

This command does horovod-based training with 2 GPUs from a previous checkpoint. The output of this command is the same as train.sh.

Example

1 #!/usr/bin/env bash
2 my_dir="$(dirname "$0")"
3 . $my_dir/set_env.sh
4 echo "MMAR_ROOT is set to $MMAR_ROOT"
5 # Data list containing all data
6 CONFIG_FILE=config/config_train.json
7 ENVIRONMENT_FILE=config/environment.json
8 mpirun -np 2 -H localhost:2 -bind-to none -map-by slot \
9 -x NCCL_DEBUG=INFO -x LD_LIBRARY_PATH -x PATH -mca pml ob1 \
10 -mca btl ^openib --allow-run-as-root \
11 python3 -u -m medical.tlt2.src.apps.train \
12 -m $MMAR_ROOT \
13 -c $CONFIG_FILE \
14 -e $ENVIRONMENT_FILE \
15 --set \
16 DATASET_JSON=$MMAR_ROOT/config/dataset_0.json \
17 PRETRAIN_WEIGHTS_FILE="" \
18 MMAR_CKPT=$MMAR_ROOT/models/model.ckpt \
19 learning_rate=0.0003 \
20 num_training_epoch_per_valid=10 \
21 epochs=1000 \
22 multi_gpu=true

Explanation

This command is basically a combination of train_finetune.sh and train_2gpu.sh:

Lines 8 to 22: run the training program with mpirun.

Lines 11 to 22: run the training program with parameters for 2 GPUs.

export.sh


Export a trained model checkpoint to frozen graphs. Training must have been done before you run this command. Two frozen graphs will be generated in the models folder:

‣ model.fzn.pb - the regular frozen graph
‣ model.trt.pb - TRT-optimized frozen graph

Example

1 #!/usr/bin/env bash
2 my_dir="$(dirname "$0")"
3 . $my_dir/set_env.sh
4 # Data list containing all data
5 export CKPT_DIR=$MMAR_ROOT/models
6 python3 -u -m medical.tlt2.src.apps.export \
7 --model_file_format CKPT \
8 --model_file_path $CKPT_DIR \
9 --model_name model \
10 --input_node_names "NV_MODEL_INPUT" \
11 --output_node_names NV_MODEL_OUTPUT \
12 --trt_min_seg_size 50

Explanation

Line 5: set the location of the “models” directory. The checkpoint must have been created there.

Lines 6 to 12: invoke the export program.

Line 7: set the source model format: it is a checkpoint format (CKPT)

Line 8: set the path to the checkpoint

Line 9: set the base name of the model’s checkpoint file

Lines 10 and 11: set the input and output node names

Line 12: set the minimum segment size for TensorRT optimization

infer.sh

Perform inference against the model, based on the configuration of config_validation.json in the config folder. Inference output is saved in the eval folder.

Example

1 #!/usr/bin/env bash
2 my_dir="$(dirname "$0")"
3 . $my_dir/set_env.sh
4 echo "MMAR_ROOT set to $MMAR_ROOT"
5 # Data list containing all data
6 CONFIG_FILE=config/config_validation.json
7 ENVIRONMENT_FILE=config/environment.json
8 python3 -u -m medical.tlt2.src.apps.evaluate \
9 -m $MMAR_ROOT \
10 -c $CONFIG_FILE \
11 -e $ENVIRONMENT_FILE \
12 --set \
13 DATASET_JSON=$MMAR_ROOT/config/dataset_0.json \
14 output_infer_result=true \
15 do_validation=false

Explanation

Line 1: this is a bash script

Lines 2 to 3: resolve and set the absolute directory path for MMAR_ROOT

Line 6: set the validation config file

Line 7: set the environment file that defines commonly used variables such as DATA_ROOT.

Lines 8 to 15: invokes the evaluate program.

Lines 9 to 11: set the arguments required by the program

Line 12: the --set directive allows certain parameters to be overwritten

Line 13: set the DATASET_JSON to use the dataset_0.json in the “config” folder of the MMAR. This overwrites the DATASET_JSON defined in the environment.json file.

Lines 14 to 15: overwrite default values of the evaluation variables

Line 14: instructs the program to generate inference results

Line 15: instructs the program not to do validation

validate.sh

Perform validation against the model, based on the configuration of config_validation.json in the config folder. Validation output is saved in the eval folder.

Example

1 #!/usr/bin/env bash
2 my_dir="$(dirname "$0")"
3 . $my_dir/set_env.sh
4 echo "MMAR_ROOT set to $MMAR_ROOT"
5 # Data list containing all data
6 CONFIG_FILE=config/config_validation.json
7 ENVIRONMENT_FILE=config/environment.json
8 python3 -u -m medical.tlt2.src.apps.evaluate \
9 -m $MMAR_ROOT \
10 -c $CONFIG_FILE \
11 -e $ENVIRONMENT_FILE \
12 --set \
13 DATASET_JSON=$MMAR_ROOT/config/dataset_0.json \
14 do_validation=true \
15 output_infer_result=false

Explanation

This command is very similar to infer.sh. The only differences are:

Line 14: instructs the program to do validation


Line 15: instructs the program not to generate inference results

infer.sh and validate.sh use the same evaluate program.

3.2. Configuration

The JSON files in the config folder define configurations of workflow tasks (training, inference, and validation).

config_train.json

This file defines components that make up the training workflow. It is used by all four training commands (single-gpu training and finetuning, multi-gpu training and finetuning).

config_validation.json

This file defines the configuration that is used for both validate.sh and infer.sh. The only difference between the two commands is the values of the do_validation and output_infer_result options.

environment.json

This file defines the common parameters for all model work. The most important are DATA_ROOT and DATASET_JSON.

‣ DATA_ROOT specifies the directory that contains the training data.
‣ DATASET_JSON specifies the config file that contains the default training data split (usually dataset_0.json).

Since MMAR does not contain training data, you must ensure that these two parameters are set to the right values. Do not change any other parameters.

Example

{
    "DATA_ROOT": "/workspace/data/Task09_Spleen_nii",
    "DATASET_JSON": "/workspace/data/Task09_Spleen_nii/dataset_0.json",
    "PROCESSING_TASK": "segmentation",
    "MMAR_EVAL_OUTPUT_PATH": "eval",
    "MMAR_CKPT_DIR": "models",
    "PRETRAIN_WEIGHTS_FILE": "/var/tmp/resnet50_weights_tf_dim_ordering_tf_kernels.h5"
}

Variable                 Description

DATA_ROOT                The location of training data

DATASET_JSON             The data split config file

PROCESSING_TASK          The task type of the training: segmentation or classification

MMAR_EVAL_OUTPUT_PATH    Directory for saving evaluation (validate or infer) results. Always the “eval” folder in the MMAR

MMAR_CKPT_DIR            Directory for saving training results. Always the “models” folder in the MMAR

PRETRAIN_WEIGHTS_FILE    Location of the pre-trained weights file. NOTE: if the file does not exist and is needed, the training program will download it from a predefined URL on the web.
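Because DATA_ROOT and DATASET_JSON must point at data that lives outside the MMAR, it can be useful to verify them before starting a long training run. The sketch below is a hypothetical pre-flight check, not an SDK feature; the function names are illustrative.

```python
import json
import os
import tempfile
from typing import Dict, List


def load_environment(path: str) -> Dict[str, str]:
    """Read an environment.json file into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def check_environment(env: Dict[str, str]) -> List[str]:
    """Return human-readable problems with DATA_ROOT and DATASET_JSON."""
    problems: List[str] = []
    data_root = env.get("DATA_ROOT", "")
    dataset_json = env.get("DATASET_JSON", "")
    if not os.path.isdir(data_root):
        problems.append(f"DATA_ROOT is not a directory: {data_root!r}")
    if not os.path.isfile(dataset_json):
        problems.append(f"DATASET_JSON is not a file: {dataset_json!r}")
    return problems


# Demo on a throwaway directory: DATA_ROOT exists, the data split file does not.
with tempfile.TemporaryDirectory() as tmp:
    env_path = os.path.join(tmp, "environment.json")
    with open(env_path, "w", encoding="utf-8") as handle:
        json.dump({"DATA_ROOT": tmp,
                   "DATASET_JSON": os.path.join(tmp, "dataset_0.json")}, handle)
    demo_problems = check_environment(load_environment(env_path))
print(demo_problems)
```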

Data Split Config file

This is a JSON file that defines the data split used for training and validation. For classification models, this file is usually named “plco.json”; for other models, it is usually named “dataset_0.json”.

The following is dataset_0.json of the model segmentation_ct_spleen:

{
    "description": "Spleen Segmentation",
    "labels": {
        "0": "background",
        "1": "spleen"
    },
    "licence": "CC-BY-SA 4.0",
    "modality": {
        "0": "CT"
    },
    "name": "Spleen",
    "numTest": 20,
    "numTraining": 41,
    "reference": "Memorial Sloan Kettering Cancer Center",
    "release": "1.0 06/08/2018",
    "tensorImageSize": "3D",
    "training": [
        {
            "image": "imagesTr/spleen_29.nii.gz",
            "label": "labelsTr/spleen_29.nii.gz"
        },
        … <<more data here>>…..
        {
            "image": "imagesTr/spleen_49.nii.gz",
            "label": "labelsTr/spleen_49.nii.gz"
        }
    ],
    "validation": [
        {
            "image": "imagesTr/spleen_19.nii.gz",
            "label": "labelsTr/spleen_19.nii.gz"
        },
        … <<more data here>>…..
        {
            "image": "imagesTr/spleen_9.nii.gz",
            "label": "labelsTr/spleen_9.nii.gz"
        }
    ]
}

There is a lot of information in this file, but the only sections needed by the training and validation programs are the “training” and “validation” sections, which define sample/label pairs of data items for training and validation respectively.
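As a minimal sketch of how a program might read only the two sections it needs, the following Python snippet parses a data split file and returns the training and validation lists. The helper name load_split and the shortened inline sample are illustrative assumptions, not part of the SDK.

```python
import json


def load_split(text):
    """Return (training, validation) lists from a data split JSON string.

    Hypothetical helper for illustration; not an SDK function.
    """
    split = json.loads(text)
    # Only these two sections are read; a missing section yields an empty list.
    return split.get("training", []), split.get("validation", [])


# Shortened sample in the same shape as dataset_0.json.
sample = """
{
  "name": "Spleen",
  "training": [
    {"image": "imagesTr/spleen_29.nii.gz", "label": "labelsTr/spleen_29.nii.gz"}
  ],
  "validation": [
    {"image": "imagesTr/spleen_19.nii.gz", "label": "labelsTr/spleen_19.nii.gz"}
  ]
}
"""

training, validation = load_split(sample)
print(len(training), len(validation))
```

All other keys, such as "description" or "numTraining", are simply ignored by this sketch.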

3.3. Cloning MMAR
A MMAR is a self-contained workspace for model development work. If you want to experiment with different configurations for the same MMAR, you should create a new MMAR by cloning from an existing MMAR. We will provide a “mmar-clone” command in the future, but before that you can easily use the “cp -r” OS command to do it yourself.
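If you prefer to clone from a Python script rather than the shell, the same copy can be sketched with the standard library. The helper name clone_mmar is an assumption made for this example; it is not an SDK command.

```python
import shutil
from pathlib import Path


def clone_mmar(src, dst):
    """Copy an MMAR folder tree to a new location (hypothetical helper).

    Similar in spirit to `cp -r src dst` when dst does not exist yet.
    """
    src, dst = Path(src), Path(dst)
    if not src.is_dir():
        raise FileNotFoundError(f"source MMAR not found: {src}")
    if dst.exists():
        # Refuse to merge into an existing folder so the clone stays clean.
        raise FileExistsError(f"destination already exists: {dst}")
    shutil.copytree(src, dst)
    return dst
```

A call such as clone_mmar("path/to/mmar", "path/to/mmar_experiment") would then give you an independent workspace to reconfigure; both paths here are placeholders.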


Chapter 4. BRING YOUR OWN MODEL TO TRANSFER LEARNING

You can use the predefined models offered by NVIDIA, or choose to use your own model architecture when configuring a training workflow, provided your model follows our model development guidelines.

4.1. Components for training workflow
A training workflow typically requires the following common components:

4.1.1. Data pipelines
A data pipeline contains a chain of transforms that are applied to the input image and label data to produce the data in the format required by the model. This release provides predefined transforms that you can use to configure the transformation chains.

Data pipelines produce batched data items during training. Typically, two data pipelines are used: one for producing training data, another for producing validation data.

4.1.2. Model
The model component implements the neural network. It produces a prediction for an input.

4.1.3. Loss
The loss component implements a loss function, typically based on the prediction from the model and the corresponding label data.


4.1.4. Optimizer
The optimizer component implements the training optimization algorithm for finding the minimal loss during training.

4.1.5. Metrics
These components are used to dynamically measure the quality of the model during training on different aspects. Metric values are computed based on values of tensors. There are two kinds of metric components: training metrics and validation metrics.

‣ A training metric is a graph-building component that adds computational operations to the training graph, which produce tensors for metric computation.

‣ Validation metrics implement algorithms to compute values for different aspects of the model, based on the values of tensors in the graph.

4.2. Structure of training graph
This diagram shows the overall structure of the training graph. It shows how the components are related. The blue ovals represent placeholders.

These components are built in this order:

1. Training Data Pipeline
2. Validation Data Pipeline
3. Placeholders
4. Model
5. Loss
6. Optimizer
7. Metrics

4.3. Model API specification
The model must conform to the API spec.

from abc import abstractmethod, ABC
import tensorflow as tf

class Model(ABC):

    @abstractmethod
    def get_predictions(self, inputs, training, build_ctx=None):
        pass

    def get_loss(self):
        return 0

    def get_update_ops(self):
        return tf.get_collection(tf.GraphKeys.UPDATE_OPS)

Your model must extend the class Model and implement the required abstract methods.

get_predictions method

This method is required and is called during the construction of the computation graph. It must return a prediction tensor, as shown in the diagram above.

The inputs argument is the model input placeholder of the model.

The build_ctx argument is a dict that holds the data objects that are already built (see the component building order above). You can use them in the construction of your model. Specifically, by the time the get_predictions method is called, data pipelines and placeholders are already built, and the build_ctx contains the following objects:

‣ data_property – properties about the input data such as data format (channels_first, channels_last), number of image channels, number of label channels, etc.
‣ model_input – the placeholder for model input
‣ label_input – the placeholder for label input
‣ learning_rate – the placeholder for learning rate
‣ is_train – the placeholder for is training flag
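The following sketch shows one way a model could guard against a partially filled build_ctx before using it. The key names come from the list above; the helper name missing_build_ctx_keys and the stand-in values are assumptions for illustration only.

```python
# Keys the guide lists for build_ctx at the time get_predictions is called.
EXPECTED_BUILD_CTX_KEYS = (
    "data_property",
    "model_input",
    "label_input",
    "learning_rate",
    "is_train",
)


def missing_build_ctx_keys(build_ctx):
    """Return the documented keys that are absent from build_ctx.

    Hypothetical helper; build_ctx may be None, which is the default
    in the get_predictions signature.
    """
    ctx = build_ctx or {}
    return [key for key in EXPECTED_BUILD_CTX_KEYS if key not in ctx]


# Stand-in values; in a real graph these would be placeholders.
example_ctx = {"model_input": object(), "label_input": object(), "is_train": object()}
print(missing_build_ctx_keys(example_ctx))
```

A check like this could run at the top of get_predictions so that a missing object is reported by name instead of surfacing later as a KeyError.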

get_loss method

The get_loss method is called during the construction of the computation graph. You can override the default implementation of this method (which returns 0) if you want to return a model-specific loss. This loss is added to the result of the regular loss component.

get_update_ops method


You can also provide model-specific update ops using this method. The update ops will be used as the dependency for the Optimizer’s minimize operation.

4.4. Model creation
Transfer Learning manages components with a create and use strategy. Components are first configured and created based on the configuration parameters.

The configuration parameters are passed to the component’s construction method, __init__, to get the component created. Since the parameters are defined at configuration time, they can only be simple static values (vs. dynamically created values such as tensors). Once the components are all created, the workflow engine will start the graph construction process, which will invoke each component’s graph-building methods.

When creating your own model, you must follow this strategy: the __init__ method of the model class must only expect configuration parameters.
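The create and use strategy can be illustrated without TensorFlow. In this sketch, ComponentSketch, create_component, and build are hypothetical names standing in for a real component and the workflow engine; only the split between static configuration in __init__ and later graph building reflects the text above.

```python
class ComponentSketch:
    """Hypothetical component: static config in __init__, dynamic work in build()."""

    def __init__(self, num_classes, factor=32):
        # Only simple static values are stored here; nothing is built yet.
        self.num_classes = num_classes
        self.factor = factor
        self.built = False

    def build(self, inputs):
        # Stand-in for a graph-building method, invoked after all components exist.
        self.built = True
        return {"inputs": inputs, "channels": self.factor, "classes": self.num_classes}


def create_component(cls, args):
    """Create phase: pass the configuration parameters straight to __init__."""
    return cls(**args)


config_args = {"num_classes": 2, "factor": 8}  # as they would appear under "args"
component = create_component(ComponentSketch, config_args)
created_only = not component.built             # created, but not yet used
result = component.build(inputs="placeholder")  # use phase
print(created_only, result["channels"])
```

Because the constructor receives only values that can be written in a JSON file, anything that depends on tensors has to wait until the build step.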

4.5. Examples

4.5.1. Extend the model class
To extend the model class, first define your model as a subclass of the Model class:

import tensorflow as tf
from medical.tlt2.src.components.models.model import Model

class CustomNetwork(Model):

4.5.2. Model creation
The model’s constructor must only accept configurable parameters. Keep them in instance variables.

import tensorflow as tf
from medical.tlt2.src.components.models.model import Model


class CustomNetwork(Model):

    def __init__(self, num_classes, factor=32, training=False,
                 data_format='channels_first', final_activation='linear'):
        Model.__init__(self)
        self.model = None
        self.num_classes = num_classes
        self.factor = factor
        self.training = training
        self.data_format = data_format
        self.final_activation = final_activation

        if data_format == 'channels_first':
            self.channel_axis = 1
        elif data_format == 'channels_last':
            self.channel_axis = -1
        else:
            # Fail early; otherwise channel_axis would be undefined later.
            raise ValueError('Unsupported data_format: ' + str(data_format))

    def network(self, inputs, training, num_classes, factor, data_format, channel_axis):
        # very shallow Unet Network
        with tf.variable_scope('CustomNetwork'):

            conv1_1 = tf.keras.layers.Conv3D(factor, 3, padding='same',
                                             data_format=data_format, activation='relu')(inputs)
            conv1_2 = tf.keras.layers.Conv3D(factor * 2, 3, padding='same',
                                             data_format=data_format, activation='relu')(conv1_1)
            pool1 = tf.keras.layers.MaxPool3D(pool_size=(2, 2, 2), strides=2,
                                              data_format=data_format)(conv1_2)

            conv2_1 = tf.keras.layers.Conv3D(factor * 2, 3, padding='same',
                                             data_format=data_format, activation='relu')(pool1)
            conv2_2 = tf.keras.layers.Conv3D(factor * 4, 3, padding='same',
                                             data_format=data_format, activation='relu')(conv2_1)

            unpool1 = tf.keras.layers.UpSampling3D(size=(2, 2, 2),
                                                   data_format=data_format)(conv2_2)
            unpool1 = tf.keras.layers.Concatenate(axis=channel_axis)([unpool1, conv1_2])

            conv7_1 = tf.keras.layers.Conv3D(factor * 2, 3, padding='same',
                                             data_format=data_format, activation='relu')(unpool1)
            conv7_2 = tf.keras.layers.Conv3D(factor * 2, 3, padding='same',
                                             data_format=data_format, activation='relu')(conv7_1)

            output = tf.keras.layers.Conv3D(num_classes, 1, padding='same',
                                            data_format=data_format)(conv7_2)

            if str.lower(self.final_activation) == 'softmax':
                output = tf.nn.softmax(output, axis=channel_axis, name='softmax')
            elif str.lower(self.final_activation) == 'sigmoid':
                output = tf.nn.sigmoid(output, name='sigmoid')
            elif str.lower(self.final_activation) == 'linear':
                pass
            else:
                raise ValueError(
                    'Unsupported final_activation, it must be one of (softmax, sigmoid or linear), but provided:'
                    + self.final_activation)

        return output

    # additional custom loss
    def loss(self):
        return 0

    def get_predictions(self, inputs, training, build_ctx=None):
        self.model = self.network(
            inputs=inputs,
            training=training,
            num_classes=self.num_classes,
            factor=self.factor,
            data_format=self.data_format,
            channel_axis=self.channel_axis
        )
        return self.model

    def get_loss(self):
        return self.loss()


4.5.3. Implement methods
Define the get_predictions method.

def get_predictions(self, inputs, training, build_ctx=None):
    if self.nn_data_format == 'NCDHW':
        if self.plane == 'x':
            inputs = tf.transpose(self.inputs, perm=[0, 1, 4, 3, 2])
        elif self.plane == 'y':
            inputs = tf.transpose(self.inputs, perm=[0, 1, 2, 4, 3])
        elif self.plane == 'z':
            inputs = self.inputs
        else:
            print('Incorrect key value for plane!')
    elif self.nn_data_format == 'NDHWC':
        ...
    return res_final

4.5.4. Optional methods
Optionally, you can define the get_loss method and the get_update_ops method for the model.

4.5.5. Configuration

Once your model is developed following the guidelines, you can use it in the training workflow with the following steps:

1. Locate the section for model in the training config JSON file.
2. Specify the path to your model’s class.
3. Specify all required init parameters in the args section.
4. Make sure that the specified model class path is in PYTHONPATH.

Here is a sample training config file:

{
    "epochs": 1240,
    "num_training_epoch_per_valid": 20,
    "learning_rate": 1e-4,
    "multi_gpu": false,
    "train": {
        "loss": {
            "name": "Dice"
        },
        "optimizer": {
            "name": "Adam"
        },
        …...
        "model": {
            "path": "yourFileName.CustomNetwork",
            "args": {
                "num_classes": 2,
                "factor": 8,
                "final_activation": "softmax"
            }
        },
        …...
}

The pythonPathToYourModelClass must be accessible through PYTHONPATH.

For example, if pythonPathToYourModelClass is defined as foo.bar.FancyNet and the class FancyNet is implemented in:

/project/deeplearn/foo/bar.py

then, PYTHONPATH must include

/project/deeplearn
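To see why the parent directory has to be on PYTHONPATH, here is a sketch of how a dotted class path can be resolved with the standard importlib module. This illustrates the general Python mechanism; the helper name resolve_class is an assumption, and the SDK's own loader may work differently.

```python
import importlib


def resolve_class(class_path):
    """Resolve 'package.module.ClassName' to a class object.

    Hypothetical helper. The module part must be importable, which means
    its parent directory has to be on PYTHONPATH (sys.path).
    """
    module_name, _, class_name = class_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got {class_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Standard-library example so the sketch runs anywhere.
ordered_dict_cls = resolve_class("collections.OrderedDict")
print(ordered_dict_cls.__name__)
```

With /project/deeplearn on PYTHONPATH, the same lookup for foo.bar.FancyNet would import foo/bar.py and return the FancyNet class; without it, the import step raises ModuleNotFoundError.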


Chapter 5. WORKING WITH CLASSIFICATION AND SEGMENTATION MODELS

This chapter provides instructions on preparing your data, training models, exporting, evaluating, and performing inference on the trained classification and segmentation models with transfer learning.

5.1. Working with classification models

5.1.1. Prepare the data
This section describes the format in which the data can be used with transfer learning for 2D classification tasks.

5.1.1.1. Data format

All input images and labels must be in png format. If you are planning to resample images, e.g., to 256x256, it is best to do that as a pre-processing step, rather than have the TLT toolkit do that on the fly. The png files can be 8- or 16-bit. You must also have ground truth labels available. These are often binary, i.e., {0,1}, or multi-class, i.e., {0,…,C} if there are C classes.

5.1.1.2. Folder structure

The layout of data files can be arbitrary, but the JSON file describing the data list must contain relative paths to all image files.

|--dataset_root:
    |--datalist.json
    |--png_files
        |--im1.png
        |--im2.png
        |--im3.png


5.1.1.3. Datalist JSON file

The JSON file describing the data structure must include a label_format key. The corresponding value should be a list of natural numbers, specifying the number of type of labels in the dataset. For instance, for the PLCO dataset, there are 15 binary labels, so it should be a list of 15 ones: [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1].

The datafile should also have a training key and, optionally, a validation key. Each of these keys contains a list of dictionaries, where:

‣ the value for the image key must be a relative path to the png file.

‣ the value for the label key must be a list of natural numbers corresponding to the ground truth labels.

The labels for each image must match the label_format specified above.

{
    "label_format": [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1],
    "training": [
        {
            "image" : "im1.png",
            "label" : [0,0,1,0,0,0,0,0,0,0,1,0,0,0,0]
        },
        ...

The validation key is optional and only needs to be specified if the main training config file specifies metrics to compute. If the validation key is provided, it specifies the corresponding images and labels used to compute the validation metrics at the end of each training epoch (or less/more frequently if specified in the main training config).
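Before starting a long training run, it can help to check the datalist for consistency. The sketch below is an illustrative helper, not an SDK tool, and it assumes that "match the label_format" means each label list has the same length as label_format and holds only natural numbers.

```python
def check_classification_datalist(datalist):
    """Return a list of problems found in a classification datalist dict.

    Assumption for this sketch: a label matches label_format when it has
    the same length and contains only non-negative integers.
    """
    problems = []
    expected_len = len(datalist["label_format"])
    for section in ("training", "validation"):  # validation is optional
        for index, item in enumerate(datalist.get(section, [])):
            where = f"{section}[{index}]"
            label = item.get("label", [])
            if len(label) != expected_len:
                problems.append(f"{where}: expected {expected_len} labels, got {len(label)}")
            if any(not isinstance(v, int) or v < 0 for v in label):
                problems.append(f"{where}: labels must be natural numbers")
            if not str(item.get("image", "")).endswith(".png"):
                problems.append(f"{where}: image must be a path to a png file")
    return problems


# Small made-up datalist: the second entry has too few labels.
example = {
    "label_format": [1, 1, 1],
    "training": [
        {"image": "im1.png", "label": [0, 0, 1]},
        {"image": "im2.png", "label": [0, 1]},
    ],
}
for problem in check_classification_datalist(example):
    print(problem)
```

An empty result means every entry passed these particular checks; it does not verify that the image files exist on disk.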

5.1.2. Training a classification model
Run train.sh to train the model.

cd path/to/mmar/commands/folder

./train.sh

To fine-tune based on the pre-trained model included in the MMAR, first change the DATA_ROOT and DATASET_JSON to point to your dataset and data split configuration. Then run train_finetune.sh:

cd path/to/mmar/commands/folder

./train_finetune.sh

The resulting checkpoint files are stored in the models folder of the MMAR.

For more details about MMAR, see Medical Model Archive.

For a detailed example of the training config of a classification model, see Model training and validation configurations.


5.1.3. Multi-GPU training
To run multi-GPU training, run train_2gpu.sh. See Medical Model Archive.

When training or fine-tuning the models in a multi-GPU setting on a small amount of training data, it is recommended to adjust the learning rate provided in the configuration files, e.g. multiply the learning rate by the number of GPUs, as recommended in https://arxiv.org/pdf/1706.02677.pdf.
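The adjustment described above is simple arithmetic. The following sketch applies the linear scaling to the learning rate value; the function name scale_learning_rate is hypothetical, and the base value 1e-4 is just an example.

```python
def scale_learning_rate(base_lr, num_gpus):
    """Linear scaling: multiply the single-GPU learning rate by the GPU count."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return base_lr * num_gpus


# Example: a config written for one GPU, reused for a 2-GPU run.
single_gpu_lr = 1e-4
two_gpu_lr = scale_learning_rate(single_gpu_lr, 2)
print(two_gpu_lr)
```

The scaled value would then replace "learning_rate" in the training config used by the multi-GPU script. Treat it as a starting point and confirm it against your validation metrics.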

5.1.4. Tensorboard visualization
You can run the following command to use Tensorboard for visualization.

python3 -m tensorboard.main --logdir "${MODEL_DIR}"

5.1.5. Exporting the model to a TensorRT optimized model inference
After the model has been trained, run export.sh from the "commands" folder in MMAR to export the checkpoint into frozen graphs.

cd path/to/mmar/commands/folder

./export.sh

Two frozen graph files will be produced in the models folder of the MMAR:

‣ model.fzn.pb - a regular frozen graph
‣ model.trt.pb - TRT-optimized frozen graph

5.1.6. Classification model evaluation with ground truth
Run validate.sh from the MMAR.

cd path/to/mmar/commands/folder

./validate.sh

The validation result files are created in the eval folder of the MMAR.

See Model training and validation configurations for an example of a validation config for a classification model.

5.1.7. Classification model inference
Run infer.sh from the MMAR.

cd path/to/mmar/commands/folder

./infer.sh


The inference result files are created in the eval folder of the MMAR.

Use the same configuration file for both validation and inference. For inference, the metric values specified in the configuration file won't be computed, and no ground truth label is needed.

5.2. Working with segmentation models
This section provides instructions on preparing your data, training models, exporting, evaluating and performing inference on the trained segmentation models using transfer learning.

5.2.1. Prepare the data
All input images and labels must be in NIfTI format. Each input image and its corresponding label mask must have the same image dimension. To visualize or save NIfTI images, you can use free viewers such as ITK-SNAP or MITK.

If your native data format is different from NIfTI or if you want to convert the image and label mask to isotropic resolution, you can use the provided Data Converter or some other software of your choice, such as ITK-SNAP, or do the conversion directly in Python.

5.2.1.1. Using the data converter

If the data format is DICOM or the resolution is not isotropic, you can use the provided data converter tool to convert the data to isotropic NIfTI format. Furthermore, many pre-trained models were trained on 1x1x1mm resolution images, and to use those pre-trained models as a starting point, convert the data to 1x1x1mm NIfTI format. Note: if the dataset is already in NIfTI format, but not with 1x1x1 mm spacing, the data conversion is still required for the dataset.

The tlt-dataconvert command converts all dicom volumes in your/data/directory to NIfTI format and optionally re-samples them to the provided resolution. If the images to be converted are segmentation labels, the option -l needs to be added, and the resampler will use the nearest neighbor interpolator (otherwise the linear interpolator is used).

tlt-dataconvert -d your/data/directory -r 1 -s .dcm -e .nii.gz -o your/output/directory

Supported options are:

Option   Description

-d       Input directory with subdirectories containing dicom images.

-r       Output image resolution. If not provided, dicom resolution will be preserved. If only a single value is provided, target resolution will be isotropic (e.g. -r 1 for 1x1x1mm resolution)

-s       Input file format, can be .dcm, .nii, .nii.gz, .mha, .mhd.

-e       Output file format, can be .nii, .nii.gz, .mha, .mhd.

-o       Output directory.

-f       (Optional) Force overwriting existing files if output directory already exists.

-l       (Optional) Flag indicating that the data is LABEL/SEGMENTATION masks and the nearest neighbor interpolation is used for re-sampling.

If you need to convert both 3D volumetric images and their segmentation labels, put them into two different folders, and run the converter once for the images and once for the labels using the -l flag.
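The two-pass conversion can be scripted. This sketch only builds the tlt-dataconvert argument lists from the options documented above; the helper name dataconvert_command and the folder names are assumptions for illustration, and running the commands (for example with subprocess.run) is left to you.

```python
def dataconvert_command(src_dir, out_dir, resolution=1, src_ext=".dcm",
                        dst_ext=".nii.gz", is_label=False, force=False):
    """Build a tlt-dataconvert argument list (hypothetical helper)."""
    cmd = ["tlt-dataconvert", "-d", src_dir, "-r", str(resolution),
           "-s", src_ext, "-e", dst_ext, "-o", out_dir]
    if is_label:
        cmd.append("-l")  # label masks: nearest neighbor interpolation
    if force:
        cmd.append("-f")  # overwrite existing files in the output directory
    return cmd


# One pass for the images, one for the labels; folder names are placeholders.
image_cmd = dataconvert_command("raw/images", "converted/images")
label_cmd = dataconvert_command("raw/labels", "converted/labels", is_label=True)
print(" ".join(image_cmd))
print(" ".join(label_cmd))
```

Keeping the two passes in one script makes it harder to forget the -l flag on the label folder, which would otherwise blend label indices through linear interpolation.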

5.2.1.2. Folder structure

The layout of data files can be arbitrary, but the JSON file describing the data list must contain the relative paths to all data files.

|--dataset_root:
    |--datalist.json
    |--train
        |--im1.nii.gz
        |--lb1.nii.gz
        |--im2.nii.gz
        |--lb2.nii.gz
        |--im3.nii.gz
        |--lb3.nii.gz
        |--im4.nii.gz
        |--lb4.nii.gz
    |--val
        |--im1.nii.gz
        |--lb1.nii.gz
        |--im2.nii.gz
        |--lb2.nii.gz

For example, the datalist.json file looks similar to this. Here all paths are relative to the datalist.json location.

{
    "training": [
        {
            "image" : "train/im1.nii.gz",
            "label" : "train/lb1.nii.gz"
        },
        {
            "image" : "train/im2.nii.gz",
            "label" : "train/lb2.nii.gz"
        },
        {
            "image" : "train/im3.nii.gz",
            "label" : "train/lb3.nii.gz"
        },
        {
            "image" : "train/im4.nii.gz",
            "label" : "train/lb4.nii.gz"
        }
    ],
    "validation": [
        {
            "image" : "val/im1.nii.gz",
            "label" : "val/lb1.nii.gz"
        },
        {
            "image" : "val/im2.nii.gz",
            "label" : "val/lb2.nii.gz"
        }
    ]
}

The training and validation lists contain the images to be used in the training and validation steps, respectively.

By default, all paths inside the datalist.json are assumed relative to the datalist.json file location. You can optionally specify the ROOT base path of the datasets by specifying it in the main config file (image_base_dir JSON key) or as a command line option (--file_root) to the tlt-train command.
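The path rule above can be sketched as a small function. The name resolve_data_path and its base_dir parameter are assumptions for this example; base_dir stands in for a root given through image_base_dir or --file_root, and the SDK's actual lookup may differ in detail.

```python
import os


def resolve_data_path(relative_path, datalist_file, base_dir=None):
    """Resolve a datalist entry to a full path (hypothetical helper).

    Uses base_dir when given; otherwise falls back to the folder that
    contains the datalist file.
    """
    if base_dir is not None:
        root = base_dir
    else:
        root = os.path.dirname(os.path.abspath(datalist_file))
    return os.path.normpath(os.path.join(root, relative_path))


# Default: relative to the datalist.json location.
print(resolve_data_path("train/im1.nii.gz", "/data/spleen/datalist.json"))
# With an explicit base directory.
print(resolve_data_path("train/im1.nii.gz", "/data/spleen/datalist.json",
                        base_dir="/mnt/datasets"))
```

Keeping the entries relative means the same datalist.json can be reused when the dataset moves, as long as the base path is updated.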

5.2.1.3. Datalist JSON file

The JSON file describing the data structure must include the training key with a list of items (each containing image and label keys).

The value for the image key can be a string containing the path to a single NIfTI file or a list of strings that are paths to NIfTI files. If there are several channels, they are saved as separate files. Here is an example:

{
    "image" : [
        "train/im1_ch1.nii.gz",
        "train/im1_ch2.nii.gz",
        "train/im1_ch3.nii.gz",
        "train/im1_ch4.nii.gz"
    ],
    "label" : "train/lb1.nii.gz"
},

If image includes several files, they will be concatenated as separate channels of the network input. These images must be already spatially aligned.

The value for the label key must be a string containing the path to a single NIfTI file with dense segmentation masks. The label mask defines the segmentation either using indices, where each integer index is a separate class, or as a multichannel one-hot-encoded image, where each channel represents a separate class.


The validation key is optional. If provided, the corresponding images/labels will be used to compute the validation metrics at the end of each training epoch in this release, or less/more frequently if specified in the main training config. The validation section does not need to include the label keys if the datalist.json is used for inference to compute the output segmentation masks.
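The rules of this section can be collected into a quick structural check. This is an illustrative sketch under the rules stated above, not an SDK tool; the function name check_segmentation_datalist and the require_labels switch are assumptions.

```python
def check_segmentation_datalist(datalist, require_labels=True):
    """Return a list of problems found in a segmentation datalist dict (sketch)."""
    problems = []
    if "training" not in datalist:
        problems.append("missing required key: training")
    for section in ("training", "validation"):  # validation is optional
        for index, item in enumerate(datalist.get(section, [])):
            where = f"{section}[{index}]"
            image = item.get("image")
            # image: one path, or a list of paths (one file per channel)
            images = [image] if isinstance(image, str) else image
            if (not isinstance(images, list) or not images
                    or not all(isinstance(p, str) for p in images)):
                problems.append(f"{where}: image must be a path or a list of paths")
            # label: a single path; validation items used only for
            # inference may omit it
            needs_label = require_labels or section == "training"
            if "label" in item:
                if not isinstance(item["label"], str):
                    problems.append(f"{where}: label must be a single path")
            elif needs_label:
                problems.append(f"{where}: missing label")
    return problems


example = {
    "training": [
        {"image": ["train/im1_ch1.nii.gz", "train/im1_ch2.nii.gz"],
         "label": "train/lb1.nii.gz"}
    ],
    "validation": [{"image": "val/im1.nii.gz"}],
}
print(check_segmentation_datalist(example, require_labels=False))
print(check_segmentation_datalist(example, require_labels=True))
```

Pass require_labels=False when the validation list is used only for inference, so that entries without a label are accepted.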

5.2.2. Training a segmentation model
Segmentation training

Use train.sh to train the model:

cd path/to/mmar/commands/folder

./train.sh

Fine Tuning

To fine-tune based on the pre-trained model included in the MMAR, first change the DATA_ROOT and DATASET_JSON to point to your dataset and data split configuration. Then run train_finetune.sh:

cd path/to/mmar/commands/folder

./train_finetune.sh

The resultant checkpoint files are stored in the models folder of the MMAR.

For more details see the Medical Model Archive.

For a detailed example of the training config of a segmentation model, see Model training and validation configurations.

5.2.3. Multi-GPU training
To run 2-GPU training, run train_2gpu.sh.

cd path/to/mmar/commands/folder

./train_2gpu.sh

To fine-tune based on the pre-trained model included in the MMAR, first change the DATA_ROOT and DATASET_JSON to point to your dataset and data split configuration. Then run train_2gpu_finetune.sh:

cd path/to/mmar/commands/folder

./train_2gpu_finetune.sh

The resulting checkpoint files are stored in the models folder of the MMAR.

See Medical Model Archive for more details.

When training or fine-tuning the models using the multi-GPU setting on a relatively small training dataset, it is recommended to adjust the learning rate provided in the configuration files, e.g. multiply the learning rate by the number of GPUs, as recommended in https://arxiv.org/pdf/1706.02677.pdf. You can create your own train_Ngpu.sh based on train_2gpu.sh. Make sure to adjust the learning rate accordingly.

5.2.4. Tensorboard visualization
You can run the following command to use Tensorboard for visualization.

python3 -m tensorboard.main --logdir "${MODEL_DIR}"

5.2.5. Exporting the model to a TensorRT optimized model inference
After the model has been trained, run export.sh from the "commands" folder in MMAR to export the checkpoint into frozen graphs.

cd path/to/mmar/commands/folder

./export.sh

Two frozen graph files will be produced in the models folder of the MMAR:

‣ model.fzn.pb - a regular frozen graph
‣ model.trt.pb - TRT-optimized frozen graph

5.2.6. Segmentation model evaluation with ground truth
Run validate.sh from the MMAR.

cd path/to/mmar/commands/folder

./validate.sh

The validation result files are created in the eval folder of the MMAR.

See Model training and validation configurations for an example of a validation config for a segmentation model.

5.2.7. Segmentation model inference
Use infer.sh to run inference on the model from the Medical Model Archive.

cd path/to/mmar/commands/folder

./infer.sh

The inference result files are created in the eval folder of the MMAR.


See Model training and validation configurations for an example of a validation config for a segmentation model.

We use the same configuration file for both validation and inference. For inference, the metric values specified in the configuration file won't be computed, and no ground truth label is needed.


Chapter 6. APPENDIX

6.1. Segmentation models
Here is a list of the segmentation models. All the models are trained using 1x1x1mm resolution data.

Segmentation model Description

Brain tumor segmentation

‣ segmentation_mri_brain_tumors_br16_full

A pre-trained model for volumetric (3D) segmentation of brain tumors from multi-modal MRIs based on BraTS 2018 data.

https://www.med.upenn.edu/sbia/brats2018/data.html

The model is trained to segment 3 nested subregions of primary (gliomas) brain tumors: the "enhancing tumor" (ET), the "tumor core" (TC), and the "whole tumor" (WT), based on 4 input MRI scans (T1c, T1, T2, FLAIR). The ET is described by areas that show hyper-intensity in T1c when compared to T1, but also when compared to "healthy" white matter in T1c. The TC describes the bulk of the tumor, which is what is typically resected. The TC encompasses the ET, as well as the necrotic (fluid-filled) and the non-enhancing (solid) parts of the tumor. The WT describes the complete extent of the disease, as it entails the TC and the peritumoral edema (ED), which is typically depicted by hyper-intense signal in FLAIR.


The dataset is available at "Multimodal Brain Tumor Segmentation Challenge (BraTS) 2018." The provided labelled data was partitioned, based on our own split, into training (243 studies) and validation (42 studies) datasets.

For a more detailed description of tumor regions, please see the Multimodal Brain Tumor Segmentation Challenge (BraTS) 2018 data page at: https://www.med.upenn.edu/sbia/brats2018/data.html.

This model uses an approach similar to the one described in 3D MRI brain tumor segmentation using autoencoder regularization, which was a winning method in BraTS2018 [1].

The provided training configuration required 16GB GPU memory.

Model Input Shape: 224 x 224 x 128

Training Script: train.sh

Model input and output:

‣ Input: 4 channel 3D MRIs (T1c, T1, T2, FLAIR)
‣ Output: 3 channels of tumor subregion 3D masks

The model was trained with 285 cases with our own split, as shown in the datalist json file in the config folder.

‣ Tumor core (TC): 0.8624
‣ Whole tumor (WT): 0.9020
‣ Enhancing tumor (ET): 0.7770

‣ segmentation_mri_brain_tumors_br16_t1c2tc

A pre-trained model for volumetric (3D) brain tumor segmentation (only TC from T1c images). The model is trained to segment "tumor core" (TC) based on 1 input MRI scan (T1c).

The dataset is available at "Multimodal Brain Tumor Segmentation Challenge (BraTS) 2018." The provided labelled data was partitioned, based on our own split, into training (243 studies) and validation (42 studies) datasets, as shown in config/seg_brats18_datalist_t1c.json.

For a more detailed description of tumor regions, please see the Multimodal Brain Tumor Segmentation Challenge (BraTS) 2018 data page at:

https://www.med.upenn.edu/sbia/brats2018/data.html

This model uses an approach similar to the one described in 3D MRI brain tumor segmentation using autoencoder regularization, which was a winning method in BraTS2018 [1].

The provided training configuration required 16GB GPU memory.

Model Input Shape: 224 x 224 x 128

Training Script: train.sh

Model input and output:

‣ Input: 1 channel 3D MRI (T1c)
‣ Output: 1 channel of tumor core 3D masks

The achieved mean Dice score on the validation data is: Tumor core (TC): 0.839

Liver and Tumor segmentation

‣ segmentation_ct_liver_and_tumor

A pre-trained model for volumetric (3D) segmentation of the liver and lesion in portal venous phase CT image.

This model is trained using the runner-up [2] awarded pipeline of the "Medical Segmentation Decathlon Challenge 2018" using the AHnet architecture [3].

This model was trained with the Liver dataset, as part of "Medical Segmentation Decathlon Challenge 2018". It consists of 131 labelled data and 70 unlabelled data. The labelled data was partitioned, based on our own split, into 104 training images and 27 validation images for this training task, as shown in config/dataset_0.json.

Page 38: NVIDIA Clara Train SDK: Transfer Learning · OVERVIEW NVIDIA’s Clara Train SDK: Transfer Learning toolkit is a python-based SDK that allows developers looking into faster implementation

Appendix

www.nvidia.comNVIDIA Clara Train SDK: Transfer Learning DU-09302-002 _v2.0 | 34

Segmentation model Description

For more detailed description of "MedicalSegmentation Decathlon Challenge 2018,"at:

http://medicaldecathlon.com/.

The training dataset is Task03_Liver.tarfrom the link above.

The data must be converted to 1mmresolution before training:tlt-dataconvert -d ${SOURCE_IMAGE_ROOT} -r 1 -s .nii.gz -e .nii -o ${DESTINATION_IMAGE_ROOT}

To match the defaultsetting, we suggest that${DESTINATION_IMAGE_ROOT}match DATA_ROOT as defined inenvironment.json in this MMAR'sconfig folder.

The provided training configurationrequired 12GB GPU memory.

Data Conversion: convert to resolution1mm x 1mm x 1mm

Model input shape: dynamic

Training Script: train.sh

Model input and output:

‣ Input: 1 channel CT image
‣ Output: 3 channels:
  ‣ Label 1: liver
  ‣ Label 2: tumor
  ‣ Label 0: everything else

The Dice scores on the validation data achieved by this model are:

‣ Liver: 0.932
‣ Tumor: 0.495

Hippocampus segmentation

‣ segmentation_mri_hippocampus A pre-trained model for volumetric (3D) segmentation of the hippocampus head and body from mono-modal MRI image.

This model is trained using the runner-up awarded pipeline of the "Medical Segmentation Decathlon Challenge 2018" with 208 training images and 52 validation images.

Training Data Source: Task04_Hippocampus.tar from http://medicaldecathlon.com/

The data was converted to resolution 1mm x 1mm x 1mm for training, using the following command:

tlt-dataconvert -d ${SOURCE_IMAGE_ROOT} -r 1 -s .nii.gz -e .nii.gz -o ${DESTINATION_IMAGE_ROOT}

The training was performed with command train.sh, which required 12GB-memory GPUs.

Training Graph Input Shape: dynamic

Actual Model Input: 96 x 96 x 96

Model input and output:

‣ Input: 1 channel MRI image
‣ Output: 2 channels:
  ‣ Label 1: hippocampus
  ‣ Label 0: everything else

This model achieves the following Dice score on the validation data (our own split from the training dataset):

‣ Hippocampus: 0.872 (mean_dice1: 0.882, mean_dice2: 0.862)

Lung Tumor segmentation

‣ segmentation_ct_lung_tumor A pre-trained model for volumetric (3D) segmentation of the lung tumor from CT image. This model is trained using the runner-up awarded pipeline of the "Medical Segmentation Decathlon Challenge 2018" with 50 training images and 13 validation images.

Training Data Source: Task06_Lung.tar from http://medicaldecathlon.com/

The data was converted to resolution 1mm x 1mm x 1mm for training, using the following command:

tlt-dataconvert -d ${SOURCE_IMAGE_ROOT} -r 1 -s .nii.gz -e .nii.gz -o ${DESTINATION_IMAGE_ROOT}

The training was performed with command train_2gpu.sh, which required 12GB-memory GPUs.

Training Graph Input Shape: dynamic

Actual Model Input: 96 x 96 x 96

Model input and output:

‣ Input: 1 channel CT image
‣ Output: 2 channels:
  ‣ Label 1: lung tumor
  ‣ Label 0: everything else

The Dice scores on the validation data (our own split) achieved by this model are:

‣ lung: 0.417

Prostate segmentation

‣ segmentation_mri_prostate_cg_and_pz A pre-trained model for volumetric (3D) segmentation of the prostate central gland and peripheral zone from the multimodal MR (T2, ADC). This model is trained using the runner-up awarded pipeline of the "Medical Segmentation Decathlon Challenge 2018" with 25 training image pairs and 7 validation images.

Training Data Source: Task05_Prostate.tar from http://medicaldecathlon.com/. The data was converted to resolution 1mm x 1mm x 1mm for training, using the following command:

tlt-dataconvert -d ${SOURCE_IMAGE_ROOT} -r 1 -s .nii.gz -e .nii.gz -o ${DESTINATION_IMAGE_ROOT}

The training was performed with command train_4gpu.sh, which required 12GB-memory GPUs.

Training Graph Input Shape: dynamic

Actual Model Input: 96 x 96 x 32

Model input and output:

‣ Input: 2 channel MRI image
‣ Output: 2 channels:
  ‣ Label 1: prostate peripheral zone
  ‣ Label 0: everything else

This model achieves the following Dice score on the validation data (our own split from the training dataset):

‣ Prostate: 0.724 (mean_dice1: 0.485, mean_dice2: 0.871)

Left atrium segmentation

‣ segmentation_mri_left_atrium A pre-trained model for volumetric (3D) segmentation of the left atrium from MRI image.

This model is trained using the runner-up awarded pipeline of the "Medical Segmentation Decathlon Challenge 2018" with 16 training images and 4 validation images.

Training Data Source: Task02_Heart.tar from http://medicaldecathlon.com/. The data was converted to resolution 1mm x 1mm x 1mm for training, using the following command:

tlt-dataconvert -d ${SOURCE_IMAGE_ROOT} -r 1 -s .nii.gz -e .nii.gz -o ${DESTINATION_IMAGE_ROOT}

The training was performed with command train_2gpu.sh, which required 12GB-memory GPUs.

Training Graph Input Shape: dynamic

Actual Model Input: 96 x 96 x 96

Model input and output:

‣ Input: 1 channel MRI image
‣ Output: 2 channels:
  ‣ Label 1: heart
  ‣ Label 0: everything else

The Dice scores on the validation data (our own split) achieved by this model are:

‣ Heart: 0.9158

Pancreas and tumor segmentation

‣ segmentation_ct_pancreas_and_tumor A pre-trained model for volumetric (3D) segmentation of the pancreas and tumor from portal venous phase CT.

This model is trained using the runner-up awarded pipeline [2] of the "Medical Segmentation Decathlon Challenge 2018" using the AHnet architecture [3].

This model is trained with the Pancreas dataset, as part of the "Medical Segmentation Decathlon Challenge 2018". It consists of 281 labelled data and 139 non-labelled data. The labelled data was partitioned, based on our own split, into 224 training images and 57 validation images for this training task, as shown in config/dataset_0.json.

For a more detailed description of the "Medical Segmentation Decathlon Challenge 2018," see http://medicaldecathlon.com/.

The training dataset is Task07_Pancreas.tar from the link above.

The data must be converted to 1mm resolution before training:

tlt-dataconvert -d ${SOURCE_IMAGE_ROOT} -r 1 -s .nii.gz -e .nii -o ${DESTINATION_IMAGE_ROOT}

To match the default setting, we suggest that ${DESTINATION_IMAGE_ROOT} match DATA_ROOT as defined in environment.json in this MMAR's config folder.

The provided training configuration requires 12 GB of GPU memory.

Data Conversion: convert to resolution 1mm x 1mm x 1mm

Model Input Shape: dynamic

Training Script: train.sh

Model input and output:

‣ Input: 1 channel CT image
‣ Output: 3 channels:
  ‣ Label 1: pancreas
  ‣ Label 2: tumor
  ‣ Label 0: everything else

This model achieves the following Dice score on the validation data (our own split from the training dataset):

‣ Pancreas: 0.739
‣ Tumor: 0.348

Colon tumor segmentation

‣ segmentation_ct_colon_tumor A pre-trained model for volumetric (3D) segmentation of colon tumors from CT image.

This model is trained using the runner-up awarded pipeline of the "Medical Segmentation Decathlon Challenge 2018" with 100 training images and 26 validation images.

Training Data Source: Task10_Colon.tar from http://medicaldecathlon.com/. The data was converted to resolution 1mm x 1mm x 1mm for training, using the following command:

tlt-dataconvert -d ${SOURCE_IMAGE_ROOT} -r 1 -s .nii.gz -e .nii.gz -o ${DESTINATION_IMAGE_ROOT}

The training was performed with command train_2gpu.sh, which required 12GB-memory GPUs.

Training Graph Input Shape: dynamic

Actual Model Input: 96 x 96 x 96

Model input and output:

‣ Input: 1 channel CT image
‣ Output: 2 channels:
  ‣ Label 1: colon tumor
  ‣ Label 0: everything else

The Dice scores on the validation data (our own split) achieved by this model are:

‣ colon cancer: 0.367

Hepatic vessel and tumor segmentation

‣ segmentation_ct_hepatic_vessel_and_tumor A pre-trained model for volumetric (3D) segmentation of the hepatic vessel and tumor from CT image.

This model is trained using the runner-up awarded pipeline [2] of the "Medical Segmentation Decathlon Challenge 2018" using the AHnet architecture [3].

This model was trained with the Hepatic Vessel dataset, as part of the "Medical Segmentation Decathlon Challenge 2018". It consists of 303 labelled data and 140 non-labelled data. The labelled data was partitioned, based on our own split, into 242 training images and 61 validation images for this training task, as shown in config/dataset_0.json.

For a more detailed description of the "Medical Segmentation Decathlon Challenge 2018," please see http://medicaldecathlon.com/.

The training dataset is Task08_HepaticVessel.tar from the link above.

The data must be converted to 1mm resolution before training:

tlt-dataconvert -d ${SOURCE_IMAGE_ROOT} -r 1 -s .nii.gz -e .nii -o ${DESTINATION_IMAGE_ROOT}

To match the default setting, we suggest that ${DESTINATION_IMAGE_ROOT} match DATA_ROOT as defined in environment.json in this MMAR's config folder.

The provided training configuration requires 12 GB of GPU memory.

Data Conversion: convert to resolution 1mm x 1mm x 1mm.

Model Input Shape: dynamic

Training Script: train.sh

Model input and output:

‣ Input: 1 channel CT image
‣ Output: 3 channels:
  ‣ Label 1: hepatic vessel
  ‣ Label 2: liver tumor
  ‣ Label 0: everything else

The Dice scores on the validation data achieved by this model are:

‣ Hepatic vessel: 0.523
‣ Liver tumor: 0.422

Spleen segmentation

‣ segmentation_ct_spleen A pre-trained model for volumetric (3D) segmentation of the spleen from CT image.

This model is trained using the runner-up awarded pipeline of the "Medical Segmentation Decathlon Challenge 2018" with 32 training images and 9 validation images.

The training dataset is Task09_Spleen.tar from http://medicaldecathlon.com/.

The data must be converted to 1mm resolution before training:

tlt-dataconvert -d ${SOURCE_IMAGE_ROOT} -r 1 -s .nii.gz -e .nii.gz -o ${DESTINATION_IMAGE_ROOT}

To match the default setting, we suggest that ${DESTINATION_IMAGE_ROOT} match DATA_ROOT as defined in environment.json in this MMAR's config folder.

The training was performed with command train_2gpu.sh, which required 12GB-memory GPUs.

Training Graph Input Shape: dynamic

Actual Model Input: 96 x 96 x 96

Model input and output:

‣ Input: 1 channel CT image
‣ Output: 2 channels:
  ‣ Label 1: spleen
  ‣ Label 0: everything else

This model achieves the following Dice score on the validation data (our own split from the training dataset):

‣ Spleen: 0.951

For details of the model architecture, see [3] (Liu et al.).

[1] Myronenko, Andriy. "3D MRI brain tumor segmentation using autoencoder regularization." International MICCAI Brainlesion Workshop. Springer, Cham, 2018. https://arxiv.org/abs/1810.11654.
[2] Xia, Yingda, et al. "3D Semi-Supervised Learning with Uncertainty-Aware Multi-View Co-Training." arXiv preprint arXiv:1811.12506 (2018). https://arxiv.org/abs/1811.12506.
[3] Liu, Siqi, et al. "3D anisotropic hybrid network: Transferring convolutional features from 2D images to 3D anisotropic volumes." International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2018. https://arxiv.org/abs/1711.08580.

6.2. Classification models

Chest X-ray Classification

classification_chestxray A pre-trained densenet121 model for disease pattern detection in chest x-rays.

This model is trained using PLCO training data and evaluated on the PLCO validation data.

You can apply for access to the dataset at: https://biometry.nci.nih.gov/cdas/learn/plco/images/

The training was performed with command train.sh, which required 12GB-memory GPUs.

Training Graph Input Shape: 256 x 256

Input: 16-bit CXR png

Output: 15 binary labels, each bit corresponding to the prediction of one of: 'Nodule', 'Mass', 'Distortion of Pulmonary Architecture', 'Pleural Based Mass', 'Granuloma', 'Fluid in Pleural Space', 'Right Hilar Abnormality', 'Left Hilar Abnormality', 'Major Atelectasis', 'Infiltrate', 'Scarring', 'Pleural Fibrosis', 'Bone/Soft Tissue Lesion', 'Cardiac Abnormality', 'COPD'

Please refer to "medical/segmentation/examples/brats/tutorial_brats.ipynb" inside the docker and the files in the same folder for details.

This model achieves the following AUC score on the validation data:

‣ Averaged AUC over all disease categories: 0.8680

Model folder inside the docker:

‣ /opt/nvidia/medical/classification/examples/PLCO

This model achieves the following AUC score on the validation data: Averaged AUC over all disease categories: 0.8587.

6.3. Data transforms and augmentations

Here is a list of built-in data transformation functions. If you need additional transformation functions, please contact us at the TLT user forum: http://devtalk.nvidia.com.

Transforms Description

LoadNifty Load NIfTI data. The value of each key (specified by fields) in input "dict" can be a string (a path to a single NIfTI file) or a list of strings (several paths to multiple NIfTI files, if there are several channels saved as separate files).

‣ init_args:

- fields: string or list of strings

key_values to apply, e.g. ["image","label"].

‣ Returns:

- Each field of "dict" is substituted by a 4D numpy array.

VolumeTo4dArray Transforms the value of each key (specified by fields) in input "dict" from 3D to 4D numpy array by expanding one channel, if needed.

‣ init_args:

- fields: string or list of strings

key_values to apply, e.g. ["image","label"].

‣ Returns:

- Each field of "dict" is substituted by a 4D numpy array.

ScaleIntensityRange Scale the intensity range of the numpy array, with optional clipping.

‣ init_args:

- field: string

one key_value to apply, e.g. "image".

- a_min: float

- a_max: float

Range of the original image

- b_min: float

- b_max: float

Target range of the image.

- clip: bool (default False)

Flag that controls whether to clip intensities that fall outside the target range.
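As an illustration of the parameters above, the following NumPy sketch maps the range [a_min, a_max] linearly onto [b_min, b_max] and optionally clips the result. The linear mapping is an assumption based on the description; the function scale_intensity_range is a stand-in and not the SDK implementation. The a_min and a_max values are the ones used in the segmentation_ct_spleen configuration.

```python
import numpy as np


def scale_intensity_range(image, a_min, a_max, b_min, b_max, clip=False):
    """Linearly map [a_min, a_max] onto [b_min, b_max], optionally clipping."""
    image = np.asarray(image, dtype=np.float32)
    scaled = (image - a_min) / (a_max - a_min) * (b_max - b_min) + b_min
    if clip:
        # Values that started outside [a_min, a_max] are pulled back into the target range.
        scaled = np.clip(scaled, b_min, b_max)
    return scaled


# Example CT intensities: two fall outside the source range [-57, 164].
ct = np.array([-1000.0, -57.0, 53.5, 164.0, 400.0])
scaled = scale_intensity_range(ct, a_min=-57, a_max=164, b_min=0.0, b_max=1.0, clip=True)
unclipped = scale_intensity_range(ct, a_min=-57, a_max=164, b_min=0.0, b_max=1.0)
```

Without clipping, intensities outside the source range simply land outside the target range, which is why the flag matters for CT data with extreme values.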

ScaleIntensityOscillation Randomly shift scale level for image.

‣ Args:

- field: key string, e.g. "image".

- magnitude: scale shift is a random value between 0 and magnitude.

‣ Returns:

- Data with an offset on intensity scale.

CropSubVolumeBatchPosNegRatio Randomly crop the foreground and background ROIs from both the image and mask for training. The sampling ratio between positive and negative samples is adjusted with the epoch number.

‣ init_args:

- image_field: string

one key_value to apply, e.g. "image".

- label_field: string

one key_value to apply, e.g. "label".

- size: list of ints

cropped ROI size, e.g., [96, 96, 96].

- pos/neg: float

Positive numbers to determine the ratio between positive and negative samples.

- batch_size: int

A positive integer to determine how many patches are cropped from the single volume.

- data_format: string

"channels_first" (by default) or "channels_last".

- fast_crop: boolean

True or False to determine whether to enable the fast cropping algorithm.

‣ Returns:

- Updated dictionary with cropped ROI image and mask

NPResize3D Resize 3D volume with a given shape

‣ init_args:

- applied_keys: list of strings

key_values to apply, e.g. ["image","label"].

- output_shape: list of ints

output size, e.g., [96, 96, 96].

- nearest: boolean

True for nearest interpolation (for segmentation labels, etc.), and False for linear interpolation (for images, etc.).

‣ Returns:

- 3D volume with the given shape

TransformVolumeCropROIFastPosNegRatio Fast 3D data augmentation method (CPU based) by combining 3D morphological transforms (rotation, elastic deformation, and scaling) and ROI cropping. The sampling ratio is specified by pos/neg.

‣ init_args:

- applied_keys: string or list of strings

key_values to apply, e.g. ["image","label"].

- size: list of int

cropped ROI size, e.g., [96, 96, 96].

- deform: boolean

whether to apply 3D deformation.

- rotation: boolean

whether to apply 3D rotation.

- rotation_degree: float

the degree of rotation, e.g., 15 means randomly rotate the image/label in a range [-15, +15].

- scale: boolean

whether to apply 3D scaling.

- scale_factor: float

the percentage of scaling, e.g., 0.1 means randomly scaling the image/label in a range [-0.1, +0.1].

- pos: float

the factor controlling the ratio of positive ROI sampling.

- neg: float

the factor controlling the ratio of negative ROI sampling.

‣ Returns:

- Updated dictionary with cropped ROI image and mask after data augmentation.

AdjustContrast Randomly adjust the contrast of the field in input "dict".

‣ init_args:

- field: string

one key_value to apply, e.g. "image".

AddGaussianNoise Randomly add Gaussian noise to the field in input "dict".

‣ init_args:

- field: string

one key_value to apply, e.g. "image".

LoadPng Load png image and the label. The value of "image" must be a string (a path to a single png file) while the value of the "label" must be a list of labels.

‣ init_args:

- fields: string or list of strings

key_values to apply, e.g. ["image","label"].

‣ Returns:

- "image" of "dict" is substituted by a 3D numpy array while the "label" of "dict" is substituted by a numpy list

CropRandomSubImageInRange Randomly crop 2D image. The crop size is randomly selected between lower_size and image size.

‣ init_args:

- lower_size: int or float

lower limit of crop size; if float, then it must be a fraction < 1

- max_displacement: float

max displacement from center to crop

- keep_aspect: boolean

if true, then original aspect ratio is kept

‣ Returns:

- The "image" field of input "dict" is substituted by cropped ROI image.

NPResizeImage Resize the 2D numpy array (channel x rows x height) as an image.

‣ init_args:

- applied_keys: string

one key_value to apply, e.g. "image".

- output_shape: list of int with length 2

e.g., [256,256].

- data_format: string

'channels_first', 'channels_last', or 'grayscale'.

NP2DRotate Rotate 2D numpy array, or channelled 2D array. If random is set to true, then rotate within the range of [-angle, angle].

‣ init_args:

- applied_keys: string

one key_value to apply, e.g. "image".

- angle: float

e.g. 7.

- random: boolean

default is false.

NPExpandDims Add a singleton dimension to the selected axis of the numpy array.

‣ init_args:

- applied_keys: string or list of strings

key_values to apply, e.g. ["image","label"].

- expand_axis: int

axis to expand, default is 0

NPRepChannels Repeat a numpy array along specified axis, e.g., turn a grayscale image into a 3-channel image.

‣ init_args:

- applied_keys: string or list of strings

key_values to apply, e.g. ["image","label"].

- channel_axis: int

the axis along which to repeat values.

- repeat: int

the number of repetitions for each element.
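The grayscale-to-3-channel case mentioned above can be sketched with numpy.repeat. This is only an illustration of the described behavior; the helper rep_channels is hypothetical, and whether the SDK transform uses numpy.repeat internally is an assumption.

```python
import numpy as np


def rep_channels(array, channel_axis, repeat):
    """Repeat the array `repeat` times along `channel_axis`."""
    return np.repeat(array, repeat, axis=channel_axis)


# A single-channel 2x3 image in channels-first layout: shape (1, 2, 3).
gray = np.arange(6, dtype=np.float32).reshape(1, 2, 3)
rgb_like = rep_channels(gray, channel_axis=0, repeat=3)  # shape (3, 2, 3)
```

Each of the three resulting channels holds the same values as the original grayscale channel, which is the usual way to feed single-channel images to a network pre-trained on 3-channel input.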

CenterData Center numpy array's value by subtracting a subtrahend and dividing by a divisor.

‣ init_args:

- applied_keys: string or list of strings

key_values to apply, e.g. ["image","label"].

- subtrahend: float

subtrahend. If None, it is computed as the mean of dict[key_value].

- divisor: float

divisor. If None, it is computed as the std of dict[key_value].
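A minimal NumPy sketch of this centering rule follows, including the None defaults described above. The function center_data is illustrative only and not the SDK class; it also assumes the divisor is non-zero.

```python
import numpy as np


def center_data(array, subtrahend=None, divisor=None):
    """Return (array - subtrahend) / divisor, defaulting to the array's mean and std."""
    array = np.asarray(array, dtype=np.float32)
    if subtrahend is None:
        subtrahend = array.mean()
    if divisor is None:
        divisor = array.std()  # assumed non-zero in this sketch
    return (array - subtrahend) / divisor


values = np.array([1.0, 2.0, 3.0, 4.0])
fixed = center_data(values, subtrahend=2.5, divisor=0.5)
standardized = center_data(values)  # zero mean, unit std
```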

NPRandomFlip3D Flip the 3D numpy array along random axes with the provided probability.

‣ init_args:

- applied_keys: string or list of strings

key_values to apply, e.g. ["image","label"].

- probability: float

probability to apply the flip, value between 0 and 1.0.

NPRandomZoom3D Apply a random zooming to the 3D numpy array.

‣ init_args:

- applied_keys: string or list of strings

key_values to apply, e.g. ["image","label"].

- lower_limits: list of float

lower limit of the zoom along each dimension.

- upper_limits: list of float

upper limit of the zoom along each dimension.

- data_format: string

'channels_first' or 'channels_last'.

- use_gpu: boolean

whether to use cupy for GPU acceleration. Default is false.

- keep_size: boolean

default is false, which means this function will change the size of the data array after zooming. Setting keep_size to True will result in an output of the same size as the input.

CropForegroundObject Crop the 4D numpy array and resize the foreground. The numpy array must have foreground voxels.

‣ init_args:

- size: list of int

resized size.

- image_field: string

"image".

- label_field: string

"label".

- pad: int

number of voxels for adding a margin around the object.

- foreground_only: boolean

whether to treat all foreground labels as one binary label (default) or whether to select a foreground label at random.

- keep_classes: boolean

if true, keep original label indices in label image (no thresholding), useful for multi-class tasks.

- pert: int

random perturbation in each dimension added to padding (in voxels).

NPRandomRot90_XY Rotate the 4D numpy array along random axes on XY plane (axis = (1, 2)).

‣ init_args:

- applied_keys: string

one key_value to apply, e.g. "image".

- probability: float

probability to utilize the transform, between 0 and 1.0.

AddExtremePointsChannel Add an additional channel to 4D numpy array where the extreme points of the foreground labels are modeled as Gaussians.

‣ init_args:

- image_field: string

"image".

- label_field: string

"label".

- sigma: float

size of Gaussian.

- pert: boolean

random perturbation added to the extreme points.

NormalizeNonzeroIntensities Normalize 4D numpy array to zero mean and unit std, based on non-zero elements only for each input channel individually.

‣ init_args:

- fields: string or list of strings

key_values to apply, e.g. ["image","label"].
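The per-channel, non-zero-only normalization described above can be sketched as follows. The sketch assumes a channels-first 4D array and leaves zero-valued (background) voxels unchanged; both points are assumptions, and normalize_nonzero is a hypothetical helper rather than the SDK class.

```python
import numpy as np


def normalize_nonzero(volume):
    """Normalize each channel to zero mean and unit std using its non-zero voxels only."""
    out = np.array(volume, dtype=np.float32)  # copy, so the input is not modified
    for c in range(out.shape[0]):
        mask = out[c] != 0
        if mask.any():
            values = out[c][mask]
            std = values.std()
            # Guard against a constant channel, where the std would be zero.
            out[c][mask] = (values - values.mean()) / (std if std > 0 else 1.0)
    return out


# One channel, four voxels: the two zeros stand for background.
volume = np.array([0.0, 2.0, 4.0, 0.0]).reshape(1, 2, 2, 1)
normalized = normalize_nonzero(volume)
```

Restricting the statistics to non-zero voxels keeps the large zero background of skull-stripped MRI volumes from dominating the mean and standard deviation.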

SplitAcrossChannels Splits the 4D numpy array across channels to create new dict entries. New key_values shall be applied_key+channel number, e.g. "image1".

‣ init_args:

- applied_key: string

one key_value to apply, e.g. "image".

LoadResolutionFromNifty Get the image resolution from a NIfTI image

‣ init_args:

- applied_key: string

one key_value to apply, e.g. "image".

‣ Returns:

- "dict" has a new key-value pair: dict[applied_key+"_resolution"]: resolution of the NIfTI image

Load3DShapeFromNumpy Get the image shape from a NIfTI image

‣ init_args:

- applied_key: string

one key_value to apply, e.g. "image".

‣ Returns:

- "dict" has a new key-value pair: dict[applied_key+"_shape"]: shape of the NIfTI image

ResampleVolume Resample the 4D numpy array from current resolution to a specific resolution

‣ init_args:

- applied_key: string

one key_value to apply, e.g. "image".

- resolution: list of float

input image resolution.

- target_resolution: list of float

target resolution.

BratsConvertLabels BraTS data specific. Convert the input label format (indices 1, 2, 4) into the proper format.

‣ init_args:

- fields: string or list of strings

key_values to apply, e.g. ["image", "label"].

CropSubVolumeRandomWithinBounds Crops a random subvolume from within the bounds of 4D numpy array.

‣ init_args:

- fields: string or list of strings

key_values to apply, e.g. ["image","label"].

- size: list of int

the size of the crop region, e.g. [224,224,128].

FlipAxisRandom Flip the numpy array along its dimensions randomly.

‣ init_args:

- fields: string or list of strings

key_values to apply, e.g. ["image","label"].

- axis: list of ints

which axes to attempt to flip (e.g. [0,1,2] for all 3 dimensions). The axis indices must be provided only for spatial dimensions.

CropSubVolumeCenter Crops a center subvolume from within the bounds of 4D numpy array.

‣ init_args:

- fields: string or list of strings

key_values to apply, e.g. ["image","label"].

- size: list of int

the size of the crop region, e.g. [224,224,128] (similar to CropSubVolumeRandomWithinBounds, but crops the center)
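To make the center crop concrete, the sketch below crops the spatial dimensions of a channels-first 4D array around its center. The channels-first layout and the rounding of odd margins are assumptions, and crop_center is a hypothetical helper, not the SDK transform.

```python
import numpy as np


def crop_center(volume, size):
    """Crop a (channels, x, y, z) array to `size` around the spatial center."""
    spatial = volume.shape[1:]
    # Start index per axis so that the crop is centered (odd margins round down).
    starts = [(dim - s) // 2 for dim, s in zip(spatial, size)]
    slices = tuple(slice(st, st + s) for st, s in zip(starts, size))
    return volume[(slice(None),) + slices]


# Small toy volume with 2 channels; the crop keeps all channels.
volume = np.arange(2 * 10 * 10 * 6).reshape(2, 10, 10, 6)
cropped = crop_center(volume, size=[4, 4, 2])  # shape (2, 4, 4, 2)
```

A random variant such as CropSubVolumeRandomWithinBounds would differ only in how the start indices are chosen.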

6.4. Model training and validation configurations

Transfer Learning workflows are made of different types of components. For each type, there are usually multiple choices. To put together a workflow, you specify and configure the components to be used.

Transfer Learning offers two kinds of workflows: training and validation. Workflow configurations are defined in JSON files: config_train.json for training workflow and config_validation.json for validation.

6.4.1. Train configuration

Training config file config_train.json defines the configuration of the training workflow. The config contains three sections: global variables, train, and validate.

You can define global variables in the configuration JSON file. These variables can be overwritten through the environment.json or even the command line. The typical global variables include:

{
    "epochs": 5000,
    "num_training_epoch_per_valid": 20,
    "learning_rate": 1e-4,
    "multi_gpu": false,
    …
}

By overwriting the values of these variables through the command line, you can experiment with different training settings without having to modify the config file. For example, in commands/train.sh:

python3 -u -m medical.tlt2.src.apps.train \
    -m $MMAR_ROOT \
    -c $CONFIG_FILE \
    -e $ENVIRONMENT_FILE \
    --set \
    epochs=1260 \
    learning_rate=0.0001 \
    num_training_epoch_per_valid=20 \
    multi_gpu=false
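The override behavior can be pictured as merging key=value pairs into the global variables loaded from the config file. The following sketch illustrates that idea only; apply_overrides is a hypothetical helper, and how the SDK actually parses the values after --set is not documented here.

```python
import json


def apply_overrides(config, overrides):
    """Return a copy of `config` with "key=value" overrides applied."""
    merged = dict(config)
    for item in overrides:
        key, _, raw = item.partition("=")
        try:
            # Interpret numbers and true/false the same way the JSON config would.
            merged[key] = json.loads(raw)
        except json.JSONDecodeError:
            merged[key] = raw  # fall back to a plain string
    return merged


config = {
    "epochs": 5000,
    "num_training_epoch_per_valid": 20,
    "learning_rate": 1e-4,
    "multi_gpu": False,
}
merged = apply_overrides(config, ["epochs=1260", "learning_rate=0.0001", "multi_gpu=false"])
```

Variables that are not overridden, such as num_training_epoch_per_valid here, keep the value from the config file.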

The "train" section defines the components for the training process, including "loss", "optimizer", "lr_policy", "model", "pre_transforms" and "image_pipeline". Each component is constructed by providing the component's class "name" and the init arguments "args".

Similarly, the "validate" section defines the components for the validation process, including "metrics", "pre_transforms", "image_pipeline" and "inferer". Each component is constructed the same way by providing the component class "name" and the corresponding init arguments "args".

If you want to use an externally-implemented component class, you can do so by specifying the class “path” to replace the “name”.
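As a rough illustration of the “name”/“path” plus “args” convention, the sketch below builds an object from such a component definition. It is not the SDK's loader: `build_component` and `BUILTIN_COMPONENTS` are hypothetical names, and treating “path” as a dotted `module.ClassName` string is an assumption.

```python
import importlib

class DummyLoss:
    """Stand-in for a built-in component class (hypothetical)."""
    def __init__(self, smooth=1e-5):
        self.smooth = smooth

# Hypothetical registry mapping built-in component names to classes.
BUILTIN_COMPONENTS = {"DummyLoss": DummyLoss}

def build_component(spec, registry=BUILTIN_COMPONENTS):
    """Instantiate a component from a {"name" | "path", "args"} dict."""
    args = spec.get("args", {})
    if "path" in spec:
        # Assumed form: "package.module.ClassName".
        module_name, _, class_name = spec["path"].rpartition(".")
        cls = getattr(importlib.import_module(module_name), class_name)
    else:
        cls = registry[spec["name"]]
    return cls(**args)

loss = build_component({"name": "DummyLoss", "args": {"smooth": 0.0}})
# A standard-library class stands in for an externally implemented one.
frac = build_component(
    {"path": "fractions.Fraction", "args": {"numerator": 3, "denominator": 4}})
print(loss.smooth, frac)
```

Keeping all init arguments under “args” means the loader never has to guess which keys belong to the class and which belong to the workflow.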

6.4.1.1. Segmentation model example

Here is an example of config_train.json of the segmentation_ct_spleen model:

{

"epochs": 1250,

"num_training_epoch_per_valid": 20,

"learning_rate": 1e-4,

"multi_gpu": false,

"train": {

"loss": {

"name": "Dice"

},

"optimizer": {

"name": "Adam"

},

"lr_policy": {

"name": "DecayLRonStep",

"args": {

"decay_ratio": 0.1,

"decay_freq": 50000

}

},

"model": {

"name": "SegmAhnet3D",

"args": {

"num_classes": 2,

"if_from_scratch": false,

"if_use_psp": false,

"pretrain_weight_name": "{PRETRAIN_WEIGHTS_FILE}",

"plane": "z",

"final_activation": "softmax"

}

},

"pre_transforms": [

{

"name": "LoadNifty",

"args": {

"fields": [

"image",

"label"

]

}

},

{

"name": "VolumeTo4DArray",

"args": {

"fields": [

"image",

"label"

]

}

},

{

"name": "ScaleIntensityRange",

"args": {

"field": "image",

"a_min": -57,

"a_max": 164,

"b_min": 0.0,

"b_max": 1.0,

"clip": true

}

},

{

"name": "CropSubVolumeBatchPosNegRatio",

"args": {

"size": [

96,

96,

96

],

"image_field": "image",

"label_field": "label",

"pos": 1,

"neg": 1,

"batch_size": 3,

"fast_crop": true

}

},

{

"name": "NPRandomFlip3D",

"args": {

"applied_keys": [

"image",

"label"

],

"probability": 0.0

}

},

{

"name": "NPRandomRot90XY",

"args": {

"applied_keys": [

"image",

"label"

],

"probability": 0.0

}

},

{

"name": "ScaleIntensityOscillation",

"args": {

"field": "image",

"magnitude": 0.10

}

}

],

"image_pipeline": {

"name": "ImagePipeline",

"args": {

"task": "segmentation",

"data_list_file_path": "{DATASET_JSON}",

"data_file_base_dir": "{DATA_ROOT}",

"data_list_key": "training",

"crop_size": [

-1,

-1,

-1

],

"data_format": "channels_first",

"batch_size": 0,

"num_channels": 1,

"num_workers": 4,

"prefetch_size": 0

}

}

},

"validate": {

"metrics": [

{

"name": "MetricAverageFromArrayDice",

"args": {

"name": "mean_dice",

"stopping_metric": true,

"applied_key": "model",

"label_key": "label"

}

}

],

"pre_transforms": [

{

"name": "LoadNifty",

"args": {

"fields": [

"image",

"label"

]

}

},

{

"name": "VolumeTo4DArray",

"args": {

"fields": [

"image",

"label"

]

}

},

{

"name": "ScaleIntensityRange",

"args": {

"field": "image",

"a_min": -57,

"a_max": 164,

"b_min": 0.0,

"b_max": 1.0,

"clip": true

}

}

],

"image_pipeline": {

"name": "ImagePipeline",

"args": {

"task": "segmentation",

"data_list_file_path": "{DATASET_JSON}",

"data_file_base_dir": "{DATA_ROOT}",

"data_list_key": "validation",

"crop_size": [

-1,

-1,

-1

],

"data_format": "channels_first",

"batch_size": 1,

"num_channels": 1,

"num_workers": 4,

"prefetch_size": 0

}

},

"inferer": {

"name": "ScanWindowInferer",

"args": {

"is_channels_first": true,

"roi_size": [

160,

160,

160

]

}

}

}

}

Explanation of components in the training workflow:

| Section | Component | Description |
|---|---|---|
| Global | epochs | Number of training epochs. |
| Global | num_training_epoch_per_valid | Validation frequency in number of epochs. If not specified, defaults to 1. |
| Global | learning_rate | The initial learning rate. |
| Global | multi_gpu | Is the training on multiple GPUs? If not specified, defaults to false. |
| train | loss | The loss component. The Dice loss is used here. |
| train | optimizer | The optimizer component. The Adam optimizer is used here. |
| train | lr_policy | The learning rate policy. The DecayLRonStep policy is used here. |
| train | model | The model network. The SegmAhnet3D network is used here. |
| train | pre_transforms | List of transforms to be applied to the training data. |
| train | image_pipeline | The image pipeline that generates batched training data items. NOTE: The crop_size is [-1, -1, -1] for training with dynamic network input shape. The batch_size is set to 0 because the batching is not done by the image pipeline; instead, it is done by the special transform CropSubVolumeBatchPosNegRatio for faster cropping and batching. |
| validate | metrics | Metrics to be computed during validation. |
| validate | pre_transforms | Transforms to be applied to the validation data. |
| validate | image_pipeline | The image pipeline that generates batched validation data items. |
| validate | inferer | The inferer to be used for performing inference on validation data. Options are ScanWindowInferer and SimpleInferer. |

Dynamic Network Input Shape

TensorFlow 1.13 supports dynamic network input shapes. This allows the computation graph to be built with placeholders of dynamic shape [None, None, None], which can accept input data of any size. This makes it possible to dynamically compute the best input size for an image to obtain the best performance during inference.

Because the network input size is dynamic, when performing inference, you must explicitly set the actual input size. In this example, we set the ScanWindowInferer’s roi_size to [160, 160, 160], which is used as the actual input size to the network, where ROI stands for Region Of Interest.
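To make the role of roi_size more concrete, here is a minimal Python sketch of how a volume could be covered by ROI-sized windows. The guide does not describe the actual stride or overlap used by ScanWindowInferer, so the tiling rule below (non-overlapping steps, with the last window shifted back to stay inside the volume) and the names `window_starts` and `scan_windows` are assumptions for illustration only.

```python
import itertools

def window_starts(length, roi):
    """Start offsets of ROI-sized windows covering one axis of size `length`."""
    if length <= roi:
        return [0]
    starts = list(range(0, length - roi + 1, roi))
    if starts[-1] + roi < length:
        # Shift the final window back so it ends exactly at the boundary.
        starts.append(length - roi)
    return starts

def scan_windows(shape, roi_size):
    """All window start offsets for a volume of `shape`, one tuple per window."""
    axes = [window_starts(n, r) for n, r in zip(shape, roi_size)]
    return list(itertools.product(*axes))

# A hypothetical 400 x 400 x 200 volume scanned with the roi_size from the example.
windows = scan_windows((400, 400, 200), (160, 160, 160))
print(len(windows), windows[0], windows[-1])
```

Under this tiling rule, a larger roi_size means fewer windows per volume, each requiring a larger network input.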

6.4.1.2. Classification model example

Here is an example of config_train.json for the classification_chestxray model:

{

"epochs": 40,

"multi_gpu": false,

"learning_rate": 2e-4,

"train": {

"model": {

"name": "DenseNet121",

"args": {

"weight_decay": 1e-5,

"pretrain_weight_name": "{PRETRAIN_WEIGHTS_FILE}"

}

},

"loss": {

"name": "ClassificationLoss"

},

"optimizer": {

"name": "Adam"

},

"pre_transforms": [

{

"name": "LoadPng",

"args": {

"fields": [

"image"

]

}

},

{

"name": "CropRandomSubImageInRange",

"args": {

"lower_size": [

0.9,

0.9

],

"data_format": "grayscale",

"image_field": "image",

"max_displacement": 200

}

},

{

"name": "NPResizeImage",

"args": {

"applied_keys": [

"image"

],

"output_shape": [

256,

256

],

"data_format": "grayscale"

}

},

{

"name": "NP2DRotate",

"args": {

"applied_keys": [

"image"

],

"angle": 7,

"random": true,

"data_format": "grayscale"

}

},

{

"name": "NPExpandDims",

"args": {

"applied_keys": "image",

"expand_axis": 2

}

},

{

"name": "NPRepChannels",

"args": {

"applied_keys": "image",

"channel_axis": 2,

"repeat": 3

}

},

{

"name": "CenterData",

"args": {

"applied_keys": "image",

"subtrahend": [

2876.37,

2876.37,

2876.37

],

"divisor": [

883,

883,

883

]

}

}

],

"metrics": [

{

"name": "AccuracyComputer",

"args": {

"tags": "accuracy",

"use_sigmoid": true

}

},

{

"name": "ClassificationMetric",

"args": {

"binary_preds_name": "binary_preds",

"binary_labels_name": "binary_labels"

},

"do_summary": false,

"do_print": false

}

],

"image_pipeline": {

"name": "ImagePipeline",

"args": {

"task": "classification",

"data_list_file_path": "{DATASET_JSON}",

"data_file_base_dir": "{DATA_ROOT}",

"data_list_key": "training",

"crop_size": [

256,

256

],

"data_format": "channels_last",

"batch_size": 20,

"num_channels": 3,

"num_workers": 8,

"prefetch_size": 21

}

}

},

"validate": {

"pre_transforms": [

{

"name": "LoadPng",

"args": {

"fields": [

"image"

]

}

},

{

"name": "NPResizeImage",

"args": {

"applied_keys": [

"image"

],

"output_shape": [

256,

256

],

"data_format": "grayscale"

}

},

{

"name": "NPExpandDims",

"args": {

"applied_keys": "image",

"expand_axis": 2

}

},

{

"name": "NPRepChannels",

"args": {

"applied_keys": "image",

"channel_axis": 2,

"repeat": 3

}

},

{

"name": "CenterData",

"args": {

"applied_keys": "image",

"subtrahend": [

2876.37,

2876.37,

2876.37

],

"divisor": [

883,

883,

883

]

}

}

],

"metrics": [

{

"name": "MetricAverage",

"args": {

"name": "mean_accuracy",

"applied_key": "val_accuracy"

}

},

{

"name": "MetricAUC",

"args": {

"name": "Average_AUC",

"applied_key": "val_binary_preds",

"label_key": "val_binary_labels",

"auc_average": "macro",

"stopping_metric": true

}

},

{

"name": "MetricAUC",

"args": {

"name": "Nodule",

"class_index": 0,

"applied_key": "val_binary_preds",

"label_key": "val_binary_labels"

}

},

{

"name": "MetricAUC",

"args": {

"name": "Mass",

"class_index": 1,

"applied_key": "val_binary_preds",

"label_key": "val_binary_labels"

}

},

{

"name": "MetricAUC",

"args": {

"name": "Distortion_pulmonary_architecture",

"class_index": 2,

"applied_key": "val_binary_preds",

"label_key": "val_binary_labels"

}

},

{

"name": "MetricAUC",

"args": {

"name": "Pleural_based_mass",

"class_index": 3,

"applied_key": "val_binary_preds",

"label_key": "val_binary_labels"

}

},

{

"name": "MetricAUC",

"args": {

"name": "Granuloma",

"class_index": 4,

"applied_key": "val_binary_preds",

"label_key": "val_binary_labels"

}

},

{

"name": "MetricAUC",

"args": {

"name": "Fluid_in_pleural_space",

"class_index": 5,

"applied_key": "val_binary_preds",

"label_key": "val_binary_labels"

}

},

{

"name": "MetricAUC",

"args": {

"name": "Right_hilar_abnormality",

"class_index": 6,

"applied_key": "val_binary_preds",

"label_key": "val_binary_labels"

}

},

{

"name": "MetricAUC",

"args": {

"name": "Left_hilar_abnormality",

"class_index": 7,

"applied_key": "val_binary_preds",

"label_key": "val_binary_labels"

}

},

{

"name": "MetricAUC",

"args": {

"name": "Major_atelectasis",

"class_index": 8,

"applied_key": "val_binary_preds",

"label_key": "val_binary_labels"

}

},

{

"name": "MetricAUC",

"args": {

"name": "Infiltrate",

"class_index": 9,

"applied_key": "val_binary_preds",

"label_key": "val_binary_labels"

}

},

{

"name": "MetricAUC",

"args": {

"name": "Scarring",

"class_index": 10,

"applied_key": "val_binary_preds",

"label_key": "val_binary_labels"

}

},

{

"name": "MetricAUC",

"args": {

"name": "Pleural_fibrosis",

"class_index": 11,

"applied_key": "val_binary_preds",

"label_key": "val_binary_labels"

}

},

{

"name": "MetricAUC",

"args": {

"name": "Bone_soft_tissue_lesion",

"class_index": 12,

"applied_key": "val_binary_preds",

"label_key": "val_binary_labels"

}

},

{

"name": "MetricAUC",

"args": {

"name": "Cardiac_abnormality",

"class_index": 13,

"applied_key": "val_binary_preds",

"label_key": "val_binary_labels"

}

},

{

"name": "MetricAUC",

"args": {

"name": "COPD",

"class_index": 14,

"applied_key": "val_binary_preds",

"label_key": "val_binary_labels"

}

}

],

"image_pipeline": {

"name": "ImagePipeline",

"args": {

"task": "classification",

"data_list_file_path": "{DATASET_JSON}",

"data_file_base_dir": "{DATA_ROOT}",

"data_list_key": "validation",

"crop_size": [

256,

256

],

"data_format": "channels_last",

"batch_size": 20,

"num_channels": 3,

"num_workers": 8,

"prefetch_size": 21

}

},

"inferer": {

"name": "SimpleInferer"

}

}

}

| Section | Component | Description |
|---|---|---|
| global | epochs | Number of training epochs. |
| global | learning_rate | Initial learning rate. |
| global | multi_gpu | Is the training on multiple GPUs? If not specified, defaults to false. |
| train | model | The model network component. |
| train | loss | The loss component. |
| train | optimizer | The optimizer component. |
| train | pre_transforms | List of transforms to be performed on the training data. |
| train | metrics | Metrics to be computed during validation. |
| train | image_pipeline | The image pipeline that produces batched training data. |
| validate | pre_transforms | List of transforms to be performed on the validation data. |
| validate | metrics | Metrics to be computed during validation. |
| validate | image_pipeline | The image pipeline that produces batched validation data. |
| validate | inferer | The component that does inference on validation data. |

6.4.2. Validation configuration

The validation config file config_validation.json defines the configuration of the validation workflow.

6.4.2.1. Segmentation model example

Here is the validation config of segmentation_ct_spleen:

{

"batch_size": 1,

"pre_transforms":

[

{

"name": "LoadResolutionFromNifty",

"args": {

"applied_key": "image",

"new_key": "image_resolution"

}

},

{

"name": "LoadNifty",

"args": {

"fields": "image"

}

},

{

"name": "Load3DShapeFromNumpy",

"args": {

"applied_key": "image",

"new_key": "image_shape"

}

},

{

"name": "ResampleVolume",

"args": {

"applied_key": "image",

"resolution": "image_resolution",

"target_resolution": [

1.0,

1.0,

1.0

]

}

},

{

"name": "VolumeTo4DArray",

"args": {

"fields": "image"

}

},

{

"name": "ScaleIntensityRange",

"args": {

"field": "image",

"a_min": -57,

"a_max": 164,

"b_min": 0.0,

"b_max": 1.0,

"clip": true

}

}

],

"post_transforms":

[

{

"name": "ArgmaxAcrossChannels",

"args": {

"applied_key": "model"

}

},

{

"name": "NPResize3D",

"args": {

"applied_keys": "model",

"output_shape_key": "image_shape",

"nearest": true

}

},

{

"name": "SplitBasedOnLabel",

"args": {

"applied_key": "model",

"channel_names": [

"pred_class0",

"pred_class1"

]

}

}

],

"writers":

[

{

"name": "NiftyWriter",

"args": {

"applied_key": "model",

"dtype": "uint8",

"write_path": "{MMAR_EVAL_OUTPUT_PATH}"

}

},

{

"name": "NiftyWriter",

"args":

{

"applied_key": "pred_class0",

"dtype": "uint8",

"write_path": "{MMAR_EVAL_OUTPUT_PATH}"

}

},

{

"name": "NiftyWriter",

"args":

{

"applied_key": "pred_class1",

"dtype": "uint8",

"write_path": "{MMAR_EVAL_OUTPUT_PATH}"

}

}

],

"label_transforms":

[

{

"name": "LoadNifty",

"args": {

"fields": "label"

}

},

{

"name": "SplitBasedOnLabel",

"args": {

"applied_key": "label",

"channel_names": [

"label_class0",

"label_class1"

]

}

}

],

"val_metrics":

[

{

"name": "MetricAverageFromArrayDice",

"args": {

"name": "mean_dice",

"applied_key": "pred_class1",

"label_key": "label_class1",

"report_path": "{MMAR_EVAL_OUTPUT_PATH}"

}

}

],

"inferer":

{

"name": "ScanWindowInferer",

"args": {

"is_channels_first": true,

"roi_size": [160, 160, 160],

"batch_size": 3

}

},

"model_loader":

{

"name": "FrozenGraphModelLoader",

"args": {

"model_file_path": "{MMAR_CKPT_DIR}/model.trt.pb"

}

}

}

| Component | Description |
|---|---|
| batch_size | Size of validation data batch. |
| pre_transforms | Transforms to be applied to validation sample images before prediction computation. |
| post_transforms | Transforms to be applied to validation sample images after prediction computation. |
| label_transforms | Transforms to be applied to validation label images. |
| val_metrics | Validation metrics to be computed and reported. |
| writers | Writers that write prediction results to files. |
| inferer | The component that does prediction computation. |
| model_loader | The component that loads the pre-trained model. |
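The val_metrics entry in the example compares the pred_class1 mask with the label_class1 mask using a Dice score. The sketch below shows the standard Dice formula on flattened binary masks. It is not the code of MetricAverageFromArrayDice; the function name `dice_score`, the toy masks, and the convention of returning 1.0 for two empty masks are assumptions.

```python
def dice_score(pred, label):
    """Dice = 2 * |pred AND label| / (|pred| + |label|) for binary masks."""
    intersection = sum(p * l for p, l in zip(pred, label))
    total = sum(pred) + sum(label)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total

# Toy flattened masks standing in for pred_class1 and label_class1.
pred_class1 = [0, 1, 1, 1, 0, 0]
label_class1 = [0, 1, 1, 0, 0, 1]
score = dice_score(pred_class1, label_class1)
print(round(score, 4))
```

A mean Dice value such as mean_dice would then be the average of this score over all validation cases.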

6.4.2.2. Converting from the old format

This release of the Transfer Learning Toolkit strictly follows a component-oriented approach. Each component is completely configured by its set of init parameters, or “args”.

The configuration format of the first release, EA (Early Access), is mostly component oriented but not strictly so. Some parameters are defined outside of the components. For example, num_classes is a parameter of the model, but it is defined outside of the model component definition.

Another difference from the previous release's configuration is that the EA format lacked a separation between class name and init parameters: the “name” parameter is used as the class name of the component, and all other parameters are treated as the init args of the component. We have found that some components (e.g. metrics) also use “name” as one of their init args. The EA release required a workaround: use “tag” in place of “name” for the init arg, with special code to change “tag” back to “name” before initializing the component.

This release separates the class name from the init args: just as in the Early Access release, “name” is used as the class name of the component, but all init args are placed within the “args” attribute to avoid potential conflicts of parameter names.
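The conversion rule described above can be sketched in a few lines of Python. This is an illustrative helper, not a tool shipped with the SDK: `convert_component` is a hypothetical name, the sample EA-style entry is made up for the example, and parameters that EA configs defined outside a component (such as num_classes) would still have to be moved by hand.

```python
def convert_component(old):
    """Convert an EA-style component dict into the "name" + "args" layout."""
    # In the EA format every key except "name" is an init argument.
    args = {k: v for k, v in old.items() if k != "name"}
    # EA configs used "tag" as a stand-in for an init arg called "name".
    if "tag" in args:
        args["name"] = args.pop("tag")
    new = {"name": old["name"]}
    if args:
        new["args"] = args
    return new

# Hypothetical EA-style metric definition.
old_metric = {
    "name": "MetricAverageFromArrayDice",
    "tag": "mean_dice",
    "applied_key": "model",
    "label_key": "label",
}
new_metric = convert_component(old_metric)
print(new_metric)
```

After conversion the outer “name” is the class name and the inner “name” under “args” is the init argument, so the two no longer collide.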

6.5. Components

This section lists all currently available components implemented in Clara Train SDK. These components can be used in workflow configuration, as shown in section 6.4.

| Type | Name | Description |
|---|---|---|
| Model | SegmAhnet3D | A 3D segmentation model. Args: num_classes, if_use_psp, plane, pretrain_weight_name, final_activation='softmax', data_format='channels_first' |
| Model | SegResnet | A 3D segmentation model. Args: num_classes, blocks_down='1,2,2,4', blocks_up='1,1,1', init_filters=8, use_batch_norm=False, use_group_norm=True, use_group_normG=8, reg_weight=0.0, dropout_prob=0.0, final_activation='softmax', use_vae=False, data_format='channels_first' |
| Model | DenseNet121 | A 2D classification model. Args: weight_decay, pretrain_weight_name |
| Loss | Dice | A loss function with dice algorithm. Args: data_format='channels_first', skip_background=False, squared_pred=False, jaccard=False, smooth=1e-5, top_smooth=0.0, is_onehot_targets=False |
| Loss | ClassificationLoss | A loss function for classification models. |
| Optimizer | Adam | This is a wrapper of tf.train.AdamOptimizer. |
| Optimizer | SGD | This is a wrapper of tf.train.MomentumOptimizer. |
| Data Pipeline | ImagePipeline | Produce batched data for training and validation. Args: task, data_list_file_path, data_file_base_dir, data_list_key, crop_size, transforms, data_format="channels_first", num_data_dims=3, num_channels=1, num_label_channels=1, batch_size=10, num_workers=4, prefetch_size=20, shuffle=True, repeat=True |
| Train Metric | AccuracyComputer | Compute accuracy based on prediction and label. Args: tags (name of the accuracy tensor), use_sigmoid=False |
| Train Metric | DiceMaskedOutput | Compute dice mask based on prediction and label. Args: tags, data_format='channels_first', skip_background=False, is_onehot_targets=False, is_independent_predictions=False, jaccard=False, threshold=0.5 |
| Train Metric | DiceMetric | Compute dice value based on prediction and label. Args: data_format='channels_first', skip_background=False, is_onehot_targets=False, is_independent_predictions=False, jaccard=False, threshold=0.5 |
| Validation Metric | MetricAverageFromArrayDice | Computes dice score metric from full size np array and collects average. Args: applied_key, name, label_key='label', do_print=True, do_summary=True, stopping_metric=False, report_path=None |
| Validation Metric | MetricAUC | Computes AUC. Usually used for classification model validation. Args: applied_key, name, label_key, do_print=True, do_summary=True, stopping_metric=False, report_path=None, auc_average='macro', class_index=None |
| Validation Metric | MetricAverage | Generic class for tracking averages of metrics. Expects that the applied_key is a scalar value that will be averaged. Args: applied_key, name, do_print=True, do_summary=True, stopping_metric=False, report_path=None |
| Model Loader | CheckpointLoader | Load a model in checkpoint format. Args: checkpoint_dir, input_node_names=None, output_node_names=None, checkpoint_file_prefix='model.ckpt' |
| Model Loader | FrozenGraphModelLoader | Load a model in frozen graph format. Args: model_file_path, input_node_names=None, output_node_names=None |
| Writer | NiftyWriter | Write inference result as NIFTY image. Args: applied_key, write_path, compressed=True, dtype="float32", use_identity=False |
| Writer | ClassificationResultWriter | Write classification results. Args: applied_key, write_path, overwrite=True |
| Learning Rate Policy | DecayLRonStep | Class for decaying the learning rate based on the number of steps. This policy decays the learning rate by the specified `decay_ratio` every `decay_freq` steps. Args: decay_ratio, decay_freq |
| Learning Rate Policy | ReduceLRonPlateau | Class for the reduce-learning-rate-on-plateau policy. Reduces the learning rate after a plateau has been reached a certain number of times. Args: plateau_count_trigger, reduction_rate |
| Learning Rate Policy | ReduceLRPoly | Class for reducing the learning rate based on the epoch progress: `lr = lr_init * (1 - e / total_epoch) ** poly_power`. Args: poly_power |
| Learning Rate Policy | ReduceLRCosine | Class for reducing the learning rate based on the epoch progress: `lr = lr_init * cos(0.5*pi* e / total_epoch)`. Args: poly_power |
| Inferer | SimpleInferer | Do inference by simply feeding the whole image to the network. |
| Inferer | ScanWindowInferer | Scan the image into slices of specified ROI size, and then do inference on the slices. Args: roi_size, is_channels_first=True, batch_size=1 |
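The learning rate policies listed above can be written out as plain functions. The two epoch-based formulas are taken from the table; the step-decay formula is an assumption inferred from the description of DecayLRonStep, and all function names are hypothetical rather than SDK APIs.

```python
import math

def decay_lr_on_step(lr_init, step, decay_ratio, decay_freq):
    # Assumed form: multiply by decay_ratio once every decay_freq steps.
    return lr_init * decay_ratio ** (step // decay_freq)

def reduce_lr_poly(lr_init, epoch, total_epoch, poly_power):
    # From the table: lr = lr_init * (1 - e / total_epoch) ** poly_power
    return lr_init * (1 - epoch / total_epoch) ** poly_power

def reduce_lr_cosine(lr_init, epoch, total_epoch):
    # From the table: lr = lr_init * cos(0.5 * pi * e / total_epoch)
    return lr_init * math.cos(0.5 * math.pi * epoch / total_epoch)

# Values borrowed from the segmentation example: lr 1e-4, 1250 epochs,
# decay_ratio 0.1 and decay_freq 50000.
print(decay_lr_on_step(1e-4, 100000, 0.1, 50000))
print(reduce_lr_poly(1e-4, 625, 1250, 2.0))
print(reduce_lr_cosine(1e-4, 1250, 1250))
```

Both epoch-based policies start at lr_init and approach zero as the epoch count nears total_epoch, while the step-based policy drops in discrete jumps.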

6.6. Training with multiple GPUs

How It Works

TLT’s multi-GPU training is based on Horovod (https://github.com/horovod/horovod). It works as follows:

‣ To train with N GPUs, N processes running exactly the same code are used. Each process is pinned to a GPU.

‣ The Optimizer is made into a Distributed Optimizer (by calling a Horovod function).

‣ Horovod synchronizes the gradients across all processes at each training step. For this to work, all processes must have an identical number of training steps.

Transfer Learning training uses two datasets: a training dataset for minimizing loss, and a validation dataset for validating the model to obtain the best model. In multi-GPU training, both datasets are sharded such that each process only takes a portion of the load.

1. Training dataset sharding. The training dataset is divided among the GPUs. This is the main reason for reduced total training time: the number of training steps for each process/GPU is only 1/N of the total, where N is the number of GPUs. Since Horovod synchronizes the training processes at each step, the sharding algorithm makes sure that all shards have the same size: if the dataset size is not divisible by N, it adds the first element in the dataset to the short shards. At the beginning of each epoch, the content of each shard is shuffled globally such that each process gets to see the whole picture of the training dataset over time.

2. Validation dataset sharding. The same algorithm is applied to the validation dataset, except that the shards do not need to be of equal size.

When computing validation metrics, results from individual processes are aggregated using MPI’s gather function.
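The training dataset sharding described above can be sketched as follows. This is one possible reading of the description, not the SDK's code: the function name `shard_training_data`, the use of a shared per-epoch seed for the global shuffle, and the strided split are all assumptions.

```python
import random

def shard_training_data(items, num_gpus, rank, epoch_seed):
    """Return the equal-sized shard of `items` for the process with `rank`."""
    data = list(items)
    # Pad with the first element until the size is divisible by num_gpus,
    # so that every process runs the same number of training steps.
    while len(data) % num_gpus != 0:
        data.append(data[0])
    # Every process uses the same seed, so all see the same global order.
    random.Random(epoch_seed).shuffle(data)
    # Strided split: process `rank` takes every num_gpus-th item.
    return data[rank::num_gpus]

dataset = list(range(10))  # 10 samples across 4 GPUs: padded to 12
shards = [shard_training_data(dataset, 4, rank, epoch_seed=0) for rank in range(4)]
print([len(s) for s in shards])
```

Changing epoch_seed each epoch gives every process a different slice of the dataset over time while keeping the shards equal in size.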

Training Parameters

It can be difficult to set up the training parameters properly with multi-GPU training.

‣ Batch Size - The value of batch size is constrained by the GPU memory. You have tochoose a batch size that is acceptable by all GPUs, if your GPUs don’t have the sameamount of available memory.

‣ Learning Rate - The value of learning rate is closely related to the number of GPUsand batch size. According to horovod, as the rule of thumb, you should scale upthe learning rate by the number of GPUs. For example, suppose your LR for singleGPU is 0.0001, you could start with a LR of 0.0002 when training with 2 GPUs. But itrequires some experimentation to obtain the best LR.

You can create your own train_ngpu.sh based on train_2gpu.sh. Make sure you adjust the learning rate accordingly.
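
The rule of thumb for the learning rate is simple arithmetic, sketched below in Python. The helper name `scaled_learning_rate` is hypothetical and not part of the SDK; the result is only a starting point for experimentation, as noted above.

```python
def scaled_learning_rate(base_lr, num_gpus):
    """Linear scaling: multiply the single-GPU learning rate by the GPU count."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return base_lr * num_gpus


# Starting points for a single-GPU LR of 0.0001.
for n in (1, 2, 4, 8):
    print(f"{n} GPU(s): starting LR = {scaled_learning_rate(0.0001, n):.4f}")
```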

6.7. Improving inference performance

If your model is trained with dynamic input shape (the Readme.md included in the model’s MMAR specifies whether the model is trained with dynamic input shape), you may be able to obtain significantly improved inference performance in both accuracy and speed.

TensorFlow 1.13 supports dynamic network input shapes. This allows the model’s computation graph to be built with placeholders of dynamic shape [None, None, None], which can accept input data of any size. Our proof of concept (POC) shows that inference performance varies greatly for different input sizes. In general, inference tends to perform better with bigger network input sizes.

This is only true within certain size ranges. When the size goes beyond the range, the overall speed drops considerably, even though the total number of scanning windows is smaller. As of now, it is not clear how to accurately determine the upper bound. Only the model SegmAhnet3D has been modified to support dynamic input shape.

6.7.1. Training with dynamic shape

Transfer Learning has been modified to take advantage of training with dynamic shape. To use dynamic network input, you must modify config_train.json of your model, as shown here:

ImagePipeline

Set the crop_size of the two ImagePipeline components to [-1, -1, -1]. This sets the placeholder with input shape [None, None, None]. Here's an example:

    "image_pipeline": {
      "name": "ImagePipeline",
      "args": {
        "task": "segmentation",
        "data_list_file_path": "{DATASET_JSON}",
        "data_file_base_dir": "{DATA_ROOT}",
        "data_list_key": "training",
        "crop_size": [-1, -1, -1],
        "data_format": "channels_first",
        "batch_size": 0,
        "num_channels": 1,
        "num_workers": 4,
        "prefetch_size": 0
      }
    }

ScanWindowInferer

ScanWindowInferer can run inference on large images that cannot be fed to the model directly because of their size. It works as follows:

The ScanWindowInferer is first configured with a roi_size (roi = region of interest). Based on the roi_size, it “scans” the image into a set of overlapping patches called slices. It then computes a prediction for each slice. Finally, it computes the overall prediction by aggregating the results from the slice predictions.
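
The scanning step can be sketched in plain Python as follows. This is only an illustration of the idea, not the ScanWindowInferer implementation: the 25% overlap, the placement of the last window flush with the image edge, and the names `axis_starts` and `scan_windows` are all assumptions. The prediction and aggregation steps are omitted.

```python
from itertools import product


def axis_starts(length, roi, overlap=0.25):
    """Start offsets of windows of size `roi` that cover one axis of size `length`."""
    if roi >= length:
        return [0]
    step = max(1, int(roi * (1 - overlap)))
    starts = list(range(0, length - roi, step))
    starts.append(length - roi)  # last window ends exactly at the image edge
    return starts


def scan_windows(image_shape, roi_size, overlap=0.25):
    """Return one tuple of slices per overlapping window ("slice" in the text)."""
    per_axis = [axis_starts(n, r, overlap) for n, r in zip(image_shape, roi_size)]
    return [
        tuple(slice(s, min(s + r, n)) for s, r, n in zip(starts, roi_size, image_shape))
        for starts in product(*per_axis)
    ]


image_shape = (512, 512, 512)
for roi in (160, 224):
    windows = scan_windows(image_shape, (roi, roi, roi))
    print(f"roi_size {roi}: {len(windows)} windows")
```

Under these assumptions, a larger roi_size yields fewer windows for the same image, which is one reason inference speed varies with roi_size.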

If your model uses scanning window based inference during validation, you must now explicitly set its roi_size (see example below). Set it to the size that makes the most sense for your model: one that produces good accuracy without going over the upper bound. In general, this size should be no less than the size of the training crops. For SegmAhnet3D based models, the size must be divisible by 32. Here's an example:

    "Inferer": {
      "name": "ScanWindowInferer",
      "args": {
        "is_channels_first": true,
        "roi_size": [160, 160, 160]
      }
    }

Do NOT change the crop size of any transforms for training. They determine the actual input size of the crops fed into the network for training.
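
The configuration changes described in this section could be scripted along the following lines. This is a hedged sketch, not an SDK tool: the function `apply_dynamic_shape` is hypothetical, and the top-level keys "train" and "validate" in the sample config are assumed for illustration only. The code walks a parsed config, sets crop_size to [-1, -1, -1] on every ImagePipeline entry, and reports any ScanWindowInferer roi_size that is not divisible by a given multiple (32 for SegmAhnet3D based models) or is smaller than the training crop size.

```python
import json


def apply_dynamic_shape(node, train_crop_size, multiple=32):
    """Update ImagePipeline entries in place; return a list of roi_size problems."""
    problems = []
    if isinstance(node, dict):
        args = node.get("args")
        if isinstance(args, dict):
            if node.get("name") == "ImagePipeline":
                args["crop_size"] = [-1, -1, -1]
            elif node.get("name") == "ScanWindowInferer":
                roi = args.get("roi_size")
                if roi is None:
                    problems.append("roi_size must be set explicitly")
                else:
                    for axis, (r, c) in enumerate(zip(roi, train_crop_size)):
                        if r % multiple:
                            problems.append(f"axis {axis}: {r} is not divisible by {multiple}")
                        if r < c:
                            problems.append(f"axis {axis}: {r} is smaller than training crop {c}")
        for value in node.values():
            problems += apply_dynamic_shape(value, train_crop_size, multiple)
    elif isinstance(node, list):
        for item in node:
            problems += apply_dynamic_shape(item, train_crop_size, multiple)
    return problems


# Hypothetical, heavily shortened config; real files contain many more fields.
config = json.loads("""
{
  "train": {"image_pipeline": {"name": "ImagePipeline", "args": {"crop_size": [96, 96, 96]}}},
  "validate": {
    "image_pipeline": {"name": "ImagePipeline", "args": {"crop_size": [96, 96, 96]}},
    "Inferer": {"name": "ScanWindowInferer", "args": {"roi_size": [160, 160, 150]}}
  }
}
""")
for problem in apply_dynamic_shape(config, train_crop_size=[96, 96, 96]):
    print(problem)
print(json.dumps(config["train"], indent=2))
```

Note that `train_crop_size` here refers to the crop size used by the training transforms, which must stay unchanged.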

6.7.2. Inference models trained with dynamic shape

To run validation or inference with a model trained with dynamic input shape, you must also modify the ScanWindowInferer configuration in config_validation.json by explicitly specifying its roi_size. You can be a little more generous here since you have more GPU memory to work with during validation and inference. To obtain optimal performance (higher accuracy with faster speed), you should experiment with different ROI sizes. Here's an example:

    "Inferer": {
      "name": "ScanWindowInferer",
      "args": {
        "is_channels_first": true,
        "roi_size": [224, 224, 224]
      }
    }

The ScanWindowInferer offers another technique for improving inference speed: batch_size. The basic algorithm computes the prediction for each slice one by one. This might not fully utilize the GPU’s computing power. When you specify a batch_size > 1, the predictions of multiple slices are computed in one shot, potentially increasing the overall speed:

    "Inferer": {
      "name": "ScanWindowInferer",
      "args": {
        "is_channels_first": true,
        "roi_size": [224, 224, 224],
        "batch_size": 2
      }
    }

It is not always true that the bigger the size, the faster the overall inference. It takes some experimentation to determine the best roi_size.

roi_size can change both inference accuracy and speed, whereas batch_size only changes inference speed (i.e. inference should produce exactly the same accuracy for the same roi_size, regardless of batch_size).
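
The effect of batch_size can be illustrated with a small Python sketch. The function `predict_in_batches` and the stand-in model `fake_predict_batch` are hypothetical and not part of the SDK. The sketch only shows why grouping slices into batches changes how many times the model is invoked without changing the per-slice results.

```python
def predict_in_batches(slices, predict_batch, batch_size=1):
    """Run `predict_batch` on groups of `batch_size` slices, keeping slice order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    results = []
    for i in range(0, len(slices), batch_size):
        results.extend(predict_batch(slices[i:i + batch_size]))
    return results


calls = 0


def fake_predict_batch(batch):
    """Stand-in for a model call: one 'prediction' (here, a sum) per slice."""
    global calls
    calls += 1
    return [sum(s) for s in batch]


slices = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
one_by_one = predict_in_batches(slices, fake_predict_batch, batch_size=1)
print("model calls with batch_size=1:", calls)

calls = 0
batched = predict_in_batches(slices, fake_predict_batch, batch_size=2)
print("model calls with batch_size=2:", calls)
print("same results:", one_by_one == batched)
```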

6.7.3. Objectivity of trained models

The accuracy of the model is determined by the validation performed during training. With fixed network input shape, both training and validation (which runs inference against the graph) use the same network input shape. The accuracy of the model is therefore also fixed. However, with dynamic input shapes, training and validation no longer have to use the same input size. For example, you can use [96, 96, 96] as the crop size for training and [160, 160, 160] as the ROI size of the ScanWindowInferer for validation. Using different ROI sizes for validation could produce different accuracy values for the trained model.

So the questions are: how important is the accuracy value produced by the training process, and does the quality of the trained model depend on the ROI size of the ScanWindowInferer used by validation?

To find answers to these questions, we ran multiple rounds of training with different ROI sizes with deterministic training enabled. All these runs produced the best model at exactly the same epoch with different “best mean dice” values.

Based on the results of these experiments, we conclude:

‣ It appears that the quality of the trained model does not depend on the ROI size for validation, even though the accuracy values do vary for different ROI sizes. This means that the trained model is objective.

‣ The accuracy value determined by the training process is still important, but only in a relative sense. You can probably still compare two models and judge which is better, but you should do so with the same ROI size for the validation.

Notice

THE INFORMATION IN THIS GUIDE AND ALL OTHER INFORMATION CONTAINED IN NVIDIA DOCUMENTATION

REFERENCED IN THIS GUIDE IS PROVIDED “AS IS.” NVIDIA MAKES NO WARRANTIES, EXPRESSED, IMPLIED,

STATUTORY, OR OTHERWISE WITH RESPECT TO THE INFORMATION FOR THE PRODUCT, AND EXPRESSLY

DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A

PARTICULAR PURPOSE. Notwithstanding any damages that customer might incur for any reason whatsoever,

NVIDIA’s aggregate and cumulative liability towards customer for the product described in this guide shall

be limited in accordance with the NVIDIA terms and conditions of sale for the product.

THE NVIDIA PRODUCT DESCRIBED IN THIS GUIDE IS NOT FAULT TOLERANT AND IS NOT DESIGNED,

MANUFACTURED OR INTENDED FOR USE IN CONNECTION WITH THE DESIGN, CONSTRUCTION, MAINTENANCE,

AND/OR OPERATION OF ANY SYSTEM WHERE THE USE OR A FAILURE OF SUCH SYSTEM COULD RESULT IN A

SITUATION THAT THREATENS THE SAFETY OF HUMAN LIFE OR SEVERE PHYSICAL HARM OR PROPERTY DAMAGE

(INCLUDING, FOR EXAMPLE, USE IN CONNECTION WITH ANY NUCLEAR, AVIONICS, LIFE SUPPORT OR OTHER

LIFE CRITICAL APPLICATION). NVIDIA EXPRESSLY DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY OF FITNESS

FOR SUCH HIGH RISK USES. NVIDIA SHALL NOT BE LIABLE TO CUSTOMER OR ANY THIRD PARTY, IN WHOLE OR

IN PART, FOR ANY CLAIMS OR DAMAGES ARISING FROM SUCH HIGH RISK USES.

NVIDIA makes no representation or warranty that the product described in this guide will be suitable for

any specified use without further testing or modification. Testing of all parameters of each product is not

necessarily performed by NVIDIA. It is customer’s sole responsibility to ensure the product is suitable and

fit for the application planned by customer and to do the necessary testing for the application in order

to avoid a default of the application or the product. Weaknesses in customer’s product designs may affect

the quality and reliability of the NVIDIA product and may result in additional or different conditions and/

or requirements beyond those contained in this guide. NVIDIA does not accept any liability related to any

default, damage, costs or problem which may be based on or attributable to: (i) the use of the NVIDIA

product in any manner that is contrary to this guide, or (ii) customer product designs.

Other than the right for customer to use the information in this guide with the product, no other license,

either expressed or implied, is hereby granted by NVIDIA under this guide. Reproduction of information

in this guide is permissible only if reproduction is approved by NVIDIA in writing, is reproduced without

alteration, and is accompanied by all associated conditions, limitations, and notices.

Trademarks

NVIDIA, the NVIDIA logo, and cuBLAS, CUDA, cuDNN, cuFFT, cuSPARSE, DIGITS, DGX, DGX-1, DGX Station,

GRID, Jetson, Kepler, NVIDIA GPU Cloud, Maxwell, NCCL, NVLink, Pascal, Tegra, TensorRT, Tesla and Volta are

trademarks and/or registered trademarks of NVIDIA Corporation in the United States and other countries.

Other company and product names may be trademarks of the respective companies with which they are

associated.

Copyright

© 2019 NVIDIA Corporation. All rights reserved.

www.nvidia.com