dask and v100s for fast, distributed batch scoring of ... · •installed plugin for storage. demo...

24
Dask and V100s for Fast, Distributed Batch Scoring of Computer Vision Workloads Mathew Salvaris

Upload: others

Post on 27-Jul-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Dask and V100s for Fast, Distributed Batch Scoring of ... · •Installed plugin for storage. Demo Kubernetes. Summary Able to easily prototype two batch scoring scenarios locally

Dask and V100s for Fast, Distributed Batch Scoring of Computer Vision Workloads

Mathew Salvaris

Page 2: Dask and V100s for Fast, Distributed Batch Scoring of ... · •Installed plugin for storage. Demo Kubernetes. Summary Able to easily prototype two batch scoring scenarios locally

• What is Dask?

• Batch scoring use cases• Style transfer• Mask- RCNN for object detection

and segmentation

Page 3: Dask and V100s for Fast, Distributed Batch Scoring of ... · •Installed plugin for storage. Demo Kubernetes. Summary Able to easily prototype two batch scoring scenarios locally

http://docs.dask.org/en/latest/why.html

Page 4: Dask and V100s for Fast, Distributed Batch Scoring of ... · •Installed plugin for storage. Demo Kubernetes. Summary Able to easily prototype two batch scoring scenarios locally

Dask for batch scoring

• Same code runs on single node locally or in a cluster

• No orchestration code to write, Dask handles orchestration

• Can create and execute complex DAGs that can be generated on the fly

Page 5: Dask and V100s for Fast, Distributed Batch Scoring of ... · •Installed plugin for storage. Demo Kubernetes. Summary Able to easily prototype two batch scoring scenarios locally

Batch scoring use cases

• Style Transfer

• Object detection and segmentation

ReturnModel Write

Page 6: Dask and V100s for Fast, Distributed Batch Scoring of ... · •Installed plugin for storage. Demo Kubernetes. Summary Able to easily prototype two batch scoring scenarios locally

Style transfer process

Read

Ap

ply

ReturnModel Write

Page 7: Dask and V100s for Fast, Distributed Batch Scoring of ... · •Installed plugin for storage. Demo Kubernetes. Summary Able to easily prototype two batch scoring scenarios locally

Dask Graph of Style Transfer Process

Load Image

Load Image

Load Image

Load Image

Stack

Style Transfer

Write

N

Page 8: Dask and V100s for Fast, Distributed Batch Scoring of ... · •Installed plugin for storage. Demo Kubernetes. Summary Able to easily prototype two batch scoring scenarios locally

@currydef process_batch(client, style_model, output_path, batch_filenames):

remote_batch_f = client.scatter(batch_filenames)img_array_f = client.map(load_image, remote_batch_f)stacked_array_f = client.submit(stack, img_array_f)styled_array_f = client.submit(stylize_batch, style_model, stacked_array_f)return client.submit(write, batch, styled_array_f, output_path)

Page 9: Dask and V100s for Fast, Distributed Batch Scoring of ... · •Installed plugin for storage. Demo Kubernetes. Summary Able to easily prototype two batch scoring scenarios locally

Testing Dask locally on GPU

• Need GPU based libraries for workers and CPU based ones client

• Would limit the visibility of CUDA devices to each worker

• Spawn as many workers as there are GPUs

CUDA_VISIBLE_DEVICES=0 dask-worker 127.0.0.1 --nprocs 1 --nthreads 1 --resources 'GPU=1

Page 10: Dask and V100s for Fast, Distributed Batch Scoring of ... · •Installed plugin for storage. Demo Kubernetes. Summary Able to easily prototype two batch scoring scenarios locally
Page 11: Dask and V100s for Fast, Distributed Batch Scoring of ... · •Installed plugin for storage. Demo Kubernetes. Summary Able to easily prototype two batch scoring scenarios locally

What is Azure ML PipelinesCreate experimentation graph that gets you from data to a model

The pipeline can be exported as a parameterised end point

Page 12: Dask and V100s for Fast, Distributed Batch Scoring of ... · •Installed plugin for storage. Demo Kubernetes. Summary Able to easily prototype two batch scoring scenarios locally

Dask on Azure pipelines

• Need to set off workers, scheduler and run client

• Azure ML pipelines has MPIStep which allows us to trigger MPI job

• Run workers on all ranks - Run client and scheduler on rank 0

Page 13: Dask and V100s for Fast, Distributed Batch Scoring of ... · •Installed plugin for storage. Demo Kubernetes. Summary Able to easily prototype two batch scoring scenarios locally

GPU: V100# Images: 823

180.4

96.7

48.9

25.415.5

0

20

40

60

80

100

120

140

160

180

200

1 Process 2 Processes 4 Processes 8 Processes 12 Processes

Time Taken (S)

4-5 images/second

53 images/second

Page 14: Dask and V100s for Fast, Distributed Batch Scoring of ... · •Installed plugin for storage. Demo Kubernetes. Summary Able to easily prototype two batch scoring scenarios locally

180.4

96.7

48.9

25.415.5 12.3

0

20

40

60

80

100

120

140

160

180

200

1 Process 2 Processes 4 Processes 8 Processes 12 Processes

Blob Premium Blob

GPU: V100# Images: 823

Page 15: Dask and V100s for Fast, Distributed Batch Scoring of ... · •Installed plugin for storage. Demo Kubernetes. Summary Able to easily prototype two batch scoring scenarios locally

180.4

96.7

48.9

25.415.5

106.8

56.5

33.7

16.812.3

0

20

40

60

80

100

120

140

160

180

200

1 Process 2 Processes 4 Processes 8 Processes 12 Processes

Blob Premium Blob

69 images/second

8 images/second

GPU: V100# Images: 823

Page 16: Dask and V100s for Fast, Distributed Batch Scoring of ... · •Installed plugin for storage. Demo Kubernetes. Summary Able to easily prototype two batch scoring scenarios locally

Azure ML Pipelines Notebook

Page 17: Dask and V100s for Fast, Distributed Batch Scoring of ... · •Installed plugin for storage. Demo Kubernetes. Summary Able to easily prototype two batch scoring scenarios locally

Mask-RCNN process

ReturnCombine Write

Read ModelBBOXMask

BBOXMask

Page 18: Dask and V100s for Fast, Distributed Batch Scoring of ... · •Installed plugin for storage. Demo Kubernetes. Summary Able to easily prototype two batch scoring scenarios locally

Dask Graph of Mask-RCNN

Load Image

Load Image

Load Image

Load Image

Preprocessing

Score

Merge image and annotations

N

Write

Page 19: Dask and V100s for Fast, Distributed Batch Scoring of ... · •Installed plugin for storage. Demo Kubernetes. Summary Able to easily prototype two batch scoring scenarios locally

@currydef process_batch(client, style_model, preprocessing, output_path, batch):

remote_batch_f = client.scatter(batch)img_array_f = client.map(load_image, remote_batch_f)pre_img_array_f = client.map(preprocessing, img_array_f)bbox_list_f = client.submit(score_batch, style_model, pre_img_array_f)results_f = client.submit(loop_annotations, img_array_f, bbox_list_f)return client.submit(loop_write(output_path), batch, results_f)

Page 20: Dask and V100s for Fast, Distributed Batch Scoring of ... · •Installed plugin for storage. Demo Kubernetes. Summary Able to easily prototype two batch scoring scenarios locally

Dask on Kubernetes

• Use Helm chart to deploy, workers, scheduler and Jupyter lab

• Provision 3 VMs with 4 V100s each

• Installed Nvidia device plugin

• Installed plugin for storage

Page 21: Dask and V100s for Fast, Distributed Batch Scoring of ... · •Installed plugin for storage. Demo Kubernetes. Summary Able to easily prototype two batch scoring scenarios locally

Demo Kubernetes

Page 22: Dask and V100s for Fast, Distributed Batch Scoring of ... · •Installed plugin for storage. Demo Kubernetes. Summary Able to easily prototype two batch scoring scenarios locally

Summary

Able to easily prototype two batch scoring scenarios locally then deploy on Kubernetes as well as Azure pipelines

Using GPUs through Dask could be more straight forward, better interaction between Dask and DL libraries

Page 23: Dask and V100s for Fast, Distributed Batch Scoring of ... · •Installed plugin for storage. Demo Kubernetes. Summary Able to easily prototype two batch scoring scenarios locally

AcknowledgementsJS Tan

Azure Machine Learning Pipelines Team

Page 24: Dask and V100s for Fast, Distributed Batch Scoring of ... · •Installed plugin for storage. Demo Kubernetes. Summary Able to easily prototype two batch scoring scenarios locally

Thanks &

Questions?