
Page 1

© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon/Intel Confidential

AWS AI

Deep Learning Compiler

Page 2

Acknowledgement

Page 3

Amazon SageMaker Neo

Enables developers to train machine learning models once and run them anywhere in the cloud and at the edge

Product targets
• Amazon Rekognition
• AWS DeepLens
• Amazon Lex
• …
• And a lot of internal/external products

Hardware targets
• Intel CPU, Intel graphics
• ARM CPU, ARM GPU
• Nvidia GPU
• FPGA
• ASIC
• …

Page 4

Page 5

CONV Kernel tuning

Page 6

Intel Xeon Platinum 8000-series CPUs (Skylake)

• Multi-core
  • E.g., EC2 c5.9xlarge: 1 processor with 18 cores
• AVX-512 support
  • 512-bit-wide registers (ZMM)
  • E.g., vfmadd231ps -1664(%rax,%r13){1to16}, %zmm0, %zmm1
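To make the example instruction concrete, the NumPy sketch below models what `vfmadd231ps <mem>{1to16}, %zmm0, %zmm1` computes: one float32 is read from memory, broadcast to all 16 lanes, multiplied by `%zmm0`, and accumulated into `%zmm1`. The function name and the `mem`/`index` stand-ins for the address operand are hypothetical; this is an arithmetic model, not an intrinsic.

```python
import numpy as np

def vfmadd231ps_bcast(mem, index, zmm0, zmm1):
    """Model of vfmadd231ps <mem>{1to16}, %zmm0, %zmm1 (AT&T operand order).

    mem[index] stands in for the memory operand (hypothetical). NumPy rounds
    after the multiply and again after the add, while a real FMA rounds once,
    so low-order bits can differ.
    """
    assert zmm0.shape == zmm1.shape == (16,)  # 512 bits / 32-bit floats
    scalar = np.float32(mem[index])           # {1to16}: broadcast one float32
    return (zmm0 * scalar + zmm1).astype(np.float32)  # zmm1 = zmm0 * bcast + zmm1

zmm0 = np.arange(16, dtype=np.float32)
zmm1 = np.ones(16, dtype=np.float32)
mem = np.array([2.0], dtype=np.float32)
zmm1 = vfmadd231ps_bcast(mem, 0, zmm0, zmm1)  # lane i becomes 2 * i + 1
```

Because the broadcast operand comes straight from memory, the input value does not occupy a ZMM register of its own.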

Page 7

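CONV optimization: data layout is important!

```python
import tvm  # TVM 0.5-era API; newer releases move these into tvm.te

# data, kernel, oshape, stride, IC, KH, KW are defined elsewhere
ic = tvm.reduce_axis((0, IC), name="ic")
kh = tvm.reduce_axis((0, KH), name="kh")
kw = tvm.reduce_axis((0, KW), name="kw")
conv = tvm.compute(
    oshape,
    lambda n, oc, oh, ow: tvm.sum(
        data[n, ic, oh * stride + kh, ow * stride + kw] * kernel[oc, ic, kh, kw],
        axis=[ic, kh, kw]))
```

Corresponding loop nest:

```
for (n, 0, N):
  for (oc, 0, OC):
    for (oh, 0, OH):
      for (ow, 0, OW):
        Out[n, oc, oh, ow] = 0  // init Out
        for (ic, 0, IC):
          for (kh, 0, KH):
            for (kw, 0, KW):
              // Out += In * Kernel
              Out[n, oc, oh, ow] += In[n, ic, oh*stride+kh, ow*stride+kw] * Kernel[oc, ic, kh, kw]
```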
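A direct NumPy translation of the loop nest gives a slow but simple reference that optimized schedules can be checked against. This is a sketch assuming no padding, NCHW data and OIHW kernels; `conv2d_nchw` is a hypothetical helper, not part of TVM.

```python
import numpy as np

def conv2d_nchw(data, kernel, stride=1):
    """Direct convolution mirroring the loop nest above (no padding).

    data: (N, IC, H, W) in NCHW; kernel: (OC, IC, KH, KW) in OIHW.
    """
    N, IC, H, W = data.shape
    OC, IC_k, KH, KW = kernel.shape
    assert IC == IC_k, "input channels must match"
    OH = (H - KH) // stride + 1
    OW = (W - KW) // stride + 1
    out = np.zeros((N, OC, OH, OW), dtype=data.dtype)  # init Out
    for n in range(N):
        for oc in range(OC):
            for oh in range(OH):
                for ow in range(OW):
                    for ic in range(IC):
                        for kh in range(KH):
                            for kw in range(KW):
                                # Out += In * Kernel
                                out[n, oc, oh, ow] += (
                                    data[n, ic, oh * stride + kh, ow * stride + kw]
                                    * kernel[oc, ic, kh, kw])
    return out

out = conv2d_nchw(np.ones((1, 2, 4, 4), np.float32), np.ones((3, 2, 3, 3), np.float32))
```

Optimized schedules reorder, tile and vectorize these same loops, so their output should match this reference up to floating-point rounding.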

Layout transforms:
• Data: NCHW -> NHWC, or NCHW -> NCHW[x]c
• Kernel: OIHW -> OIHW[x]i[y]o

[Figure: convolution dimensions, labeled in_channel, in_height, in_width, kernel_height, kernel_width, out_channel (# of kernels), out_height, out_width]
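The NCHW -> NCHW[x]c and OIHW -> OIHW[x]i[y]o transforms are pure reshapes plus transposes. The sketch below assumes the channel counts divide evenly by the block sizes; the helper names are hypothetical.

```python
import numpy as np

def nchw_to_nchwxc(x, c_block):
    """NCHW -> NCHW[x]c: split C into C // c_block outer and c_block inner."""
    N, C, H, W = x.shape
    assert C % c_block == 0
    return x.reshape(N, C // c_block, c_block, H, W).transpose(0, 1, 3, 4, 2)

def oihw_to_oihwxiyo(w, i_block, o_block):
    """OIHW -> OIHW[x]i[y]o: blocked I and O become the two innermost axes."""
    O, I, KH, KW = w.shape
    assert O % o_block == 0 and I % i_block == 0
    blocked = w.reshape(O // o_block, o_block, I // i_block, i_block, KH, KW)
    # (O_outer, o_inner, I_outer, i_inner, KH, KW) -> (O_outer, I_outer, KH, KW, i_inner, o_inner)
    return blocked.transpose(0, 2, 4, 5, 3, 1)

x = np.arange(1 * 32 * 2 * 2).reshape(1, 32, 2, 2)
x16 = nchw_to_nchwxc(x, 16)  # NCHW16c
```

With c_block = 16, the innermost axis holds 16 float32 channels, matching the 16 lanes of a ZMM register.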

Page 8

CONV optimization: utilize the AVX-512 ISA well

```
(broadcast) load input from DRAM
load kernels into ZMM            // up to 16 float32
vfmadd input, kernel, output
store output back to DRAM
```

[Figure: vectorized FMA over the convolution dimensions; labels: inputs, kernels, ZMM_0, ZMM_1 to ZMM_{ow_inner}, ow_inner, DRAM, outputs]

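With ow_inner = 31, the kernel register (ZMM_0) plus 31 output registers (ZMM_1 to ZMM_31) use all 32 ZMM registers:

```
load 31 inputs from DRAM
load kernels into ZMM
vfmadd input_1,  kernel, output_1
vfmadd input_2,  kernel, output_2
…
vfmadd input_31, kernel, output_31
store output_{1…31} back to DRAM
```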
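The register-blocked sequence can be modeled in NumPy for a 1x1 convolution: one 16-wide kernel vector plays the role of ZMM_0 and the 31 rows of `acc` play the role of the output registers. The 1x1 simplification, the `microkernel` name, and the array shapes are assumptions for illustration; a real kernel would also loop over kh and kw.

```python
import numpy as np

OC_BLOCK = 16  # float32 lanes in one ZMM register
OW_INNER = 31  # output registers ZMM_1..ZMM_31; ZMM_0 holds the kernel

def microkernel(inp, ker):
    """1x1-convolution tile: inp is (IC, OW_INNER), ker is (IC, OC_BLOCK).

    Returns (OW_INNER, OC_BLOCK) outputs. Each acc[ow] models one output
    register, k models ZMM_0, and inp[ic, ow] is the broadcast input scalar.
    """
    acc = np.zeros((OW_INNER, OC_BLOCK), dtype=np.float32)
    for ic in range(inp.shape[0]):
        k = ker[ic]                      # load 16 kernel weights once
        for ow in range(OW_INNER):       # reuse them for 31 outputs
            acc[ow] += inp[ic, ow] * k   # broadcast input, vector FMA
    return acc

rng = np.random.default_rng(0)
inp = rng.standard_normal((8, OW_INNER)).astype(np.float32)
ker = rng.standard_normal((8, OC_BLOCK)).astype(np.float32)
tile = microkernel(inp, ker)
```

Each kernel vector is loaded once and reused OW_INNER times, which is the point of keeping all outputs in registers.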

Page 9

Intel Graphics on Amazon DeepLens

Hardware configs: Intel HD Graphics 500 (Intel’s Gen 9)
• On-die integrated GPU
• 12 EUs, 0.55 GHz
• 7 physical threads per EU, 2 128-bit FPUs per EU
• 105.6 GFLOPS peak performance
• Work items in the same SIMD group form a subgroup sharing a 4 KB GRF (general register file)
• Intel OpenCL extension: cl_intel_subgroups
• Shares main memory with the CPU
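The 105.6 GFLOPS peak follows from the listed configuration, assuming each 128-bit FPU completes one 4-lane float32 FMA (counted as 2 FLOPs) per cycle:

```latex
\[
P_{\text{peak}}
= \underbrace{12}_{\text{EUs}}
\times \underbrace{2}_{\text{FPUs/EU}}
\times \underbrace{4}_{\text{fp32 lanes/FPU}}
\times \underbrace{2}_{\text{FLOPs/FMA}}
\times 0.55\,\text{GHz}
= 192 \times 0.55\,\text{GHz}
= 105.6\ \text{GFLOPS}
\]
```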

Page 10

Instruction examples and corresponding TVM schedule primitives:
• intel_sub_group_block_read/write ⇒ cache_read/write(buffer, "warp", [result])
• intel_sub_group_shuffle ⇒ storage_align(axis, 16) and bind it to threads

Convolution:
• Each work item processes a block of the workload to exploit local memory
• Layout transform for coalesced memory accesses
• Use cl_intel_subgroups operations

Page 11

Graph-level optimization

Page 12

Graph-level layout optimization

[Figure: AlterOpLayout rewrites the graph into an optimized layout.
Before: Data (NCHW) → CONV (kernel: OIHW) → BATCH_NORM (mean/variance: C) → RELU → POOLING → CONV (kernel: OIHW) → FLATTEN; activations are NCHW and the FLATTEN output is Undef.
After: Data → LayoutTransform (NCHW → NCHW16c) → CONV_NCHW16c (kernel: LayoutTransform OIHW → OIHW16i16o) → BATCH_NORM (mean/variance: LayoutTransform C → C16c) → RELU → POOLING → CONV_NCHW16c (kernel: LayoutTransform OIHW → OIHW16i16o) → LayoutTransform (NCHW16c → NCHW) → FLATTEN; activations between the two transforms are NCHW16c.]

LayoutTransform for parameters can be pre-computed at compile time.
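AlterOpLayout only needs LayoutTransform nodes where a producer's layout differs from what its consumer requires. The sketch below shows that idea on a linear chain, treating BATCH_NORM, RELU and POOLING as layout-agnostic. The `Op` class and `insert_layout_transforms` are hypothetical, not TVM's pass, and they ignore parameter inputs such as kernels and mean/variance, which can be transformed once at compile time.

```python
from dataclasses import dataclass

@dataclass
class Op:
    name: str
    layout: str  # required input layout; "any" = follows whatever arrives

def insert_layout_transforms(ops, input_layout):
    """Return op names with LayoutTransform nodes inserted only where the
    incoming layout differs from the layout the next op requires."""
    graph, current = [], input_layout
    for op in ops:
        if op.layout != "any" and op.layout != current:
            graph.append(f"LayoutTransform({current}->{op.layout})")
            current = op.layout
        graph.append(op.name)
    return graph

ops = [Op("CONV_NCHW16c", "NCHW16c"), Op("BATCH_NORM", "any"), Op("RELU", "any"),
       Op("POOLING", "any"), Op("CONV_NCHW16c", "NCHW16c"), Op("FLATTEN", "NCHW")]
graph = insert_layout_transforms(ops, "NCHW")
```

Only two activation transforms remain, one entering and one leaving the blocked region, as in the optimized graph on the slide.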

Page 13

Graph/tensor co-optimization

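[Figure: a graph of CONV_i, CONV_j, CONV_k, CONV_l, CONV and ELEWISE_ADD nodes with LayoutTransform nodes between them; each CONV has candidate CONV schemes 1, 2, 3, …, N-2, N-1, N, and whether a LayoutTransform is needed (Yes/No) depends on the schemes chosen]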

CONV computation time varies with the CONV scheme.

Layout transform time varies with the CONV schemes of the connected nodes.

Solution: dynamic programming plus necessary heuristics.
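For a chain of CONV layers, choosing schemes reduces to a shortest-path style dynamic program: picking scheme s for layer i costs its compute time plus the cheapest way to reach it from any scheme of layer i-1, including the layout-transform time between them. The sketch assumes a linear chain (graphs with branches such as ELEWISE_ADD need the heuristics mentioned above); `choose_schemes` and the cost inputs are hypothetical.

```python
def choose_schemes(compute_cost, transform_cost):
    """Pick one scheme per CONV in a chain, minimizing total time.

    compute_cost[i][s]: time of CONV i under scheme s.
    transform_cost(i, p, s): layout-transform time between CONV i-1 using
        scheme p and CONV i using scheme s (0 if no LayoutTransform is needed).
    Returns (total_time, list_of_scheme_indices).
    """
    best = list(compute_cost[0])  # best[s]: min time up to CONV i using s
    back = []                     # back[i-1][s]: best scheme for CONV i-1
    for i in range(1, len(compute_cost)):
        new_best, choice = [], []
        for s, c in enumerate(compute_cost[i]):
            cand = [best[p] + transform_cost(i, p, s) for p in range(len(best))]
            p = min(range(len(cand)), key=cand.__getitem__)
            new_best.append(cand[p] + c)
            choice.append(p)
        best, back = new_best, back + [choice]
    s = min(range(len(best)), key=best.__getitem__)
    total, path = best[s], [s]
    for choice in reversed(back):  # follow back-pointers to CONV 0
        s = choice[s]
        path.append(s)
    return total, path[::-1]

# Two hypothetical schemes per CONV; switching layouts costs 5 time units.
cost = [[10, 8], [10, 7], [6, 9]]
total, schemes = choose_schemes(cost, lambda i, p, s: 0 if p == s else 5)
```

Here scheme 1 is kept for every layer even though the last CONV alone prefers scheme 0, because switching would add a 5-unit transform (26 vs. 24).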

Page 14

End-to-end results (batch size = 1)

[Figure: two bar charts.
Intel CPU: MXNet, OpenVINO and TVM on ResNet-18/34/50/101/152, VGG-11/13/16/19, DenseNet-121/161/169/201, Inception-v3 and MobileNetSSD.
Intel Graphics: OpenVINO and TVM on ResNet-18/34/50/101/152 and SSD.]

Page 15

Other functionalities

Page 16

Runtime multi-threading

Use a customized thread pool for CPU targets:
• Lock-free queue using C++ atomics
• Thread binding to physical cores
• Cache-line padding

Page 17

[Figure: result panels for ResNet-152, VGG-19, DenseNet-121 and Inception-v3]

Page 18

Graph Annotation

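[Figure: heterogeneous compilation pipeline: Annotation → Copy node insertion → Optimization/Compilation → Runtime. Legend: GPU node, CPU node, data copy node. A four-node graph annotated GPU, CPU, GPU, GPU gets data copy nodes at device boundaries, each node is compiled into a TVM op, and the outputs are labeled CPU lib, GPU lib and graph lib file.]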
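Copy-node insertion can be sketched as a pass over the edges of the annotated graph: every edge whose endpoints sit on different devices gets a data copy node. The function name, the `copy_<src>_to_<dst>` naming, and placing the copy on the destination device are assumptions for illustration.

```python
def insert_copy_nodes(edges, device):
    """edges: list of (src, dst) node names; device: node name -> "cpu"/"gpu".

    Returns (new_edges, new_device) with a data copy node spliced into every
    edge whose endpoints are annotated with different devices.
    """
    new_edges, new_device = [], dict(device)
    for src, dst in edges:
        if device[src] == device[dst]:
            new_edges.append((src, dst))
            continue
        copy = f"copy_{src}_to_{dst}"   # hypothetical naming scheme
        new_device[copy] = device[dst]  # assumption: copy runs on the target device
        new_edges += [(src, copy), (copy, dst)]
    return new_edges, new_device

# Four ops annotated GPU, CPU, GPU, GPU, matching the slide's example.
device = {"op1": "gpu", "op2": "cpu", "op3": "gpu", "op4": "gpu"}
edges, placed = insert_copy_nodes([("op1", "op2"), ("op2", "op3"), ("op3", "op4")], device)
```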

Page 19

Quantization on Intel CPUs

Hardware support: fast INT8 operations with INT32 accumulation

The INT8 conv2d kernel requires a new schedule:
• Performs reduction in groups of 4 INT8 elements into each INT32 element
• The FP32 schedule does not require in-vector reduction
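The group-of-4 reduction can be modeled in NumPy: 64 INT8 pairs (one 512-bit register's worth) are multiplied, widened to INT32, and each group of 4 products is summed into one of 16 INT32 lanes. Treating one operand as unsigned and the other as signed is an assumption based on common x86 INT8 instructions; the slide does not name the instructions, and the helper name is hypothetical.

```python
import numpy as np

LANES = 16  # INT32 lanes in a 512-bit register
GROUP = 4   # INT8 products summed into each INT32 lane

def int8_group_dot(acc, a_u8, b_s8):
    """acc[l] += sum_{k<4} a[4*l + k] * b[4*l + k], accumulated in INT32."""
    assert a_u8.shape == b_s8.shape == (LANES * GROUP,)
    prod = a_u8.astype(np.int32) * b_s8.astype(np.int32)  # widen before multiply
    return acc + prod.reshape(LANES, GROUP).sum(axis=1, dtype=np.int32)

acc = int8_group_dot(np.zeros(LANES, np.int32),
                     np.arange(64, dtype=np.uint8), np.ones(64, dtype=np.int8))
```

The `reshape(LANES, GROUP).sum(axis=1)` step is the in-vector reduction that an FP32 schedule, with one multiply-add per lane, does not need.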

[Figure: INT8 schedules speedup for varying workloads of conv2d. x-axis: workloads W1 to W25; y-axis: speedup normalized to FP32 schedules]

Page 20

ASICs: AWS Inferentia

Page 21

Takeaways


Page 25

Takeaways
• Industry needs an open standard compiler for DL
• AWS is working on the TVM stack
• We are eager to collaborate with the community
• Talk to us: we have 10+ people here today!
• We are hiring!
• Write to Vin Sharma ([email protected]) or Yida Wang ([email protected])