using ml for ml to span the gamut of · using ml for ml to span the gamut of tinyml hardware tinyml...

90

Upload: others

Post on 26-Mar-2020

8 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML
Page 2: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

Using ML for ML to span the gamut of TinyML hardware

TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

Page 3: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

Motivation

CNN GAN RNN MLP DQNNCambrian explosion of models, workloads, and use cases.

Rapidly evolving ML software ecosystem

Silicon scaling limitations (Dennard and Moore):

Cambrian explosion of HW backends. Heterogeneous HW.

Growing set of requirements: Cost Latency Power Security

2

Page 4: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

Motivation

CNN GAN RNN MLP DQNNCambrian explosion of models, workloads, and use cases.

Rapidly evolving ML software ecosystem

Silicon scaling limitations (Dennard and Moore):

Cambrian explosion of HW backends. Heterogeneous HW.

Growing set of requirements: Cost Latency Power Security

3

Hard enough on server class hardware…

But now we want to do it in on bare-metal devices?

Page 5: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

Bare-Metal Devices

4

FPGA ARM M class,

RISC-V MCUs, etc

Page 6: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

Programming bare-metal devices is cumbersome

5

ML Model

?

Page 7: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

Running ML on bare-metal hardware is hard

6

ML Model

Page 8: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

Running ML on bare-metal hardware is hard

7

ML Model

ML from scratch in C…

Page 9: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

ML FrameworkML Model

Running ML on bare-metal hardware is pretty hard

8

Page 10: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

ML FrameworkML Model

Running ML on bare-metal hardware is pretty hard

9

Slow kernels,Incomplete coverage

Page 11: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

ML FrameworkML Model

Hand- optimize kernels

Running ML on bare-metal hardware is really hard

10

Page 12: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

ML FrameworkML Model

Hand- optimize kernels

Running ML on bare-metal hardware is really hard

11

Not much help here…

Page 13: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

ML FrameworkML Model

Hand- optimize kernels

Running ML on bare-metal hardware is really hard

12

Bugs!

Page 14: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

ML Framework

Run ML models

Debug kernels

Hand- optimize kernels

Running ML on bare-metal hardware is really hard

13

Page 15: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

ML Framework

Run ML models

Debug kernels

Hand- optimize kernels

Optimizing ML on bare-metal hardware is a long road

14

Page 16: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

μTVM simplifies ML software on bare-metal

15

ML Framework

Run ML models

Debug kernels

Hand- optimize kernels

Automatically

Page 17: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

Our Goal

16

Minimize the effort to run and optimize ML on bare-metal.

Page 18: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

Our Goal

17

Minimize the effort to run and optimize ML on bare-metal.

So what’s our approach?

Page 19: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

The TVM in µTVM

18

Page 20: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

The TVM in µTVM

19

FPGA ASIC

Page 21: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

The TVM in µTVM

20

FPGA ASIC

How do we get from here…

to here?

Page 22: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

TVM

The TVM in µTVM

21

FPGA ASIC

Page 23: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

???

The TVM in µTVM

22

FPGA ASIC

Page 24: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

???

The TVM in µTVM

23

High-Level Differentiable IR

FPGA ASIC

Page 25: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

???

The TVM in µTVM

24

High-Level Differentiable IR

Tensor Expression IR

FPGA ASIC

Page 26: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

The TVM in µTVM

25

FPGA ASIC

High-Level Differentiable IR

Tensor Expression IR

VTALLVM, CUDA

Page 27: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

The TVM in µTVM

26

FPGA ASIC

High-Level Differentiable IR

Tensor Expression IR

VTALLVM, CUDA

Hand-optimization?

Page 28: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

The TVM in µTVM

27

FPGA ASIC

High-Level Differentiable IR

Tensor Expression IR

VTALLVM, CUDA

AutoTVM

Page 29: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

Where TVM is being used today

28

Every “Alexa” wake-up today across all devices uses a model optimized with TVM

“[TVM enabled] real-time on mobile CPUs for free...We are excited about the performance TVM achieves.” More than 85x speed-up for speech recognition model.

Bing query understanding: 112ms (Tensorflow) -> 34ms (TVM). QnA bot: 73ms->28ms (CPU), 10.1ms->5.5ms (GPU)

“TVM is key to ML Access on Hexagon”

Page 30: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

Where TVM is being used today

29

Every “Alexa” wake-up today across all devices uses a model optimized with TVM

“[TVM enabled] real-time on mobile CPUs for free...We are excited about the performance TVM achieves.” More than 85x speed-up for speech recognition model.

Bing query understanding: 112ms (Tensorflow) -> 34ms (TVM). QnA bot: 73ms->28ms (CPU), 10.1ms->5.5ms (GPU)

“TVM is key to ML Access on Hexagon”

Cf: TVM Conference 2019 (Dec) - videos and slides available online

Page 31: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

The TVM in µTVM

30

FPGA ASIC

High-Level Differentiable IR

Tensor Expression IR

VTALLVM, CUDA

AutoTVM

Many hardware targets already enjoy speedups from TVM

Page 32: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

Motivation

31

Except for microcontrollers...

Page 33: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

High-Level Differentiable IR

Tensor Expression IR AutoTVM

VTA

FPGA ASIC

LLVM, CUDAµTVM

Today’s Talk

32

Page 34: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

Today’s Talk

33

High-Level Differentiable IR

Tensor Expression IR AutoTVM

VTA

FPGA ASIC

LLVM, CUDAµTVM

1

Page 35: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

Today’s Talk

34

High-Level Differentiable IR

Tensor Expression IR AutoTVM

VTA

FPGA ASIC

LLVM, CUDAµTVM

1

2

Page 36: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

Today’s Talk

35

High-Level Differentiable IR

Tensor Expression IR AutoTVM

VTA

FPGA ASIC

LLVM, CUDAµTVM

1

2

3

Page 37: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

Today’s Talk

36

High-Level Differentiable IR

Tensor Expression IR AutoTVM

VTA

FPGA ASIC

LLVM, CUDAµTVM

1

2

3

Page 38: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

AutoTVM

37

TVM + Program

Page 39: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

AutoTVM

38

TVM + Programcompile program

Hardware Backend

Page 40: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

AutoTVM

39

TVM + Program

runandmeasure

compile programHardware Backend

Page 41: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

AutoTVM

40

AutoTVM

runandmeasure

compile program

return timing feedback

Hardware BackendTVM + Program

Page 42: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

AutoTVM

41

AutoTVM

runandmeasure

compile program

improveimplementation

Hardware Backend

return timing feedback

TVM + Program

Page 43: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

AutoTVM

42

runandmeasure

compile program

improveimplementation

Hardware Backend

return timing feedback

TVM + Program

AutoTVM

Page 44: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

AutoTVM

43

runandmeasure

compile program

improveimplementation

Hardware Backend

return timing feedback

TVM + Program

AutoTVM

Page 45: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

AutoTVM

44

runandmeasure

compile program

improveimplementation

Hardware Backend

return timing feedback

TVM + Program

AutoTVM

Page 46: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

AutoTVM

45

runandmeasure

compile program

improveimplementation

Hardware Backend

return timing feedback

TVM + Program

AutoTVM

Page 47: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

AutoTVM

Automatically adapt to hardware type by learning

46

Page 48: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

µTVM

Today’s Talk

47

High-Level Differentiable IR

Tensor Expression IR AutoTVM

VTA

FPGA ASIC

LLVM, CUDA

1

2

3

Page 49: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

Today’s Talk

48

µTVM

Using µTVM µTVM Runtime

Graph Runtime

Page 50: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

Today’s Talk

49

µTVM

Using µTVM µTVM Runtime

Graph Runtime

Page 51: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

How easy is it to use μTVM?

50

Page 52: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

How easy is it to use μTVM?

51

• Implement an interface to: - Read and write to device memory - Start code execution

Page 53: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

How easy is it to use μTVM?

52

• Implement an interface to: - Read and write to device memory - Start code execution

• Provide a cross-compiler for the device

Page 54: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

How easy is it to use μTVM?

53

+

• Implement an interface to: - Read and write to device memory - Start code execution

• Provide a cross-compiler for the device

Page 55: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

ResNet Example in Mainline TVM

54

Page 56: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

ResNet Example in Mainline TVM

55

image =

Page 57: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

ResNet Example in Mainline TVM

56

resnet, params = get_resnet() module = graph_runtime_build(resnet, params)

image =

Page 58: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

ResNet Example in Mainline TVM

57

resnet, params = get_resnet() module = graph_runtime_build(resnet, params) module.run(image)

image =

> 'cat'

Page 59: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

TVM → µTVM

58

resnet, params = get_resnet()

module = graph_runtime_build(resnet, params)

image =

resnet, params = get_resnet() with micro.Session('openocd') as sess: module = sess.build(resnet, params)

image =

Page 60: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

TVM → µTVM

59

resnet, params = get_resnet()

module = graph_runtime_build(resnet, params)

image =

resnet, params = get_resnet() with micro.Session('openocd') as sess: module = sess.build(resnet, params)

image =wrap in a session

Page 61: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

TVM → µTVM

60

resnet, params = get_resnet()

module = graph_runtime_build(resnet, params)

image =

resnet, params = get_resnet() with micro.Session('openocd') as sess: module = sess.build(resnet, params)

image = different build function

wrap in a session

Page 62: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

Today’s Talk

61

µTVM

Using µTVM µTVM Runtime

Graph Runtime

Page 63: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

Today’s Talk

62

µTVM

Using µTVM µTVM Runtime

Graph Runtime

Page 64: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

Today’s Talk

63

C CodeGenerator

μDeviceInterface

µTVMRuntime

Page 65: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

µTVM

Code Generation

64

High-Level Differentiable IR

Tensor Expression IR AutoTVM

VTA

FPGA ASIC

LLVM, CUDA

Page 66: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

µTVM

Code Generation

65

High-Level Differentiable IR

Tensor Expression IR AutoTVM

VTA

FPGA ASIC

LLVM, CUDA

why not LLVM?

Page 67: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

µTVM

Code Generation

66

High-Level Differentiable IR

Tensor Expression IR AutoTVM

VTA

FPGA ASIC

LLVM, CUDA

lack of support…

Page 68: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

µTVM

Code Generation

67

High-Level Differentiable IR

Tensor Expression IR AutoTVM

VTA

FPGA ASIC

LLVM, CUDAC

Page 69: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

µDevice Interface

68

Read Write Execute

Page 70: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

µDevice Interface

69

Read Write Execute

Sockets? JTAG?Doesn't matter.

Page 71: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

The Runtime

70

μTVM Runtime

C CodeGenerator

μDeviceInterface

Page 72: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

The Runtime in Action

71

μTVM Runtime

C CodeGenerator

μDeviceInterface

Page 73: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

Set up a RWX interface with the board

72

communication interface

μTVM Runtime

C CodeGenerator

μDeviceInterface

Page 74: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

Code generator lowers TVM IR to C code

73

func.c

IR→code

communication interface

μTVM Runtime

C CodeGenerator

μDeviceInterface

Page 75: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

Device compiler cross compiles generated code

74

func.ogcc

communication interface

μTVM Runtime

C CodeGenerator

μDeviceInterface

func.c

IR→code

Page 76: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

μTVM remaps generated binary to custom device memory locations for multiple library loading

75

remapfunc

ld linker

communication interface

μTVM Runtime

C CodeGenerator

μDeviceInterface

func.c

IR→code

func.ogcc

Page 77: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

Remapped binary is loaded onto device

76

customloader

communication interface

μTVM Runtime

C CodeGenerator

μDeviceInterface

func.c

IR→code

func.ogcc remap

func

ld linker

Page 78: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

Function invocation on host transfers parameters to the device for execution

77

run

send parameters

'cat'

μTVM Runtime

C CodeGenerator

μDeviceInterface

func.c

IR→code

func.ogcc remap

func

ld linker

customloader

Page 79: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

µTVM

Today’s Talk

78

High-Level Differentiable IR

Tensor Expression IR AutoTVM

VTA

FPGA ASIC

LLVM, CUDA

1

2

3

Page 80: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

AutoTVM on µTVM

79

● Same pipeline as usual ● Load kernels into RAM

instead of flash

Page 81: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

End-to-End CIFAR-10 Evaluation

80

Conv2D

MaxPool2DConv2D AvgPool2D

Conv2DAvgPool2D

Dense

3@32x32

32@32x32 32@16x16 32@16x16 32@8x8

64@8x8 64@4x4

1x10

Replicated an int8-quantized CNN from an ARM Mbed tutorial

Page 82: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

Preliminary CIFAR-10 CNN Results

81

● Ran on ARM Cortex-M7

● Compared against CMSIS-NN

● Vanilla template

● ~5 hours of tuning

● No vectorization

Page 83: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

Preliminary Int-8 Conv2D Results

82

arm_convolve_HWC_q7_fast in CMSIS-NN arm_convolve_HWC_q7_RGB in CMSIS-NN

Page 84: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

Coming Soon - Self-hosted models

83

Page 85: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

Coming Soon - Ultra low bit-width quantization

84Riptide: Fast End-to-End Binarized Neural Networks. MLSys 2020 (March 3rd) Joshua Fromm · Meghan Cowan · Matthai Philipose · Luis Ceze · Shwetak Patel

Squeezenet on RaspberryPi 3

• TVM already supports flexible code generation for a variety of data types

• Natural to combine this with µTVM

• See Josh Fromm’s MLSys talk on March 3rd to learn more

Page 86: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

Coming soon - µTVM as a service

• OctoML offering hosted version of AutoTVM for cloud, mobile, and embedded • Cloud CPU/GPU • ARM A class CPU/GPU • ARM M class microcontrollers • More on request

• Trained model in, optimized binary out • Currently in private beta

TensorFlow, Pytorch, ONNX serialized models

Optimized deployment artifacts

Octomizer

API and web UI

Auto-tuning using OctoML clusters

85

Page 87: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

Status and next steps

86

• Functional on x86, ARM, and RISC-V • In-depth writeup early March on our blog with upstream TVM

documentation/tutorial • Follow us on https://octoml.ai or @octoml

• Interop with CMSIS-NN library and TF Lite Micro (better together!) • First class SIMD vectorization support for ARM schedules, (in addition

to current tensorization approach)

Page 88: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

Questions or interest? Please reach out:

Jason Knight [email protected] Acknowledgments

Logan Weber - AutoTVM Joshua Fromm - RipTide

Thanks!

Page 89: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

88

Page 90: Using ML for ML to span the gamut of · Using ML for ML to span the gamut of TinyML hardware TinyML 2020 Jason Knight - Co-founder and CPO at OctoML

Copyright NoticeThe presentation(s) in this publication comprise the proceedings of tinyML® Summit 2020. The content reflects the opinion of the authors and their respective companies. This version of the presentation may differ from the version that was presented at the tinyML Summit. The inclusion of presentations in this publication does not constitute an endorsement by tinyML Foundation or the sponsors.

There is no copyright protection claimed by this publication. However, each presentation is the work of the authors and their respective companies and may contain copyrighted material. As such, it is strongly encouraged that any use reflect proper acknowledgement to the appropriate source. Any questions regarding the use of any materials presented should be directed to the author(s) or their companies.

tinyML is a registered trademark of the tinyML Foundation.

www.tinyML.org