Adapting to a Cambrian AI/SW/HW explosion with open co-design competitions and collective knowledge


Community-Driven and Knowledge-Guided Optimization of AI Applications Across the Whole SW/HW Stack

or how to adapt to a Cambrian explosion in AI / SW / HW …

ARM Research Summit, Cambridge, September 2017

Grigori Fursin, CTO and co-founder, dividiti, UK

Chief Scientist, cTuning foundation

… with cKnowledge.org and open co-design competitions

cKnowledge.org: helping industry and academia adapt to a Cambrian AI/SW/HW explosion via open co-design competitions

A race to develop innovative AI products and systems (SW & HW) …

Various form factors: IoT, mobile, data centers, supercomputers

Various constraints: speed, energy, accuracy, size, resiliency, costs

… leads to a Cambrian AI/SW/HW explosion and technological chaos

Which AI/SW/HW solutions will survive?

AI users

We at dividiti.com perform competitive analysis and optimization of the whole AI/SW/HW stack for various realistic scenarios (object detection, image classification, etc.)


Scenario: image classification on mobile devices

800+ distinct mobile devices

mobile CPUs and GPUs

Caffe, TensorFlow

OpenBLAS, CLBlast, ViennaCL, Eigen

AlexNet, GoogleNet, SqueezeNet

ImageNet and user images

Requirement: speed vs cost (vs energy vs accuracy vs model size vs memory usage vs reliability…)

[Plot: execution time (sec) vs. price (euros)]

Just a few winning "AI+SW+HW species"

must be optimized further or may go "extinct"

Obtained using our CK-based Android app to crowdsource experiments across devices provided by volunteers (later in the talk)

cKnowledge.org/repo cKnowledge.org/ai

Optimization is ad-hoc, tedious, expensive and time consuming

Mobile device, Server, Data centers

Available libraries / skeletons

Compilers

Binary or byte code

Hardware, simulators

Run-time environment

Run-time state of the system

Inputs

Existing frameworks / algorithms

Various models

User front-end (cloud, GRID, supercomputer, etc)

Algorithm / source code

Microsoft Azure, AWS, Google Cloud, XSEDE, PRACE, Watson…

100s of models for TensorFlow, Caffe, Torch, Theano, MxNet, CNTK

CUDA, MPI, OpenMP, TBB, OpenCL, StarPU, OmpSs …

C, C++, Fortran, Java, Python, byte code, assembler …

LLVM, GCC, ICC, Rose, PGI, Lift, functional programming …

cuBLAS, BLAS, MAGMA, ViennaCL, CLBlast, cuDNN, OpenBLAS, clBLAS, libDNN, tinyDNN, ARM compute lib, libxsmm, skeletons

diverse hardware: heterogeneous, out-of-order, caches (ARM, x86, CUDA, Mali, Adreno, Power, TPU, FPGA, MIPS, AVX, NEON)

Linux (CentOS, Ubuntu, RedHat, SUSE, Debian), Android, Windows, BSD, iOS, MacOS …

Too many design and optimization choices at each level of the continuously changing SW/HW stack!


Time to reinvent computer engineering

and enable open, collaborative and reproducible AI/SW/HW co-design!


cKnowledge.org: plugin-based workflow framework to co-design AI/SW/HW stack

Grigori Fursin, Anton Lokhmotov, Ed Plowman, "Collective Knowledge: towards R&D sustainability", DATE'16

Available libraries / skeletons

Compilers

Binary or byte code

Hardware, simulators

Run-time environment

Run-time state of the system

Inputs

Various models

Algorithm / source code

AI framework

Common JSON API

Initial funding (2015)

Common experimental framework for computer engineering and AI research

https://github.com/ctuning/ck

Repositories with reusable and customizable artifacts (JSON API and meta info)

Unified models (CK JSON API), each with CK meta: MobileNets, GoogleNet, AlexNet, SqueezeNet, ResNet

AI frameworks (CK JSON API), each with CK meta: TensorFlow, Caffe, Caffe2, CNTK, MxNet …

[Diagram: the same SW/HW stack layers as listed earlier, connected through the common JSON API]

Repositories with reusable and customizable workflows (JSON API)

CK API: image classification, object detection, emotion analysis …

Unified models (CK JSON API), each with CK meta: MobileNets, GoogleNet, AlexNet, SqueezeNet, ResNet

AI frameworks (CK JSON API), each with CK meta: TensorFlow, Caffe, Caffe2, CNTK, MxNet

[Diagram: the same SW/HW stack layers as listed earlier, connected through the common JSON API]

[Diagram: the same SW/HW stack, unified models, AI frameworks and CK API workflows as listed earlier]

Crowdsource AI experiments across diverse platforms provided by volunteers

Continuous competition of various AI/SW/HW combinations (species)

cKnowledge.org/repo

Everyone is on the same page: fair and reproducible competitions

CK concepts: convert your artifacts into reusable and customizable components

Actions: setup, find, extract features, compile, run, add, replay, autotune

Artifact types: soft, dataset, experiment, program

Artifacts (each with some desc.): TensorFlow, Caffe2, ARM compute lib, image classification, object detection, ImageNet, Car video stream, Real surveillance camera, GEMM OpenCL, convolution CPU, performance results, training / accuracy, bugs

Ad-hoc scripts to perform some actions on some artifacts

CK concepts: convert your artifacts into reusable and customizable components

The same actions and artifacts as listed above, with each artifact description now stored as a JSON file

/ 1st level directory – CK modules / 2nd level dir – CK entries / CK meta info

Python module with JSON API (CK module); holder for original artifact (CK entry); CK meta (JSON file)


Collective Knowledge (github.com/ctuning/ck) assists you in unifying, executing, sharing and reusing your artifacts:

$ sudo pip install ck

$ ck pull repo:ck-autotuning

$ ck add dataset:my-new-dataset (UID will be automatically generated)

$ ck compile program:cbench-automotive-susan

$ ck run program:cbench-automotive-susan

https://github.com/ctuning/ck/wiki/Shared-modules

We already converted multiple AI frameworks, artifacts and workflows to the CK

[Diagram labels: ad-hoc init scripts; ad-hoc experimental workflows; ad-hoc scripts to process CSV, XLS, TXT, etc.; program; CK program; CK pipeline; CK compiler; CK AI framework; CK math library; CK experiment]

Compilers: ICC 17.0, CUDA 8.0, GCC 7.0, LLVM 4.0

AI frameworks: Caffe OpenCL, Caffe CUDA, TensorFlow CPU/CUDA

Math libraries: MAGMA, cuBLAS, OpenBLAS, ViennaCL, CLBlast

Databases, local repositories

Stat. analysis, predictive analytics, visualization

• github.com/dividiti/ck-caffe
• github.com/ctuning/ck-caffe2
• github.com/ctuning/ck-tensorflow

$ ck pull repo --url=https://github.com/dividiti/ck-caffe

$ ck compile program:caffe-classification

$ ck run program:caffe-classification

https://github.com/ctuning/ck/wiki/Shared-repos

We've already converted multiple AI frameworks, artifacts and workflows to the CK

CK pipeline with a unified JSON API for input and output:

• Read program meta
• Detect all software dependencies; ask the user if multiple versions exist
• Prepare environment
• Compile program
• Run program

CK program module can automatically adapt to the underlying environment via dependencies
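The pipeline steps above can be sketched in plain Python as a chain of functions that each take and return a dictionary. This is only an illustration and not CK's actual code: the function names, the dictionary keys, the `CK_` variable prefix and the in-memory `INSTALLED` table are all assumptions made for the example.

```python
# Illustrative only: a dict-in/dict-out pipeline in the spirit of the steps above.
# All names, keys and version strings are hypothetical, not CK's real API.

# Hypothetical table of software already detected on the machine.
INSTALLED = {
    "compiler": ["gcc-5.4", "gcc-7.0"],
    "blas": ["openblas-0.2.19"],
}


def read_program_meta(state):
    """Copy the dependency list from the program description into the state."""
    state["deps"] = list(state["meta"]["deps"])
    return state


def detect_dependencies(state, choose=lambda name, versions: versions[-1]):
    """Resolve each dependency; `choose` stands in for asking the user
    when several versions are found."""
    resolved = {}
    for name in state["deps"]:
        versions = INSTALLED.get(name, [])
        if not versions:
            return {"return": 1, "error": "dependency not found: " + name}
        resolved[name] = versions[0] if len(versions) == 1 else choose(name, versions)
    state["resolved"] = resolved
    return state


def prepare_environment(state):
    """Record which tool version each later step should use."""
    state["env"] = {"CK_" + k.upper(): v for k, v in state["resolved"].items()}
    return state


def compile_program(state):
    state["compiled"] = True  # placeholder for calling the real compiler
    return state


def run_program(state):
    state["ran"] = state.get("compiled", False)  # placeholder for execution
    return state


def pipeline(unified_input):
    """Run all steps in order; stop early if a step reports an error."""
    state = dict(unified_input)
    state["return"] = 0
    for step in (read_program_meta, detect_dependencies,
                 prepare_environment, compile_program, run_program):
        state = step(state)
        if state.get("return", 0) != 0:
            break
    return state


result = pipeline({"meta": {"deps": ["compiler", "blas"]}})
print(result["env"])
```

Because every step shares the same dictionary shape, a step can be replaced (for example, a different `choose` policy for picking among versions) without touching the others.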

CK program entry (native directory):

• Source files and auxiliary scripts
• .cm/meta.json – describes soft dependencies, data sets, and how to compile and run this program

CK entries associated with a given module describe a given object using meta.json while storing all necessary files and sub-directories
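A minimal sketch of that layout, assuming a repository directory with one sub-directory per module, one per entry, and a `.cm/meta.json` file inside each entry as named above. The helper names and the keys stored in the meta are hypothetical and chosen only for illustration; they are not CK's real meta format.

```python
import json
import os
import tempfile


def add_entry(repo, module, entry, meta):
    """Create <repo>/<module>/<entry>/.cm/meta.json and return the entry path."""
    entry_dir = os.path.join(repo, module, entry)
    os.makedirs(os.path.join(entry_dir, ".cm"), exist_ok=True)
    with open(os.path.join(entry_dir, ".cm", "meta.json"), "w") as f:
        json.dump(meta, f, indent=2, sort_keys=True)
    return entry_dir


def load_meta(repo, module, entry):
    """Read the meta description of one entry."""
    with open(os.path.join(repo, module, entry, ".cm", "meta.json")) as f:
        return json.load(f)


def list_entries(repo, module):
    """List entries of a module, i.e. sub-directories that carry a meta file."""
    module_dir = os.path.join(repo, module)
    return sorted(
        name for name in os.listdir(module_dir)
        if os.path.isfile(os.path.join(module_dir, name, ".cm", "meta.json"))
    )


repo = tempfile.mkdtemp()
add_entry(repo, "program", "my-classifier", {
    "deps": ["compiler", "lib-blas"],      # hypothetical keys and values
    "compile_cmd": "make",
    "run_cmd": "./classify image.jpg",
})
print(list_entries(repo, "program"))
print(load_meta(repo, "program", "my-classifier")["deps"])
```

Source files and auxiliary scripts would live next to the `.cm` directory inside the entry, so the entry stays an ordinary directory that can be copied or versioned as a whole.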


Automatically adapting workflow to any underlying software and hardware

Soft entries in CK describe how to detect if a given software is already installed, how to set up all its environment including all paths (to binaries, libraries, include, aux tools, etc), and how to detect its version

$ ck detect soft --tags=compiler,cuda

$ ck detect soft:compiler.gcc

$ ck detect soft:compiler.llvm

$ ck list soft:compiler*

$ ck detect soft:lib.cublas

$ ck search soft --tags=blas

Env entries are created in CK local repo for all found software instances together with their meta and an auto-generated environment script env.sh (on Linux) or env.bat (on Windows)

Local CK repo:

local / env / 03ca0be16962f471 / env.sh (Tags: compiler,cuda,v8.0)

local / env / 0a5ba198d48e3af3 / env.bat (Tags: lib,blas,cublas,v8.0)

$ ck show env

$ ck show env --tags=cublas

$ ck rm env:* --tags=cublas

Package entries describe how to install a given software if it is not already installed (using CK Python plugin together with install.sh script on Linux host or install.bat on Windows host)

$ ck install package:caffemodel-bvlc-googlenet

$ ck install package:imagenet-2012-val

$ ck install package:lib-tensorflow-cuda

$ ck list package:*caffemodel*

$ ck search package --tags=caffe

$ ck list package:*tensorflow*

$ ck install package:lib-caffe-bvlc-master-cuda-universal

https://github.com/ctuning/ck/wiki/Portable-workflows

Multiple versions of tools may easily co-exist and be plugged into CK workflows!
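As a rough illustration of how tagged env entries let several versions co-exist, the sketch below looks entries up by tags. The first two entries reuse the UIDs and tags shown on this slide; the third entry, the `find_env` helper and the dictionary layout are assumptions made for the example and are not CK's implementation.

```python
# Env entries keyed by UID. The first two mirror the examples on the slide;
# the third is a made-up entry added to show two versions side by side.
ENV_ENTRIES = {
    "03ca0be16962f471": {"tags": {"compiler", "cuda", "v8.0"}, "script": "env.sh"},
    "0a5ba198d48e3af3": {"tags": {"lib", "blas", "cublas", "v8.0"}, "script": "env.bat"},
    "hypothetical-uid-1": {"tags": {"compiler", "cuda", "v7.5"}, "script": "env.sh"},
}


def find_env(tags, entries=ENV_ENTRIES):
    """Return the UIDs of all env entries that carry every requested tag."""
    wanted = {t.strip() for t in tags.split(",") if t.strip()}
    return sorted(uid for uid, entry in entries.items() if wanted <= entry["tags"])


print(find_env("compiler,cuda"))        # every registered CUDA compiler
print(find_env("compiler,cuda,v8.0"))   # pinned to a single version
```

A workflow that asks only for `compiler,cuda` can then be offered every match, while one that adds a version tag gets exactly one environment script.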


Applying methodology from natural sciences to optimize computer systems

https://github.com/ctuning/ck/wiki/Autotuning

CK Python modules (wrappers) with a unified JSON API

Unified input, CK input (JSON/dict): behavior, choices, features, state

Action

Unified output, CK output (JSON/dict): behavior, choices, features, state

b = B(c, f, s): formalized function B of the behavior b of any CK object, given its choices c, features f and state s

Flattened CK JSON vectors (dict converted to vector) to simplify statistical analysis, machine learning and data mining
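One way to picture the "dict converted to vector" step is a recursive flattening of nested JSON into a single-level mapping from key paths to scalar values. The sketch below is an assumption about how such flattening could look: the `#` and `@` separators, the key names and the sample values are invented for illustration.

```python
def flatten(obj, prefix=""):
    """Convert nested dicts and lists into a flat {key_path: scalar} mapping.

    Dict keys are joined with '#', list indices with '@' (separators chosen
    only for this example).
    """
    flat = {}
    if isinstance(obj, dict):
        for key in sorted(obj):
            flat.update(flatten(obj[key], prefix + "#" + str(key)))
    elif isinstance(obj, list):
        for index, item in enumerate(obj):
            flat.update(flatten(item, prefix + "@" + str(index)))
    else:
        flat[prefix] = obj
    return flat


# One experimental point with placeholder values (not measurements).
point = {
    "choices": {"batch_size": 2, "threads": 4},
    "features": {"model": "squeezenet"},
    "state": {"cpu_freq_mhz": [1416, 1800]},
    "behavior": {"time_s": 0.25},
}

vector = flatten(point)
for key in sorted(vector):
    print(key, vector[key])
```

Each flattened key becomes a column name, so many experimental points can be stacked into a table for statistical analysis or model training.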

[Diagram labels: some actions; tools (compilers, profilers, etc); generated files]

Chain CK modules to implement research workflows such as multi-objective autotuning and co-design

• Choose exploration strategy
• Perform SW/HW DSE (math transforms, skeleton params, compiler flags, transformations …)
• Perform stat. analysis
• Detect (Pareto) frontier
• Model behavior, predict optimizations
• Reduce complexity

CK program module with pipeline function:

• Set environment for a given tool version
• Compile program
• Run code

First expose coarse grain high-level choices, features, system state and behavior characteristics

Crowdsource benchmarking and random exploration across diverse inputs and devices;

Keep best species (AI/SW/HW choices); model behavior; predict better optimizations and designs
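Keeping the "best species" amounts to detecting the (Pareto) frontier named in the workflow above. A minimal two-objective sketch follows, where both objectives are minimized; the sample points are placeholders invented for the example and are not results from the talk.

```python
def pareto_frontier(points):
    """Return the points not dominated by any other, minimizing both coordinates.

    A point q dominates p if q is no worse in both objectives and strictly
    better in at least one.
    """
    frontier = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for q in points
        )
        if not dominated:
            frontier.append(p)
    return sorted(frontier)


# (execution time in seconds, price in euros): placeholder values only.
species = [(0.5, 300), (1.0, 100), (2.0, 90), (1.5, 200), (0.6, 400), (3.0, 95)]
print(pareto_frontier(species))
```

This quadratic scan is fine for a few thousand points; adding objectives such as energy or accuracy only requires extending the dominance test to more coordinates.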


Prepare first proof-of-concept community experiments


Algorithms: object classification, object detection

AI frameworks: Caffe CPU, Caffe OpenCL, TensorFlow CPU

Math libraries: OpenBLAS, ViennaCL, clBLAS, CLBlast, cuBLAS, cuDNN, Eigen, gemmlowp

Compilers: GCC 5+

Models: AlexNet, GoogleNet, VGG, ResNet, SqueezeNet, SqueezeDet, SSD

Datasets: KITTI, COCO, VOC, ImageNet

Optimization choices: batch size, number of CPU threads

Characteristics: total execution time (including OpenCL overheads), top1/top5 model accuracy, static model size (MB), device cost, max power consumption (if available)

System state: CPU/GPU frequency, memory

cKnowledge.org/repo


Crowdsource benchmarking across Android devices provided by volunteers

Continuously collect statistics, bugs and misclassifications at cKnowledge.org/repo

The number of distinct participating platforms: 800+

The number of distinct CPUs: 260+

The number of distinct GPUs: 110+

The number of distinct OS: 280+

Power range: 1-10W

No need for a dedicated and expensive cloud: volunteers help us validate research ideas, similar to SETI@HOME

Also collecting real images from users for misclassifications to build an open and continuously updated training set!

Winning solutions on various frontiers

[Plot: time per image (seconds) vs. cost (euros)]

[Plot: time per image (seconds) vs. cost (euros), as above, with Firefly-RK3399 labeled]


Let's dig further – (crowdsource) BLAS autotuning in Caffe on Firefly-RK3399

Collaboration between Marco Cianfriglia (Roma Tre University), Cedric Nugteren (TomTom), Flavio Vella, Anton Lokhmotov and Grigori Fursin (dividiti)

Tunable parameters of OpenCL-based BLAS (github.com/CNugteren/CLBlast):

Name    Description                                         Ranges
KWG     2D tiling at workgroup level                        {32, 64}
KWI     KWG kernel-loop can be unrolled by a factor KWI     {1}
MDIMA   Local Memory Re-shape                               {4, 8}
MDIMC   Local Memory Re-shape                               {8, 16, 32}
MWG     2D tiling at workgroup level                        {32, 64, 128}
NDIMB   Local Memory Re-shape                               {8, 16, 32}
NDIMC   Local Memory Re-shape                               {8, 16, 32}
NWG     2D tiling at workgroup level                        {16, 32}
SA      Manual caching using the local memory               {0, 1}
SB      Manual caching using the local memory               {0, 1}
STRM    Striding within single thread for matrix A and C    {0, 1}
STRN    Striding within single thread for matrix B          {0, 1}
VWM     Vector width for loading A and C                    {8, 16}
VWN     Vector width for loading B                          {0, 1}

For now only two data sets (small & large)

Some extra constraints to avoid illegal combinations

Use different autotuners under CK to speed up design space exploration based on probabilistic focused search, genetic algorithms, deep learning, SVM, KNN, MARS, decision trees …
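To give a feel for the size of this design space and for the simplest possible exploration strategy, the sketch below enumerates the ranges from the table and runs a plain random search. The legality rule in `is_legal` and the `toy_cost` function are assumptions made purely for illustration: the talk does not list its constraints, and a real autotuner would time an actual GEMM kernel instead.

```python
import itertools
import random

# Parameter ranges copied from the table above.
SPACE = {
    "KWG": [32, 64], "KWI": [1], "MDIMA": [4, 8], "MDIMC": [8, 16, 32],
    "MWG": [32, 64, 128], "NDIMB": [8, 16, 32], "NDIMC": [8, 16, 32],
    "NWG": [16, 32], "SA": [0, 1], "SB": [0, 1], "STRM": [0, 1],
    "STRN": [0, 1], "VWM": [8, 16], "VWN": [0, 1],
}


def is_legal(cfg):
    """Illustrative constraint only (an assumption, not taken from the talk):
    the workgroup tile MWG must be a multiple of MDIMC * VWM."""
    return cfg["MWG"] % (cfg["MDIMC"] * cfg["VWM"]) == 0


def all_configs(space=SPACE):
    """Yield every combination of parameter values as a dict."""
    names = sorted(space)
    for values in itertools.product(*(space[n] for n in names)):
        yield dict(zip(names, values))


def random_search(cost, trials, seed=0, space=SPACE):
    """Sample configurations at random, skip illegal ones, keep the cheapest."""
    rng = random.Random(seed)
    best_cfg, best_cost = None, float("inf")
    for _ in range(trials):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        if not is_legal(cfg):
            continue
        current = cost(cfg)
        if current < best_cost:
            best_cfg, best_cost = cfg, current
    return best_cfg, best_cost


def toy_cost(cfg):
    """Stand-in for a real GEMM timing; purely synthetic."""
    return abs(cfg["MWG"] - 64) + abs(cfg["NWG"] - 32) + cfg["KWG"] / 64.0


total = sum(1 for _ in all_configs())
legal = sum(1 for cfg in all_configs() if is_legal(cfg))
print(total, legal)

best_cfg, best_cost = random_search(toy_cost, trials=200)
print(best_cfg, best_cost)
```

Even with these small ranges the unconstrained product has 41,472 points per data set, which is why smarter search strategies than exhaustive or random exploration become attractive.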


Let's dig further – autotuning BLAS (CLBlast) in Caffe on Firefly-RK3399

• Caffe with autotuned OpenBLAS (threads and batches) is the fastest
• Caffe with autotuned CLBlast is 6 to 7x faster than the default version and competitive with the OpenBLAS-based version: now worth making adaptive selection at run-time
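Adaptive selection at run-time could, in its simplest form, be a lookup built from timings collected offline. The sketch below is a hypothetical illustration of that idea, not something shown in the talk: the `build_selector` helper, the choice of batch size as the selection key, and all numbers are assumptions, and the timings are placeholders rather than measurements.

```python
def build_selector(timings):
    """Build a run-time selector from offline measurements.

    timings: {batch_size: {library_name: seconds_per_image}}
    Returns a function that picks the fastest library for a batch size,
    falling back to the nearest measured batch size.
    """
    best = {batch: min(libs, key=libs.get) for batch, libs in timings.items()}
    measured = sorted(best)

    def select(batch_size):
        # Ties in distance are broken in favor of the smaller batch size.
        nearest = min(measured, key=lambda b: (abs(b - batch_size), b))
        return best[nearest]

    return select


# Placeholder numbers invented for illustration; NOT measurements from the talk.
timings = {
    1: {"openblas": 0.30, "clblast": 0.45},
    8: {"openblas": 0.28, "clblast": 0.25},
}

select = build_selector(timings)
print(select(1), select(16))
```

In practice the key would likely include more features (model, matrix shapes, device state), and the table could be replaced by a trained predictive model.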

Sharing results in a reproducible way with the community for validation and improvement: https://nbviewer.jupyter.org/github/dividiti/ck-caffe-firefly-rk3399/blob/master/script/batch_size-libs-models/analysis.20170531.ipynb

• Bring together industry and academia to participate in open and reproducible AI/SW/HW co-design competitions using the CK framework
• Share more artifacts, workflows and results in a reusable and customizable CK format (common JSON API and meta description)
• Collaboratively improve models and find missing features
• Gradually expose more design and optimization knobs at all AI/SW/HW levels
• Enable distributed on-line learning for self-optimizing and self-learning systems

http://cKnowledge.org/partners http://cKnowledge.org/publications

Join the growing Collective Knowledge community!