mcdnn an approximation-based execution framework for deep...

31
MCDNN: An Approximation-Based Execution Framework for Deep Stream Processing Under Resource Constraints Haichen Shen ([email protected]) with Seungyeop Han, Matthai Philipose, Sharad Agarwal, Alec Wolman, Arvind Krishnamurthy University of Washington Microsoft Research 1

Upload: dodien

Post on 12-Aug-2019

222 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework

MCDNN:AnApproximation-BasedExecutionFrameworkfor DeepStreamProcessingUnderResourceConstraints

HaichenShen([email protected])withSeungyeop Han,Matthai Philipose,

Sharad Agarwal,AlecWolman,ArvindKrishnamurthy

UniversityofWashington MicrosoftResearch

1

Page 2: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework

Wearable computing

2

???

à more data

Page 3: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework

When computer vision meets wearable

Consumer Manufacturing Public Safety

“That drink will get you to 2800 calories for today”

“I last saw your keys in the store room”

“Remind Tom of the party”

“You’re on page 263 of this book”

3

Page 4: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework

Deep learning makes vision work

Recognition Task face scene* object*

Accuracy 97% 88% 92%

Compute/frame (FLOPs) 1.00G 30.9G 39.3G

Do we have enough resources to run deep learning?

4

Compute@1-30fps (FLOPS) 1-30G 30-900G 40G-1.2T

* top-5 accuracy is shown in the table

But...

Page 5: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework

Resource usage for continuous vision

Imager Processor Radio

OmnivisionOV274090mW

Tegra K1 GPU290GOPS@10W= 34pJ/OP

Qualcomm SD810 LTE>800mWAtheros 802.11 a/g15Mbps@700mW= 47nJ/b

Cloud

Amazon EC2CPU c4.large 2x400GFLOPS $0.1/hGPU g2.2xlarge 2.3TFLOPS $0.65/h

5Huge gap between workload and budget

BudgetDevice power Cloud cost

30% of 10Wh for 10h = 300mW $10 person/year

Workload Deep learning 300GFLOPS @ 30GFLOPs/frame, 10fps

Compute power 9GFLOPS 3.5GFLOPS (GPU) / 8GFLOPS (CPU)

Page 6: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework

Neural network

(c)convolution(f)fullyconnect(m)maxpool(r)relu(s)softmax

6

c

r

m

c

r

m

m

r

c

r

c

r

c

f

r

f

r

f

s

imgRimgGimgB

classes+scores

Page 7: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework

Neural network ≈ matrix multiplications

7

x x x

Low rank approximation(Y. Kim, et al. 2016)

x

Matrix sparsification(S. Han, et al. 2015)

(c)convolution(f)fullyconnect(m)maxpool(r)relu(s)softmax

c

r

m

c

r

m

m

r

c

r

c

r

c

f

r

f

r

f

s

imgRimgGimgB

classes+scores

c

c

r

m m

r

f

r

c

r

f

s

imgRimgGimgB

classes+scores

Architectural changes(J. Ba, et al. 2014)

Page 8: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework

Managing the approx. / resource trade-off

‣Detailed characterization of the approximation / resource trade-off for many optimizations

‣ Two new optimizations for streaming, multi-application settings

‣New scheduling problem, Approximate Model Scheduling, with a heuristic solution

8

Page 9: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework

Outline

‣Detailed characterization of the approximation / resource trade-off for many optimizations

‣ Two new optimizations for streaming, multi-application settings

‣New scheduling problem, Approximate Model Scheduling, with a heuristic solution

9

Page 10: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework

30 40 50 60 70 80 90 100aFFuraFy (%)

101

102

103

mem

ory

(0

%)

2bM

6Fene

)aFe

Memory / accuracy trade-off

10

Page 11: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework

30 40 50 60 70 80 90 100aFFuraFy (%)

101

102

103

mem

ory

(0

%)

2bM

6Fene

)aFe

Memory / accuracy trade-off

11

Substantially reduce memory use with gradual accuracy loss

Page 12: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework

45 50 55 60 65 70 75 80 85aFFuraFy (%)

0

2

4

6

8

10

en

erJ

y (

J)

2Ej((xeF)

6Fene((xeF)

)aFe((xeF)

Energy / accuracy trade-off

Wifi xmit cost (0.5J)

LTE xmit cost (0.9J)

Compute energy budget (2.3J)

12

Always execute locally

Can execute locally under energy budget

Exceed energy budget when execute locally

energy budget = total energy / total time(10h) / requests per second(1 req/sec)

Nvidia Jetson TK1

Page 13: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework

Outline

‣Detailed characterization of the approximation / resource trade-off for many optimizations

‣ Two new optimizations for streaming, multi-application settings‣ Specialization‣ Model sharing

‣New scheduling problem, Approximate Model Scheduling, with a heuristic solution

13

Page 14: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework

Exploiting stream locality by specialization

• Standard deep neural network recognizes 4000 people• Most of videos are dominated by less than 10 faces over minutes

14

Timeline

Produce more compact models for skewed classes

Page 15: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework

Specialization runtime

15

input(withskeweddistribution)

class

compactmodel

class

fullmodel

checkfor“other”

class

input

fullmodel

Page 16: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework

Better resource/accuracy trade-off

16

Page 17: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework
Page 18: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework

Outline

‣Detailed characterization of the approximation / resource trade-off for many optimizations

‣ Two new optimizations for streaming, multi-application settings

‣New scheduling problem, Approximate Model Scheduling, with a heuristic solution

18

Page 19: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework

Energy 14

Approximate model scheduling

19

Mobile device

Cloud

Model Pool

memory

energy

cost

Accuracy

Task 1

8 2 2model 1 (90%) model 2 (80%)

5 1 1

Task 2

9 3 3model 3 (80%) model 4 (70%)

6 2 2

Memory 10

Cost 14

Goal: maximize the overall accuracy

Page 20: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework

Approximate model scheduling

20

Mobile device

Cloud

Model Pool

memory

energy

cost

Accuracy

Requests:1. task 1 2. task 2 3. task 1 4. task 1 5. task 2

Task 1

8 2 2model 1 (90%) model 2 (80%)

5 1 1

Task 2

9 3 3model 3 (80%) model 4 (70%)

6 2 2

Energy 14 Memory 10

Cost 14 à cloud, model 1à device, model 4à cloud, model 1à cloud, model 2à device, model 4

model 4

model 1 model 1 m2

model 4

1 2 3 4 5

Packing problem

Page 21: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework

Approximate model scheduling

21

Mobile device

Cloud

Model Pool

Accuracy

Requests:1. task 1 à cloud, model 12. task 2 à device, model 43. task 1 à cloud, model 14. task 1 à cloud, model 25. task 2 à device, model 46. task 1

Energy 14 Memory 10

Cost 14

model 4

model 2

memory

energy

cost

Task 1

8 2 2model 1 (90%) model 2 (80%)

5 1 1

Task 2

9 3 3model 3 (80%) model 4 (70%)

6 2 2

model 4

model 1 model 1

1 2 3

Paging problem

m2

model 4

4 5

Page 22: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework

Approximate model scheduling

• Packing problem: pick versions that satisfy energy/cost budgets" 𝑒$𝑥$&

&≤ 𝐸," 𝑐$𝑥$&,

&≤ 𝐶(𝑥$&, 𝑥$&, ∈ 0,1 , 𝑥$& 3 𝑥$&, = 0)

• Paging problem: pick versions that fit in memory∀1 ≤ 𝑡 ≤ 𝑇," 𝑠$𝑥$&

:

$;<≤ 𝑆

• Goal: maximize the accuracymaxA" " 𝑎$(𝑥$& + 𝑥$&, )

$

&

No known optimal online algorithms22

Page 23: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework

Heuristic scheduler

• Estimate future resource use and compute the budget for each request

• Account for paging cost to reduce oscillations

• Use increasingly more accurate versions of more heavily used models

23

Page 24: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework

Trace-driven evaluation

24

Run out of battery

Lose connectivity

Page 25: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework

development time

MCDNN frameworkcompiler

input typemodel schema

training/validationdata

cloudruntimeschedulerdatarouterprofiler

apps

input input

classesclasses

trainedmodelcatalog

deviceruntimeschedulerdatarouter

clouddevice

runtime

Page 26: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework

development time

MCDNN frameworkcompiler

input typemodel schema

training/validationdata

cloudruntimeschedulerdatarouterprofiler

apps

input input

classesclasses

trainedmodelcatalog

deviceruntimeschedulerdatarouter

specializer

statsspecializedmodels

clouddevice

specializationtime

runtime

Page 27: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework

Conclusion

• MCDNN makes efficient trade-offs between resource use and accuracy• Formulate the approximate model scheduling problem and

devise a heuristic algorithm• Design a generic approximation-based execution framework

for continuous mobile vision

27

Thank you! Questions?

Page 28: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework

BackupSlides

28

Page 29: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework

Cloud cost / accuracy trade-off

45 50 55 60 65 70 75 80 85

aFFuUaFy (%)

100

101

102

103

104

Oate

nFy

(P

s)2bj(COouG G38)

2bj(COouG C38)

6Fene(COouG G38)

6Fene(COouG C38)

)aFe(COouG G38)

)aFe(COouG C38)

)aFe(COouG G38)

)aFe(COouG C38)

cloud GPU latency budget (9.7ms)@ $10/yr, 1 request/s@ AWS g2.2xlarge

cloud GPU latency budget (582ms)@ $10/yr, 1 request/min@ AWS g2.2xlarge

cloud CPU latency budget (7645ms)@ $10/yr, 1 request/min@ AWS c4.large

29latency budget = cost budget / cost per hour / #requests

Page 30: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework

Model sharing

30

inputintermediate

values

face ID

race age gender …

routerinput

(a) (b)

model-fragment cache

Page 31: MCDNN An Approximation-Based Execution Framework for Deep ...research.cs.rutgers.edu/~uli/cs673/papers/MCDNN-MobiSys2016-Slides.pdf · MCDNN: An Approximation-Based Execution Framework

Dynamically-sized caching scheme

31