ignacio (nacho) cano · ignacio (nacho) cano ph.d. defense paul g. allen school of computer science...

Ignacio (Nacho) Cano

Ph.D. Defense

Paul G. Allen School of Computer Science & Engineering

University of Washington

December 2018

••

HDD TIER

SSD TIER

Virtualization Layer

Physical Layer

HYPERVISOR

VM VM VM VM

HYPERVISOR

VM VM VM VM

…

Virtualization Layer

Physical Layer

HYPERVISOR

VM VM VMVM

HYPERVISOR

VMVM

VMVM

…

West-US East-US

West-US East-US

…

Sub-optimal systems efficiency and performance

Improve systems efficiency and performance

Machine Learning to the rescue!

1. CURATOR: ML-based policy

2. ADARES: ML-based mechanism

3. PULPO: ML-System co-design

• Motivation• Challenges• CURATOR

• ADARES

• PULPO

• Conclusions

historical traces

transfer learning

sensing

reward

safe online exploration

uncertainty


• ADARES

• PULPO

• Conclusions

Scheduling of these tasks is key to overall cluster performance

• reinforcement learning

50% of clusters have 75% usage

Many clusters waste 25% of fast storage

Need smarter scheduling policies

AGENT

ENVIRONMENT

Action atState st Reward rt

st+1

rt+1

0

5

10

15

20

25

30

35

40

oltp oltp skewed oltp varying oltp and vdi dss

Impr

ovem

ent (

%)

latency ssd reads

METRIC AVG IMPROVEMENT

LATENCY 12 %

SSD READS 16 %

reinforcement learning


• ADARES

• PULPO

• Conclusions

Need a system that adaptively changes resources allocated to VMs in a cluster

• contextual bandits

• decoupled highly configurable

AGENT

ENVIRONMENT

Arm/Action atContext xt Reward rt

• Adaptive and improve over time

Arm/Action atContext xt

Reward rt

SOFTWARE AGENT

CLUSTER (ENVIRONMENT)

Sensing

cluster-node-vmmeasurements

Execution

Filtering

recommendation

Predictive

predict+

learn

Decision

exploitexplore

+business

logic

Less under and overprovisioning

Start End

Under and overprovisioning

Increases utilization Keeps up with IOPS

contextual bandits ••

• decoupled highly configurable•


• ADARES

• PULPO

• Conclusions

Need a system to efficiently train ML models from geo-distributed datasets

• Trade-off in-DC computation and communication X-DC communication

• Apache Hadoop YARN and Apache REEF

West-US Germany

Canada

DC Coordinator

More in-DC cmp. and comm.Less X-DC communication

• Reduce communication between data centers

• Multi-level communication trees across DCs

• Learning algorithms in terms of B/R primitives

Application Layer (DML/GDML)

• Generalized control plane

• Data aggregation, communication, etc. Apache REEF

• Federated version

• Single massive YARN cluster

• Network-aware resource requests

Apache Hadoop YARN

●

● ●

101

103

2 4 6 8Data centers

X−D

C T

rans

fer (

GB)

● centralizeddistributeddistributed−fadl

CRITEO 10M CRITEO 50M – 2DC

●

●

●

●

●●

10−5

10−3

10−1

100 101 102 103

X−DC Transfer (GB)

Rel

ative

obj

. fun

ctio

n ● centralizeddistributeddistributed−fadl

Better models soonerOrders of magnitude reduction

KAGGLE 500K KAGGLE 5M

10−7

10−5

10−3

10−1

0 2 4 6 8Relative Time

Rel

ative

obj

. fun

ctio

n centralized−bulkcentralized−streamdistributeddistributed−fadl

10−7

10−5

10−3

10−1

0 10 20 30Relative Time

Rel

ative

obj

. fun

ctio

n centralized−bulkcentralized−streamdistributeddistributed−fadl

Close to optimal in low dimensional models

Deteriorates with model size

in-DC comp. and comm. X-DC communication

Apache Hadoop YARN Apache REEF


• ADARES

• PULPO

• Conclusions

We can use Machine Learning to optimize Distributed Systems

• CURATOR• RL-based scheduling

• ADARES• Contextual bandits-based adjustments

• PULPO• Co-designed ML-System

Operations interposed at the hypervisor level

and redirected to CVMs

Global view of cluster state

Integrated Compute-Storage

Data replicationDisk balancingVM Migration

Global view of cluster state

Scale

KEY-VALUE STORE

I/O SERVICE(foreground)

STORAGE MGMT(background)

KEY-VALUE STORE

STORAGE MGMTuses MapReduce

I/O SERVICE

• extentsextent groups

eid egid

33 1984

44 1984

56 2010

Extent Id Map

egid physical location

1984 disk1,disk2

… …

Extent Group Id Map

file

33 44 56

0 1 2 3 4

Node A

Node B

Node C

Node D

Curator (A)Slave

Curator (C)Slave

Curator (D)Slave

Curator (B)Master

Metadata Ring

Curator (A)

Curator (C)

Curator (D) Curator (B)

120 -> mtime

120 -> atime

(120,mtime,atime)

Q(st, at) = (1� ↵)Q(st, at) + ↵(rt+1 + �maxaQ(st+1, a))

0 � 1

0 ↵ 1

Learned Value

Estimate of optimal future value

Learning Rate

Discount Factor

Old Value

LEARNED not to run in those cases

run although cluster highly utilized and low latency

Hypervisor

CVM

CPUMemory

SSD

SSD

HDD

HDD

SCSI Controller

Sensing

I/O Hypervisor

CVM

CPUMemory

SSD

SSD

HDD

HDD

SCSI Controller

Sensing

I/O Hypervisor

CVM

CPUMemory

SSD

SSD

HDD

HDD

SCSI Controller

Sensing

I/O

VM Management Software

Cluster Manager

Execution

Decision

Predictive

Filtering

w/o transfer learningmem noop

with transfer learning mem up

Low memory context

West-US

More computationLess communication

Initialize w0

DC-2 DC-3

DC-1 / Coordinator

1. Initialize w0

Send w0

DC-2 DC-3

DC-1 / Coordinator

1. Initialize w0

Compute Gradient

Compute Gradient

Compute Gradient

DC-2 DC-3

DC-1 / Coordinator

1. Initialize w0

2. DCs compute gradient in parallel

Send Gradient

Send Gradient

DC-2 DC-3

DC-1 / Coordinator

1. Initialize w0


Aggregate gradient

DC-2 DC-3

DC-1 / Coordinator

1. Initialize w0


3. Aggregate gradient

Send aggregate gradient

DC-2 DC-3

DC-1 / Coordinator

1. Initialize w0



Optimize

f̂1

Optimize

f̂3

Optimize

f̂2

DC-2 DC-3

DC-1 / Coordinator

no X-DC communication

1. Initialize w0



4. DCs local optimization in parallel

Send Descent Direction 3

Send Descent

Direction 2

DC-2 DC-3

DC-1 / Coordinator

1. Initialize w0




Aggregate descent

direction

DC-2 DC-3

DC-1 / Coordinator

1. Initialize w0




5. Aggregate descent direction

Send aggregate descent

direction

DC-2 DC-3

DC-1 / Coordinator

1. Initialize w0





Line Search

Line SearchLine Search

DC-2 DC-3

DC-1 / Coordinator

1. Initialize w0





6. DCs do line search in parallel

Send Line Search Results

Send Line Search Results

DC-2 DC-3

DC-1 / Coordinator

1. Initialize w0






Update model

DC-2 DC-3

DC-1 / Coordinator

1. Initialize w0






7. Update model with best step size

Send w1

DC-2 DC-3

DC-1 / Coordinator

1. Initialize w0






7. Update model with best step size

8. Repeat

Federated YARN cluster

Container

DC-1 DC-2 DC-3

DC Slaves

Global Master

X-DC linksX-DC linksDC Masters

ignacio (nacho) cano · ignacio (nacho) cano ph.d. defense paul g. allen school of computer science...

Documents